It is tempting to think of GitHub Copilot as a linear pipeline — user prompt flows down through a session, context window, memory layer, tools, and finally reaches the LLM at the bottom. That mental model is intuitive, but it is wrong in two important ways. The LLM is not at the end of the chain; it sits at the centre, and both tool calls and context compaction are feedback loops driven by the model itself.
Availability: all GitHub Copilot plans, including Free.
The common (incorrect) mental model
Many developers first picture the flow like this — a neat top-to-bottom waterfall:
```
USER prompt
  │
  ▼
SESSION (maintains conversation)
  │
  ▼
CONTEXT WINDOW (token-limited)
  │
  ▼
MEMORY / SUMMARY (compaction)
  │
  ▼
TOOLS (repo search, terminal, MCP…)
  │
  ▼
LLM MODEL (GPT / Claude / etc.)
```
This diagram gets the components right but misrepresents the direction of control. In reality the LLM is not a passive consumer sitting at the bottom — it is the agent that drives everything else.
The accurate model: LLM at the centre
```
USER
  │
  ▼
SESSION ───────────────────────────────┐
  │                                    │ (persists / resumes)
  ▼                                    │
CONTEXT WINDOW ◄──── COMPACTION LOOP ◄─┘
  │                       ▲   (compress history when nearing
  │                       │    the token limit; CLI & SDK only)
  ▼                       │
LLM MODEL ◄───────────────┴────────────┐
  │                                    │
  ├──► TOOL CALL (repo search,         │
  │    terminal, MCP servers, tests…)  │
  │      │                             │
  │      └──► result ──► CONTEXT ──────┘
  │                      WINDOW
  └──► RESPONSE to USER
```
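The control flow in the diagram can be sketched in a few lines of Python. Everything here — the toy model, the tool registry, the message shapes — is illustrative, not Copilot's actual API; the point is purely the shape of the loop:

```python
# Illustrative sketch of the LLM-centred loop — NOT Copilot's real API.
# The model drives the loop: each step either calls a tool or answers.

def run_turn(model, tools, context):
    """Feed the context to the model until it produces a final answer."""
    while True:
        step = model(context)                 # LLM reads the whole context window
        if step["type"] == "tool_call":       # model chose to act (ReAct: Act)
            result = tools[step["name"]](step["args"])
            context.append({"role": "tool", "content": result})  # Observe
            continue                          # loop back: Reason again
        return step["content"]                # final response to the user

# A toy "model" that searches the repo once, then answers.
def toy_model(context):
    if not any(m["role"] == "tool" for m in context):
        return {"type": "tool_call", "name": "repo_search", "args": "greet"}
    return {"type": "response", "content": "greet() lives in utils.py"}

tools = {"repo_search": lambda q: f"found '{q}' in utils.py"}
context = [{"role": "user", "content": "Where is greet defined?"}]
print(run_turn(toy_model, tools, context))   # → greet() lives in utils.py
```

The sketch makes the central claim concrete: the tool result re-enters the same context window the model reads on the next iteration. The model sits inside the loop, not at the end of it.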
Layer-by-layer breakdown
| Layer | What it does | Who controls it |
|---|---|---|
| User | Provides the prompt, approves tool calls, steers the conversation | Human |
| Session | Persists conversation state, stores checkpoints and artifacts on disk | Copilot runtime |
| Context Window | The token buffer fed to the LLM — contains the system prompt, history, file content, and tool results | Copilot runtime |
| Compaction loop | When the context nears the token limit (~95% on CLI), conversation history is automatically compressed in the background and re-injected, enabling virtually infinite sessions | Copilot runtime (triggered automatically) |
| LLM Model | Generates responses and decides which tools to call | The model itself |
| Tools | Execute actions (shell, file writes, repo search, MCP servers) and return results back into the context window | LLM-driven, user-approved |
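The Context Window row is worth making concrete: everything the table lists — system prompt, history, file content, tool results — is flattened into one token-bounded buffer per model call. A minimal sketch, where the formats, budget figure, and eviction policy are illustrative assumptions rather than Copilot's internals:

```python
# Illustrative only: how the pieces in the table might be flattened into
# the single buffer the model actually sees. Formats are assumptions.

def build_context(system_prompt, history, files, tool_results, budget=8000):
    """Assemble one prompt buffer, dropping the oldest non-system parts
    if the (crudely word-counted) token budget is exceeded."""
    parts = [system_prompt]
    parts += [f"[file] {path}:\n{text}" for path, text in files.items()]
    parts += list(history) + [f"[tool] {r}" for r in tool_results]
    while sum(len(p.split()) for p in parts) > budget and len(parts) > 1:
        parts.pop(1)                      # evict oldest non-system part
    return "\n\n".join(parts)

ctx = build_context(
    "You are a coding assistant.",
    ["user: where is greet defined?"],
    {"utils.py": "def greet(): ..."},
    ["repo_search → utils.py:1"],
)
print(ctx.splitlines()[0])  # → You are a coding assistant.
```

Whatever the real assembly logic looks like, the constraint is the same: one finite buffer, so every file reference and tool result competes for the same token budget.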
Key insights
- Tools are outputs, not inputs. The LLM decides mid-generation whether to call a tool. The tool result is then appended to the context window, and the LLM resumes from there. This is the ReAct loop (Reason → Act → Observe → Reason…).
- Compaction is a feedback loop, not a layer. Memory / summary is not a preprocessing step before the LLM. It is triggered by the runtime when the context window nears its limit, and the summary is written back into the context window.
- The session wraps everything. The session persists state between tool calls, between compaction cycles, and — in Copilot CLI and SDK — across resumptions of the same long-running task.
- Infinite sessions emerge from the compaction loop. There is no magic unlimited memory — old turns are compressed and replaced. Essential information is preserved; verbatim history beyond the summary is discarded.
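The compaction loop in particular can be made concrete with a small sketch. The toy token limit, the word-count "tokenizer", and the summariser stub below are all assumptions for illustration — the only figure taken from this article is the ~95% CLI trigger:

```python
# Illustrative compaction sketch — thresholds and summariser are
# assumptions, not Copilot's actual implementation.

TOKEN_LIMIT = 100        # toy limit; real windows hold far more tokens
TRIGGER = 0.95           # compact when the window is ~95% full (CLI behaviour)

def count_tokens(history):
    # Crude stand-in: one "token" per whitespace-separated word.
    return sum(len(m["content"].split()) for m in history)

def summarize(messages):
    # Stand-in for an LLM-written summary of the older turns.
    return {"role": "summary",
            "content": f"[summary of {len(messages)} earlier messages]"}

def maybe_compact(history, keep_recent=2):
    """Replace old turns with a summary when nearing the token limit."""
    if count_tokens(history) < TRIGGER * TOKEN_LIMIT:
        return history                       # plenty of room: no-op
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent         # summary re-enters the window

history = [{"role": "user", "content": "word " * 40},
           {"role": "assistant", "content": "word " * 40},
           {"role": "user", "content": "word " * 20}]
history = maybe_compact(history)
print([m["role"] for m in history])   # → ['summary', 'assistant', 'user']
```

Each pass keeps the essential information (the summary plus the most recent turns) and discards the verbatim history — which is exactly why sessions feel "infinite" without any unlimited memory existing.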
Where this differs across surfaces
| Surface | Compaction loop | Persistent session | Tool calls (ReAct) |
|---|---|---|---|
| Copilot CLI | ✅ Automatic | ✅ On disk (`~/.copilot/session-state/`) | ✅ |
| Copilot SDK | ✅ Automatic | ✅ On disk (via `workspacePath` when infinite sessions are enabled) | ✅ |
| IDE — local sessions (VS Code, JetBrains…) | ❌ | ❌ | ✅ |
| IDE — background sessions (VS Code) | ✅ Automatic (delegates to CLI) | ✅ On disk (delegates to CLI) | ✅ |
| GitHub.com UI | ❌ | ❌ | ✅ (coding agent supports tools) |
Tip: When you hit context limits in a local IDE session, you are experiencing the absence of the compaction loop — not a Copilot bug. Use `#file` references, break tasks into focused conversations, switch to a background session (VS Code), or use Copilot CLI / the Copilot SDK for long-running work.