It is tempting to think of GitHub Copilot as a linear pipeline — user prompt flows down through a session, context window, memory layer, tools, and finally reaches the LLM at the bottom. That mental model is intuitive, but it is wrong in two important ways. The LLM is not at the end of the chain; it sits at the centre, and both tool calls and context compaction are feedback loops driven by the model itself.
Availability: all GitHub Copilot plans, including Free.
The common (incorrect) mental model
Many developers first picture the flow like this — a neat top-to-bottom waterfall:
```
USER prompt
  │
  ▼
SESSION (maintains conversation)
  │
  ▼
CONTEXT WINDOW (token-limited)
  │
  ▼
MEMORY / SUMMARY (compaction)
  │
  ▼
TOOLS (repo search, terminal, MCP…)
  │
  ▼
LLM MODEL (GPT / Claude / etc.)
```
This diagram gets the components right but misrepresents the direction of control. In reality the LLM is not a passive consumer sitting at the bottom — it is the agent that drives everything else.
The accurate model: LLM at the centre
```
USER
  │
  ▼
SESSION ───────────────────────────────┐
  │                                    │ (persists / resumes)
  ▼                                    │
CONTEXT WINDOW ◄──── COMPACTION LOOP ◄─┘
  │                       ▲   (compress history when nearing
  │                       │    the token limit; CLI & SDK only)
  ▼                       │
LLM MODEL ◄───────────────┴────────────┐
  │                                    │
  ├──► TOOL CALL (repo search,         │
  │    terminal, MCP servers, tests…)  │
  │      │                             │
  │      └──► result ──► CONTEXT ──────┘
  │                      WINDOW
  └──► RESPONSE to USER
```
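The control flow in the diagram can be sketched in a few lines of Python. Everything here — the toy model, the tool registry, the message shapes — is illustrative, not Copilot's actual API; the point is purely the shape of the loop:

```python
# Illustrative sketch of the LLM-centred loop — NOT Copilot's real API.
# The model drives the loop: each step either calls a tool or answers.

def run_turn(model, tools, context):
    """Feed the context to the model until it produces a final answer."""
    while True:
        step = model(context)                 # LLM reads the whole context window
        if step["type"] == "tool_call":       # model chose to act (ReAct: Act)
            result = tools[step["name"]](step["args"])
            context.append({"role": "tool", "content": result})  # Observe
            continue                          # loop back: Reason again
        return step["content"]                # final response to the user

# A toy "model" that searches the repo once, then answers.
def toy_model(context):
    if not any(m["role"] == "tool" for m in context):
        return {"type": "tool_call", "name": "repo_search", "args": "greet"}
    return {"type": "response", "content": "greet() lives in utils.py"}

tools = {"repo_search": lambda q: f"found '{q}' in utils.py"}
context = [{"role": "user", "content": "Where is greet defined?"}]
print(run_turn(toy_model, tools, context))   # → greet() lives in utils.py
```

The sketch makes the central claim concrete: the tool result re-enters the same context window the model reads on the next iteration. The model sits inside the loop, not at the end of it.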
Layer-by-layer breakdown
| Layer | What it does | Who controls it |
|---|---|---|
| User | Provides the prompt, approves tool calls, steers the conversation | Human |
| Session | Persists conversation state, stores checkpoints and artifacts on disk | Copilot runtime |
| Context Window | The token buffer fed to the LLM — contains the system prompt, history, file content, and tool results | Copilot runtime |
| Compaction loop | When the context nears the token limit (~95% on CLI), conversation history is automatically compressed in the background and re-injected, enabling virtually infinite sessions | Copilot runtime (triggered automatically) |
| LLM Model | Generates responses and decides which tools to call | The model itself |
| Tools | Execute actions (shell, file writes, repo search, MCP servers) and return results back into the context window | LLM-driven, user-approved |
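The Context Window row is worth making concrete: everything the table lists — system prompt, history, file content, tool results — is flattened into one token-bounded buffer per model call. A minimal sketch, where the formats, budget figure, and eviction policy are illustrative assumptions rather than Copilot's internals:

```python
# Illustrative only: how the pieces in the table might be flattened into
# the single buffer the model actually sees. Formats are assumptions.

def build_context(system_prompt, history, files, tool_results, budget=8000):
    """Assemble one prompt buffer, dropping the oldest non-system parts
    if the (crudely word-counted) token budget is exceeded."""
    parts = [system_prompt]
    parts += [f"[file] {path}:\n{text}" for path, text in files.items()]
    parts += list(history) + [f"[tool] {r}" for r in tool_results]
    while sum(len(p.split()) for p in parts) > budget and len(parts) > 1:
        parts.pop(1)                      # evict oldest non-system part
    return "\n\n".join(parts)

ctx = build_context(
    "You are a coding assistant.",
    ["user: where is greet defined?"],
    {"utils.py": "def greet(): ..."},
    ["repo_search → utils.py:1"],
)
print(ctx.splitlines()[0])  # → You are a coding assistant.
```

Whatever the real assembly logic looks like, the constraint is the same: one finite buffer, so every file reference and tool result competes for the same token budget.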
Key insights
- Tools are outputs, not inputs. The LLM decides mid-generation whether to call a tool. The tool result is then appended to the context window, and the LLM resumes from there. This is the ReAct loop (Reason → Act → Observe → Reason…).
- Compaction is a feedback loop, not a layer. Memory / summary is not a preprocessing step before the LLM. It is triggered by the runtime when the context window nears its limit, and the summary is written back into the context window.
- The session wraps everything. The session persists state between tool calls, between compaction cycles, and — in Copilot CLI and SDK — across resumptions of the same long-running task.
- Infinite sessions emerge from the compaction loop. There is no magic unlimited memory — old turns are compressed and replaced. Essential information is preserved; verbatim history beyond the summary is discarded.
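The compaction loop in particular can be made concrete with a small sketch. The toy token limit, the word-count "tokenizer", and the summariser stub below are all assumptions for illustration — the only figure taken from this article is the ~95% CLI trigger:

```python
# Illustrative compaction sketch — thresholds and summariser are
# assumptions, not Copilot's actual implementation.

TOKEN_LIMIT = 100        # toy limit; real windows hold far more tokens
TRIGGER = 0.95           # compact when the window is ~95% full (CLI behaviour)

def count_tokens(history):
    # Crude stand-in: one "token" per whitespace-separated word.
    return sum(len(m["content"].split()) for m in history)

def summarize(messages):
    # Stand-in for an LLM-written summary of the older turns.
    return {"role": "summary",
            "content": f"[summary of {len(messages)} earlier messages]"}

def maybe_compact(history, keep_recent=2):
    """Replace old turns with a summary when nearing the token limit."""
    if count_tokens(history) < TRIGGER * TOKEN_LIMIT:
        return history                       # plenty of room: no-op
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent         # summary re-enters the window

history = [{"role": "user", "content": "word " * 40},
           {"role": "assistant", "content": "word " * 40},
           {"role": "user", "content": "word " * 20}]
history = maybe_compact(history)
print([m["role"] for m in history])   # → ['summary', 'assistant', 'user']
```

Each pass keeps the essential information (the summary plus the most recent turns) and discards the verbatim history — which is exactly why sessions feel "infinite" without any unlimited memory existing.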
Where this differs across surfaces
| Surface | Compaction loop | Persistent session | Tool calls (ReAct) |
|---|---|---|---|
| Copilot CLI | ✅ Automatic | ✅ On disk (`~/.copilot/session-state/`) | ✅ |
| Copilot SDK | ✅ Automatic | ✅ On disk (via `workspacePath` when infinite sessions are enabled) | ✅ |
| IDE — local sessions (VS Code, JetBrains…) | ❌ | ❌ | ✅ |
| IDE — background sessions (VS Code) | ✅ Automatic (delegates to CLI) | ✅ On disk (delegates to CLI) | ✅ |
| GitHub.com UI | ❌ | ❌ | ✅ (coding agent supports tools) |
Tip: When you hit context limits in a local IDE session, you are experiencing the absence of the compaction loop — not a Copilot bug. Use `#file` references, break tasks into focused conversations, switch to a background session (VS Code), or use Copilot CLI / the Copilot SDK for long-running work.