
Copilot CLI System Architecture Deep Dive

The GitHub Copilot CLI is more than a wrapper around an LLM — it is a layered, modular agentic system designed for terminal-based AI assistance. Understanding how its internal components collaborate reveals why the CLI can handle complex, multi-step tasks that simple prompt-response systems cannot.

Availability: All GitHub Copilot paid plans (Pro, Pro+, Business, and Enterprise).

High-level architecture

The Copilot CLI consists of six primary layers, each with distinct responsibilities. Information flows through these layers in both directions — not as a linear pipeline, but as an interactive loop driven by the LLM at the centre.

  • User Interface — terminal I/O, multi-line input support, and persistent event timeline rendering. Key components: terminal renderer, input buffer.
  • Input Routing — classifies input by prefix (/, !, @, plain text). Key components: input router, prefix parser.
  • Core Processing — dispatches commands, manages the context window, and streams responses from the LLM. Key components: command router, LLM client, context manager, session manager.
  • Tool Execution — executes built-in tools, custom agents, and MCP servers behind permission gates. Key components: tool router, shell executor, permission system, MCP pool.
  • State Management — persists session state, user config, authentication, and timeline events. Key components: disk storage (~/.copilot/session-state/, ~/.copilot/config).
  • External Integrations — communicates with the GitHub REST API, LLM backends, and external MCP servers. Key components: GitHub API client, model provider adapters, MCP protocol handlers.

Input routing and prefix classification

Every input to the CLI is analyzed by the Input Router, which determines how to process it based on its prefix:

  • / prefix — Slash commands that control CLI behavior. Examples: /model (switch LLM), /usage (view token stats), /config (adjust settings).
  • ! prefix — Shell commands executed with permission checks. Example: !git status runs git status in the current working directory.
  • @ prefix — File or image attachments. Example: @main.js injects the file content into the context window.
  • Plain text — Natural language prompts sent directly to the LLM.

This prefix-based design keeps the system extensible. New command types can be added without changing the core processing logic.
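The routing rules above can be sketched as a small classifier. This is an illustrative model of the idea, not the CLI's actual internals; all names here are hypothetical.

```python
def classify_input(raw: str) -> tuple[str, str]:
    """Return (route, payload) for one line of user input."""
    text = raw.strip()
    if text.startswith("/"):
        return ("slash_command", text[1:])  # e.g. /model, /usage, /config
    if text.startswith("!"):
        return ("shell_command", text[1:])  # runs with permission checks
    if text.startswith("@"):
        return ("attachment", text[1:])     # file injected into context
    return ("prompt", text)                 # plain text goes to the LLM
```

Because each prefix maps to a route name rather than to hard-coded behavior, a new prefix only requires one extra branch here plus a handler downstream.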

Core processing layer: The orchestration hub

Once input is classified, the Command Router dispatches it to the appropriate subsystem. This layer coordinates four critical components:

  • LLM Client — manages communication with model providers (Claude, GPT). Handles streaming, retries, and model switching at runtime.
  • Context Manager — maintains the conversation context window. Tracks token usage, triggers compaction when nearing limits, and injects tool results back into context.
  • Tool Router — dispatches tool calls requested by the LLM. Supports parallel execution and gates unsafe operations behind permission checks.
  • Session Manager — persists conversation state to disk, enables resumption, and manages session checkpoints.

The LLM sits at the centre of this layer — it does not passively receive input. It actively decides which tools to call, when to request more context, and how to respond. This is the ReAct (Reason → Act → Observe) loop in action.
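The ReAct loop can be sketched in a few lines. Here `llm` and `tools` are hypothetical stand-ins for the real client and tool router; the point is the control flow, where the model, not the harness, decides when to act and when to answer.

```python
def react_loop(llm, tools, context: list) -> str:
    """Run Reason → Act → Observe until the model emits a final answer."""
    while True:
        step = llm.next_step(context)                # Reason: model picks next move
        if step["type"] == "final":
            return step["text"]                      # done: answer for the user
        result = tools[step["tool"]](step["args"])   # Act: run the requested tool
        context.append({"tool": step["tool"], "result": result})  # Observe
```

Each tool result lands back in `context`, so the model's next reasoning step sees real execution outcomes rather than its own guesses.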

Tool execution and permission system

When the LLM requests a tool call, the Tool Router validates it against a permission system before execution. This prevents unsafe operations like destructive shell commands or unauthorized file writes.

Built-in tools include:

  • File system operations (read, write, edit, search)
  • Shell command execution (with explicit approval gates)
  • GitHub API interactions (repos, issues, PRs, commits)
  • Custom agents loaded from ~/.copilot/agents
  • External MCP servers configured in ~/.copilot/config

Tool results are appended to the context window, and the LLM continues from there. This feedback loop allows the model to refine its approach based on real execution outcomes.
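A permission gate of this kind can be sketched as a check in front of dispatch. The allow-list and approval callback below are illustrative assumptions; the real CLI's rules and prompts differ.

```python
SAFE_TOOLS = {"read_file", "search"}  # assumed always-allowed operations

def run_tool(name: str, args: dict, tools: dict, approve) -> dict:
    """Execute a tool call, asking for approval when it is not known-safe."""
    if name not in SAFE_TOOLS and not approve(name, args):
        return {"ok": False, "error": f"permission denied for {name}"}
    return {"ok": True, "result": tools[name](**args)}
```

Returning a structured denial instead of raising lets the result flow back into the context window, so the model can observe the refusal and plan around it.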

State management and session persistence

Unlike stateless chat interfaces, the CLI maintains persistent session state on disk:

  • ~/.copilot/session-state/ — Stores conversation history, checkpoints, and compaction summaries.
  • ~/.copilot/config — User preferences, model selection, MCP server configurations.
  • ~/.copilot/auth — GitHub authentication tokens.

This persistence enables resumable sessions. You can exit the CLI, restart it days later, and continue exactly where you left off — a feature critical for long-running tasks that span multiple work sessions.
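The mechanics of resumable sessions reduce to writing timeline events to disk and reading them back on startup. A minimal sketch, assuming a simple one-file-per-session JSON layout (the CLI's actual on-disk format is not documented here):

```python
import json
from pathlib import Path

def save_session(state_dir: Path, session_id: str, events: list) -> None:
    """Persist a session's event timeline as JSON."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / f"{session_id}.json").write_text(json.dumps(events))

def load_session(state_dir: Path, session_id: str) -> list:
    """Restore a session's events, or start fresh if none exist."""
    path = state_dir / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```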

Data flow: From input to response

A typical interaction follows this flow:

USER INPUT
  │
  ▼
INPUT ROUTER ────► [Slash | Shell | File | Prompt]
  │
  ▼
COMMAND ROUTER ──► dispatch to subsystem
  │
  ▼
CONTEXT MANAGER ─► inject into context window
  │
  ▼
LLM CLIENT ──────► stream response from model
  │
  ├──► tool call request ──► TOOL ROUTER ──► execute ──► result ──► CONTEXT
  │
  └──► final response ────► TERMINAL UI
  │
  ▼
SESSION MANAGER ─► persist state to disk

The LLM may request multiple tool calls in sequence or parallel. Each result is fed back into the context window, and the loop continues until the LLM produces a final response.
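Parallel tool calls can be sketched with a thread pool: independent requests run concurrently, and every result is appended to the context before the loop continues. This is an illustration of the pattern, not the CLI's actual scheduler.

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(calls, tools, context: list) -> list:
    """Run independent tool calls concurrently, feeding results into context."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(tools[name], *args) for name, args in calls]
        results = [f.result() for f in futures]  # preserves request order
    context.extend({"tool": c[0], "result": r} for c, r in zip(calls, results))
    return results
```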

Extensibility: Custom agents and MCP servers

The CLI is designed to be extended without modifying its core. Two primary extension points exist:

  • Custom agents — Place agent scripts in ~/.copilot/agents/. The CLI auto-discovers and exposes them as tools the LLM can invoke.
  • MCP servers — Configure external Model Context Protocol servers in ~/.copilot/config. These servers provide domain-specific tools (e.g., database queries, API integrations) that the LLM can call.

This architecture allows teams to build specialized workflows without forking the CLI codebase.
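Agent auto-discovery amounts to scanning a directory and registering each file as an invocable tool. The sketch below is hypothetical; the real agent format and registration mechanism are not specified here.

```python
from pathlib import Path

def discover_agents(agents_dir: Path) -> dict:
    """Map agent name → script path for every file in the agents directory."""
    if not agents_dir.is_dir():
        return {}
    return {p.stem: p for p in sorted(agents_dir.iterdir()) if p.is_file()}
```

Because discovery is driven by the filesystem, adding a capability is a copy operation, with no registration code to write.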

Compaction and infinite sessions

When the context window nears its token limit (around 95%), the Context Manager triggers automatic compaction. Older conversation turns are summarized by the LLM, and the summary replaces the original verbatim history.

This happens transparently in the background. From the user's perspective, the session simply continues without interruption. Essential information is preserved; redundant details are discarded.

Compaction is why the CLI supports infinite sessions — there is no hard context limit. You can work on complex tasks over hours or days without hitting token ceilings.
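The trigger logic can be sketched as a threshold check: once usage crosses ~95% of the window, older turns are collapsed into a summary while recent turns survive verbatim. Here `count_tokens` and `summarize` are stand-ins for the real tokenizer and LLM summarization call, and the `keep` cutoff is an assumed heuristic.

```python
def maybe_compact(turns: list, count_tokens, summarize, limit: int, keep: int = 4):
    """Replace old turns with a summary once usage nears the token limit."""
    total = sum(count_tokens(t) for t in turns)
    if total < 0.95 * limit or len(turns) <= keep:
        return turns  # enough headroom: leave history untouched
    old, recent = turns[:-keep], turns[-keep:]
    return [{"role": "summary", "text": summarize(old)}] + recent
```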

Key architectural decisions

  • LLM-driven control flow. The model is not a passive responder — it actively decides which tools to call and when. This is the foundation of agentic behavior.
  • Layered modularity. Each layer has a single responsibility. Input routing, command dispatch, tool execution, and state persistence are cleanly separated, making the system easier to test and extend.
  • Persistent state. Session state lives on disk, not in memory. This enables resumption across process restarts and supports long-running workflows.
  • Permission gates. Tool execution is mediated by a security layer. Unsafe operations require explicit user approval.
  • Feedback loops. Tool results, compaction summaries, and context updates all flow back into the LLM's next turn. This closed-loop design enables iterative refinement.

Why this architecture matters

The Copilot CLI is not just a chat interface with a command-line skin. Its architecture reflects a fundamentally different design philosophy:

  • It treats the LLM as an agent, not a function.
  • It persists state across time, enabling resumable workflows.
  • It integrates tools and external systems through a secure, extensible framework.
  • It manages context intelligently, supporting infinite sessions through automatic compaction.

Understanding this architecture helps you work with the CLI instead of against it. You can design prompts that leverage tool calls, structure tasks that benefit from session persistence, and extend the system with custom agents when built-in capabilities are not enough.
