The GitHub Copilot CLI is more than a wrapper around an LLM — it is a layered, modular agentic system designed for terminal-based AI assistance. Understanding how its internal components collaborate reveals why the CLI can handle complex, multi-step tasks that simple prompt-response systems cannot.
Availability: All GitHub Copilot paid plans (Pro, Pro+, Business, and Enterprise).
## High-level architecture
The Copilot CLI consists of six primary layers, each with distinct responsibilities. Information flows through these layers in both directions — not as a linear pipeline, but as an interactive loop driven by the LLM at the centre.
| Layer | Responsibility | Key components |
|---|---|---|
| User Interface | Terminal I/O, multi-line input support, persistent event timeline rendering | Terminal renderer, input buffer |
| Input Routing | Classifies input by prefix (`/`, `!`, `@`, plain text) | Input router, prefix parser |
| Core Processing | Dispatches commands, manages context window, streams responses from LLM | Command router, LLM client, context manager, session manager |
| Tool Execution | Executes built-in tools, custom agents, MCP servers with permission gates | Tool router, shell executor, permission system, MCP pool |
| State Management | Persists session state, user config, authentication, and timeline events | Disk storage (~/.copilot/session-state/, ~/.copilot/config) |
| External Integrations | Communicates with GitHub REST API, LLM backends, and external MCP servers | GitHub API client, model provider adapters, MCP protocol handlers |
## Input routing and prefix classification
Every input to the CLI is analyzed by the Input Router, which determines how to process it based on its prefix:
- `/` prefix — Slash commands that control CLI behavior. Examples: `/model` (switch LLM), `/usage` (view token stats), `/config` (adjust settings).
- `!` prefix — Shell commands executed with permission checks. Example: `!git status` runs `git status` in the current working directory.
- `@` prefix — File or image attachments. Example: `@main.js` injects the file content into the context window.
- Plain text — Natural language prompts sent directly to the LLM.
This prefix-based design keeps the system extensible. New command types can be added without changing the core processing logic.
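The routing rule above can be sketched as a small classifier. The function name and category strings here are illustrative, not the CLI's actual internals:

```python
def classify(line: str) -> tuple[str, str]:
    """Classify raw terminal input by its prefix, mirroring the router's rules."""
    if line.startswith("/"):
        return ("slash_command", line[1:])   # e.g. "/model" -> "model"
    if line.startswith("!"):
        return ("shell_command", line[1:])   # e.g. "!git status" -> "git status"
    if line.startswith("@"):
        return ("attachment", line[1:])      # e.g. "@main.js" -> "main.js"
    return ("prompt", line)                  # plain text goes straight to the LLM
```

Because each branch only inspects the first character, a new prefix becomes one more branch — the downstream subsystems never change.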
## Core processing layer: The orchestration hub
Once input is classified, the Command Router dispatches it to the appropriate subsystem. This layer coordinates four critical components:
| Component | Function |
|---|---|
| LLM Client | Manages communication with model providers (Claude, GPT). Handles streaming, retries, and model switching at runtime. |
| Context Manager | Maintains the conversation context window. Tracks token usage, triggers compaction when nearing limits, and injects tool results back into context. |
| Tool Router | Dispatches tool calls requested by the LLM. Supports parallel execution and gates unsafe operations behind permission checks. |
| Session Manager | Persists conversation state to disk, enables resumption, and manages session checkpoints. |
The LLM sits at the centre of this layer — it does not passively receive input. It actively decides which tools to call, when to request more context, and how to respond. This is the ReAct (Reason → Act → Observe) loop in action.
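A minimal sketch of that loop, with a stand-in `llm` callable and tool table (all names hypothetical, not the CLI's real interfaces):

```python
def react_loop(llm, tools, context, max_turns=10):
    """Drive Reason -> Act -> Observe until the model emits a final answer."""
    for _ in range(max_turns):
        step = llm(context)                         # Reason: model inspects context
        if step["type"] == "final":
            return step["text"]                     # no more tool calls: done
        result = tools[step["tool"]](step["args"])  # Act: run the requested tool
        context.append({"tool": step["tool"], "result": result})  # Observe
    raise RuntimeError("no final answer within turn budget")
```

The key property is that control flow lives in the model's outputs: the loop itself is dumb, and the LLM decides on each turn whether to act again or answer.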
## Tool execution and permission system
When the LLM requests a tool call, the Tool Router validates it against a permission system before execution. This prevents unsafe operations like destructive shell commands or unauthorized file writes.
Built-in tools include:
- File system operations (read, write, edit, search)
- Shell command execution (with explicit approval gates)
- GitHub API interactions (repos, issues, PRs, commits)
- Custom agents loaded from `~/.copilot/agents`
- External MCP servers configured in `~/.copilot/config`
Tool results are appended to the context window, and the LLM continues from there. This feedback loop allows the model to refine its approach based on real execution outcomes.
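One way such a gate could look — a sketch only; the safe-tool set, function names, and approval callback are assumptions, not the CLI's documented permission model:

```python
SAFE_TOOLS = {"read_file", "search"}          # assumed auto-approved operations

def run_tool(tool: str, args: dict) -> str:
    """Stand-in executor for whatever the real tool implementations do."""
    return f"{tool} ok"

def dispatch(tool: str, args: dict, approve) -> str:
    """Run a tool call, asking for approval when it falls outside the safe set."""
    if tool not in SAFE_TOOLS and not approve(tool, args):
        return f"denied: {tool}"              # the denial is still fed back to the LLM
    return run_tool(tool, args)
```

Note that a denial is returned as a result rather than raised: the model observes the refusal in its context window and can plan around it.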
## State management and session persistence
Unlike stateless chat interfaces, the CLI maintains persistent session state on disk:
- `~/.copilot/session-state/` — Stores conversation history, checkpoints, and compaction summaries.
- `~/.copilot/config` — User preferences, model selection, MCP server configurations.
- `~/.copilot/auth` — GitHub authentication tokens.
This persistence enables resumable sessions. You can exit the CLI, restart it days later, and continue exactly where you left off — a feature critical for long-running tasks that span multiple work sessions.
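In outline, save/resume amounts to serializing the event timeline under a per-session file. The JSON schema and file naming below are illustrative assumptions, not the CLI's actual on-disk format:

```python
import json
from pathlib import Path

STATE_DIR = Path.home() / ".copilot" / "session-state"   # directory named above

def save_session(session_id: str, events: list, state_dir: Path = STATE_DIR) -> None:
    """Write the event timeline to disk so the session survives a restart."""
    state_dir.mkdir(parents=True, exist_ok=True)
    (state_dir / f"{session_id}.json").write_text(json.dumps(events))

def resume_session(session_id: str, state_dir: Path = STATE_DIR) -> list:
    """Load a prior timeline; an unknown id starts a fresh session."""
    path = state_dir / f"{session_id}.json"
    return json.loads(path.read_text()) if path.exists() else []
```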
## Data flow: From input to response
A typical interaction follows this flow:
```
USER INPUT
     │
     ▼
INPUT ROUTER ────► [Slash | Shell | File | Prompt]
     │
     ▼
COMMAND ROUTER ──► dispatch to subsystem
     │
     ▼
CONTEXT MANAGER ─► inject into context window
     │
     ▼
LLM CLIENT ──────► stream response from model
     │
     ├──► tool call request ──► TOOL ROUTER ──► execute ──► result ──► CONTEXT
     │
     └──► final response ────► TERMINAL UI
                                    │
                                    ▼
                       SESSION MANAGER ─► persist state to disk
```
The LLM may request multiple tool calls in sequence or parallel. Each result is fed back into the context window, and the loop continues until the LLM produces a final response.
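The parallel case can be sketched with a thread pool: independent calls run concurrently and their results are collected in request order before being appended to context. This illustrates the pattern only, not the CLI's actual executor:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(tool_calls, tools):
    """Execute independent tool calls concurrently; results keep request order."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(tools[name], args) for name, args in tool_calls]
        return [f.result() for f in futures]      # blocks until every call finishes
```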
## Extensibility: Custom agents and MCP servers
The CLI is designed to be extended without modifying its core. Two primary extension points exist:
- Custom agents — Place agent scripts in `~/.copilot/agents/`. The CLI auto-discovers and exposes them as tools the LLM can invoke.
- MCP servers — Configure external Model Context Protocol servers in `~/.copilot/config`. These servers provide domain-specific tools (e.g., database queries, API integrations) that the LLM can call.
This architecture allows teams to build specialized workflows without forking the CLI codebase.
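Auto-discovery of that kind usually reduces to a directory scan at startup. A sketch, assuming agents are one file each named after the agent (the real discovery rules are not specified here):

```python
from pathlib import Path

def discover_agents(agents_dir: Path) -> dict[str, Path]:
    """Map agent names to script paths by scanning the agents directory."""
    if not agents_dir.is_dir():
        return {}                              # no directory: no custom agents
    return {p.stem: p for p in sorted(agents_dir.iterdir()) if p.is_file()}
```

Each discovered entry would then be registered with the tool router like any built-in tool, which is what keeps the core codebase untouched.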
## Compaction and infinite sessions
When the context window nears its token limit (around 95%), the Context Manager triggers automatic compaction. Older conversation turns are summarized by the LLM, and the summary replaces the original verbatim history.
This happens transparently in the background. From the user's perspective, the session simply continues without interruption. Essential information is preserved; redundant details are discarded.
Compaction is why the CLI supports infinite sessions — there is no hard context limit. You can work on complex tasks over hours or days without hitting token ceilings.
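The trigger logic can be sketched as follows; the 95% threshold comes from the text above, while the keep-the-recent-quarter policy and the `summarize` callable are illustrative assumptions:

```python
def maybe_compact(turns, token_count, budget, summarize, threshold=0.95):
    """Replace older turns with an LLM-written summary when near the token limit."""
    if token_count < threshold * budget:
        return turns                              # plenty of room: no-op
    keep = max(1, len(turns) // 4)                # assumed: keep the newest quarter
    summary = summarize(turns[:-keep])            # LLM condenses the older turns
    return [{"role": "summary", "text": summary}] + turns[-keep:]
```

Because the summary occupies one short entry where many verbatim turns used to be, repeated compaction keeps token usage bounded no matter how long the session runs.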
## Key architectural decisions
- LLM-driven control flow. The model is not a passive responder — it actively decides which tools to call and when. This is the foundation of agentic behavior.
- Layered modularity. Each layer has a single responsibility. Input routing, command dispatch, tool execution, and state persistence are cleanly separated, making the system easier to test and extend.
- Persistent state. Session state lives on disk, not in memory. This enables resumption across process restarts and supports long-running workflows.
- Permission gates. Tool execution is mediated by a security layer. Unsafe operations require explicit user approval.
- Feedback loops. Tool results, compaction summaries, and context updates all flow back into the LLM's next turn. This closed-loop design enables iterative refinement.
## Why this architecture matters
The Copilot CLI is not just a chat interface with a command-line skin. Its architecture reflects a fundamentally different design philosophy:
- It treats the LLM as an agent, not a function.
- It persists state across time, enabling resumable workflows.
- It integrates tools and external systems through a secure, extensible framework.
- It manages context intelligently, supporting infinite sessions through automatic compaction.
Understanding this architecture helps you work with the CLI instead of against it. You can design prompts that leverage tool calls, structure tasks that benefit from session persistence, and extend the system with custom agents when built-in capabilities are not enough.