When you type a prompt into Copilot and watch a response stream back, a surprisingly sophisticated system springs into action beneath the surface. Understanding how Copilot is wired together — from the moment you hit Enter to the last token streamed to your editor — makes you a sharper user and helps you build better integrations on top of it.
Availability: All GitHub Copilot plans, including Free.
The three-tier architecture
Copilot's architecture spans three tiers, each with a distinct job:
| Tier | What lives here | Examples |
|---|---|---|
| Client | Surface-specific extensions and CLI tooling that capture your intent and render responses | VS Code extension, JetBrains plugin, gh copilot CLI, GitHub.com UI |
| Copilot Proxy | GitHub's backend layer that authenticates requests, selects the model, assembles context, and routes to the AI provider | api.githubcopilot.com |
| AI Provider | The underlying large language model that generates completions and tool-call decisions | OpenAI (GPT series), Anthropic (Claude), Google (Gemini), xAI (Grok), Alibaba (Qwen) |
The Copilot Proxy is the hidden workhorse. It shields you from provider complexity, handles authentication via your GitHub token, and ensures all model calls conform to GitHub's content policies regardless of which model is selected.
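The proxy's shielding role can be sketched as a routing table: the client always talks to one endpoint, and provider differences stay server-side. A minimal sketch, assuming an illustrative model-to-provider mapping (the names below are examples, not GitHub's actual routing):

```python
# Illustrative sketch only: the client sees one endpoint (api.githubcopilot.com);
# resolving a model to its upstream provider happens behind the proxy.
PROVIDER_BY_MODEL = {
    "gpt-4.1": "openai",
    "claude-sonnet": "anthropic",
    "gemini-flash": "google",
}

def route_model(model: str) -> str:
    """Resolve which upstream provider serves a requested model."""
    provider = PROVIDER_BY_MODEL.get(model)
    if provider is None:
        raise ValueError(f"unknown model: {model}")
    return provider
```

Because the mapping lives in the proxy, the client never needs provider-specific credentials or API shapes.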
The request flow, step by step
Here is what actually happens when you submit a prompt:
YOU (prompt)
│
▼
CLIENT EXTENSION
│ • Captures editor context (open file, cursor, selection)
│ • Attaches #file / @workspace / #fetch references
│ • Sends authenticated request to Copilot Proxy
▼
COPILOT PROXY
│ • Validates GitHub OAuth token
│ • Applies content safety filters (pre-flight)
│ • Selects model (explicit choice, or Auto)
│ • Assembles the final prompt (system prompt + history + context)
│ • Forwards to AI Provider via provider-specific API
▼
AI PROVIDER (LLM)
│ • Generates tokens one by one
│ • May emit tool-call requests (function calling)
│ • Streams response back through the Proxy
▼
COPILOT PROXY
│ • Applies content safety filters (post-flight)
│ • Streams tokens to the client
▼
CLIENT EXTENSION
• Renders streamed tokens in the editor / terminal / chat panel
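The flow above can be condensed into a toy pipeline. Every function name, the token check, and the canned "model" output below are assumptions for illustration, not Copilot's actual internals:

```python
# A minimal sketch of the client → proxy → provider → proxy → client round trip.

def client_request(prompt: str, context: dict) -> dict:
    """Client tier: bundle the prompt with editor context and an auth token."""
    return {"prompt": prompt, "context": context, "token": "gho_example"}

def provider_generate(assembled_prompt: str):
    """Provider tier: pretend to generate tokens one by one."""
    for word in ["def", " add", "(a,", " b):", " return", " a", " +", " b"]:
        yield word

def proxy_handle(request: dict):
    """Proxy tier: validate, assemble the final prompt, forward, stream back."""
    assert request["token"].startswith("gho_")          # auth check (sketch)
    assembled = f"[system]\n[context] {request['context']}\n[user] {request['prompt']}"
    for token in provider_generate(assembled):          # forward to the LLM
        yield token                                     # post-flight filtering would sit here

response = "".join(proxy_handle(client_request("write add()", {"file": "math.py"})))
```

The generator chain mirrors the streaming behaviour: tokens pass through the proxy to the client as they are produced, not in one final batch.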
Model selection: Auto vs. explicit
Copilot supports a growing roster of models from multiple providers. You can pick one explicitly, or let Copilot choose for you with Auto mode.
| Selection mode | How it works | Best for |
|---|---|---|
| Auto | Copilot selects a model for you based on current availability and the type of task | Most everyday workflows — set it and forget it |
| General-purpose (e.g., GPT-4.1, Claude Sonnet) | Balanced speed and quality | Code completions, explanations, reviews |
| Deep reasoning (e.g., GPT-5.1, Claude Opus, Gemini Pro) | Extended reasoning chains before answering | Architecture decisions, complex debugging, multi-step refactors |
| Agentic (e.g., GPT-5.1 Codex Max, GPT-5.2-Codex) | Optimised for multi-step tool use and autonomous task execution | Coding agent tasks, long-running edits |
| Fast / lightweight (e.g., Claude Haiku, Gemini Flash) | Minimal latency, lower premium cost | Repetitive edits, quick lookups, high-volume usage |
Note that different models carry different premium request multipliers. If you're on a plan with a monthly usage allowance, choosing a deep-reasoning model consumes more of your budget per request than a general-purpose one.
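The budget arithmetic is simple to model. The multiplier values below are hypothetical placeholders — check GitHub's pricing documentation for the real per-model figures:

```python
# Hypothetical premium-request multipliers, for illustration only.
MULTIPLIER = {"general": 1.0, "deep-reasoning": 10.0, "lightweight": 0.33}

def premium_requests_used(calls: dict) -> float:
    """Total premium requests consumed: request count x multiplier per model class."""
    return sum(MULTIPLIER[model_class] * count for model_class, count in calls.items())

# With these assumed multipliers, 5 deep-reasoning calls cost as much as
# 50 general-purpose ones.
usage = premium_requests_used({"general": 30, "deep-reasoning": 5, "lightweight": 60})
```
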
How the prompt is assembled
Before any tokens reach the LLM, the Copilot Proxy constructs a composite prompt from several sources — most of which are invisible to you:
- System prompt — GitHub's instructions to the model (persona, content policy, output format expectations). You can extend this with custom instructions in a `.github/copilot-instructions.md` file in your repository.
- Conversation history — previous turns in the chat session, subject to the context window limit.
- Explicit context — files, symbols, URLs, or terminal output you attach via `#file`, `#fetch`, `@workspace`, or similar references.
- Implicit context — the active editor file, cursor position, and surrounding code that the extension automatically includes.
- Tool definitions — the function signatures for any tools the model is allowed to call (repo search, terminal, MCP servers, etc.).
All of this is packed into a single token-counted payload and sent to the AI provider. The context window — the total token capacity of the model — determines how much of this payload can fit.
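Assembly under a token budget can be sketched as greedy packing in priority order. The part names mirror the list above; the priority order and the crude one-word-per-token count are assumptions for illustration:

```python
# A sketch of budget-constrained prompt assembly: pack parts in priority
# order and drop whatever no longer fits the context window.

def assemble_prompt(parts: list[tuple[str, str]], budget: int) -> str:
    kept, used = [], 0
    for name, text in parts:
        cost = len(text.split())        # crude token count, for illustration
        if used + cost <= budget:
            kept.append(f"[{name}]\n{text}")
            used += cost
    return "\n".join(kept)

parts = [
    ("system", "You are a coding assistant"),
    ("history", "user asked about null checks earlier"),
    ("explicit context", "contents of utils.py attached via #file"),
    ("implicit context", "active editor file and cursor region"),
]
```

With a small budget, lower-priority parts are simply absent from the payload — which is why long conversations eventually "forget" earlier context.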
Tool calls and the ReAct loop
Modern LLMs support function calling: instead of answering immediately, the model can emit a structured tool-call request. The Copilot runtime intercepts this, executes the tool, and feeds the result back into the context window. The model then continues from where it left off. This cycle repeats until the model emits a final response — the ReAct loop (Reason → Act → Observe → Reason…).
LLM decides to call a tool
│
▼
Tool executed by runtime
(e.g. read_file, run_terminal, search_codebase, MCP server call)
│
▼
Result appended to context window
│
▼
LLM resumes generation
│
├── another tool call? → repeat
└── final answer? → stream to client
This is why Copilot can perform multi-step tasks like "find all usages of `getUserById`, check for null checks, and write a test" — it issues multiple tool calls within a single response cycle without any additional prompting from you.
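The loop above can be sketched with a stand-in model and one toy tool. The tool name `search_codebase` comes from the diagram; the fake model's decision rule and everything it returns are illustrative assumptions:

```python
# A minimal ReAct loop: call tools until the model produces a final answer.

def fake_model(transcript: list[str]) -> dict:
    """Stand-in LLM: request a tool until an observation exists, then answer."""
    if not any(line.startswith("observation:") for line in transcript):
        return {"tool_call": {"name": "search_codebase", "args": "getUserById"}}
    return {"final": "getUserById is used in 3 places."}

TOOLS = {"search_codebase": lambda query: f"3 usages of {query} found"}

def react_loop(prompt: str) -> str:
    transcript = [f"user: {prompt}"]
    while True:
        step = fake_model(transcript)
        if "final" in step:                        # final answer -> stream to client
            return step["final"]
        call = step["tool_call"]                   # tool call -> execute and observe
        result = TOOLS[call["name"]](call["args"])
        transcript.append(f"observation: {result}")
```

The real runtime works the same way structurally: each observation is appended to the context window, so the model's next generation step can condition on it.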
Streaming and latency
Copilot uses server-sent events to stream tokens from the AI provider through the Proxy to your client the moment they are generated. This is why you see responses appear word-by-word rather than all at once. The trade-off is that deep-reasoning models introduce a longer thinking delay before the first token appears — the model silently reasons before writing. General-purpose models start streaming faster.
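On the wire, server-sent events are `data:` lines separated by blank lines. A sketch of the client-side consumption, assuming a `[DONE]` end-of-stream sentinel (a common convention in LLM streaming APIs, not something the article specifies for Copilot):

```python
# Parse SSE-formatted lines and yield token payloads as they arrive.

def sse_tokens(raw_stream: list[str]):
    for line in raw_stream:
        if line.startswith("data: "):
            payload = line[len("data: "):]
            if payload == "[DONE]":       # assumed end-of-stream sentinel
                return
            yield payload

frames = ["data: Hel", "", "data: lo wor", "", "data: ld", "", "data: [DONE]"]
streamed = "".join(sse_tokens(frames))
```

In a real client the frames arrive incrementally over an open HTTP response, which is what lets the editor render each token the moment it is generated.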
Content safety: two checkpoints
GitHub runs content safety filters at two points:
- Pre-flight — the incoming prompt is checked before it reaches the LLM. Requests that violate GitHub's usage policies are rejected before any tokens are consumed.
- Post-flight — the model's response is checked before it is streamed to the client. Outputs that fail the safety check are blocked even if the model produced them.
Both checkpoints are invisible to the end user — you simply see a refusal message if either check fires.
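The two checkpoints wrap the model call symmetrically. The block-list filter below is a toy stand-in for GitHub's real, far more sophisticated content classifiers:

```python
# Sketch of the two safety checkpoints around a model call.
BLOCKED_TERMS = {"malware"}

def passes_safety(text: str) -> bool:
    return not any(term in text.lower() for term in BLOCKED_TERMS)

def handle(prompt: str, generate) -> str:
    if not passes_safety(prompt):         # pre-flight: reject before any tokens
        return "Sorry, I can't help with that."
    response = generate(prompt)
    if not passes_safety(response):       # post-flight: block unsafe output
        return "Sorry, I can't help with that."
    return response
```

Note the pre-flight rejection happens before `generate` runs at all — matching the article's point that blocked prompts consume no tokens.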
Custom instructions: your hook into the system prompt
The one place where you can directly influence the assembled prompt is the custom instructions file. Drop a `.github/copilot-instructions.md` in your repository and Copilot will automatically append its contents to the system prompt for every request made in that repo. This is the recommended way to enforce project conventions, coding styles, or tool preferences without manually including them in every chat message.
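An illustrative `.github/copilot-instructions.md` might look like this — the conventions listed are example content, not recommendations from GitHub:

```markdown
# Copilot instructions for this repository

- Use TypeScript strict mode for all new code.
- Prefer `pnpm` over `npm` in any suggested commands.
- Follow the existing error-handling pattern in `src/errors.ts`.
- Write unit tests with Vitest; place them next to the source file.
```

Keep it short: everything in this file is added to every request, so it competes with your code and conversation history for context-window space.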