Claude Code Architecture
AI Agent from IPO Model to Extension Mechanisms
01. IPO — The Universal Model
Every AI agent follows the Input → Process → Output model. Unlike a traditional program, which makes a single I→P→O pass, an agent runs a ReAct loop[^1]: the output of each tool call is fed back as input to the next inference. The loop ends when a response contains no tool call.
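The loop described above can be sketched in a few lines. This is a minimal illustration, not Claude Code's implementation: `Response`, `run_agent`, and `scripted_infer` are hypothetical names, and the "LLM" is a scripted stand-in that requests one tool call and then answers.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Response:
    """One inference result: free text plus an optional tool call."""
    text: str
    tool_name: Optional[str] = None
    tool_args: dict = field(default_factory=dict)


def run_agent(prompt: str,
              infer: Callable[[list], Response],
              tools: dict,
              max_turns: int = 10) -> str:
    """Loop inference -> tool -> observation until no tool call is returned."""
    context = [("user", prompt)]
    for _ in range(max_turns):
        response = infer(context)
        context.append(("assistant", response.text))
        if response.tool_name is None:
            return response.text  # no tool call: the loop ends
        # Run the tool and feed its output back in as the next input.
        observation = tools[response.tool_name](**response.tool_args)
        context.append(("tool", str(observation)))
    raise RuntimeError("max_turns reached without a final answer")


def scripted_infer(context: list) -> Response:
    """Stand-in for the LLM: ask for one tool call, then answer."""
    tool_results = [content for role, content in context if role == "tool"]
    if not tool_results:
        return Response("I need to add the numbers.", "add", {"a": 2, "b": 3})
    return Response(f"The sum is {tool_results[-1]}.")


answer = run_agent("What is 2 + 3?", scripted_infer, {"add": lambda a, b: a + b})
print(answer)
```

The `max_turns` guard is an added assumption: the text only specifies the "no tool call" exit, but a sketch without an upper bound could spin forever on a misbehaving model.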
---
config:
theme: base
themeVariables:
primaryColor: "#d4edda"
primaryTextColor: "#1a1a1a"
primaryBorderColor: "#28a745"
lineColor: "#666"
secondaryColor: "#e8daef"
tertiaryColor: "#d1ecf1"
---
flowchart LR
subgraph I["📥 INPUT"]
direction TB
I1["User prompt"]
I2["CLAUDE.md / Memory"]
I3["Conversation history"]
I4["Tool definitions"]
I5["Skill descriptions"]
end
subgraph P["🧠 PROCESS"]
direction TB
P1["LLM Inference"]
P2["ReAct Loop"]
end
subgraph O["📤 OUTPUT"]
direction TB
O1["Text response"]
O2["File changes"]
O3["Shell commands"]
O4["Agent results"]
end
I ==>|"context window"| P ==>|"generation"| O
O -.->|"tool results feed back"| I
style I fill:#d4edda,stroke:#28a745,stroke-width:2px
style P fill:#e8daef,stroke:#8e44ad,stroke-width:2px
style O fill:#d1ecf1,stroke:#17a2b8,stroke-width:2px
ReAct Loop
---
config:
theme: neutral
look: classic
---
flowchart LR
A(["Input"]):::green --> B["LLM Inference"]:::purple --> C["PreToolUse Hook"]:::blue --> D["Tool Execution"]:::cyan --> E(["Observation"]):::orange
E -.->|"loop until done"| A
classDef green fill:#d4edda,stroke:#28a745,color:#1a1a1a
classDef purple fill:#e8daef,stroke:#8e44ad,color:#1a1a1a
classDef blue fill:#cce5ff,stroke:#004085,color:#1a1a1a
classDef cyan fill:#d1ecf1,stroke:#0c5460,color:#1a1a1a
classDef orange fill:#ffeeba,stroke:#856404,color:#1a1a1a
linkStyle 4 stroke:#e07070,stroke-width:2px,stroke-dasharray:5
02. AI Agent Brain — 3 Components
---
config:
theme: neutral
look: classic
---
flowchart TB
LLM["🧠 LLM — The Brain
Reasoning · Planning · Judgment
Code generation · Tool selection"]:::purple
CTX["📋 Context Window — Working Memory
1M tokens[^6] · auto-compact at ~95%[^2]
System Prompt + Tools + History + Results"]:::blue
MEM["💾 Memory — Long-term Storage
CLAUDE.md · MEMORY.md · Topic files
Load on-demand into context"]:::orange
LLM <-->|"inference reads
from context"| CTX
MEM -->|"load on-demand"| CTX
classDef purple fill:#e8daef,stroke:#8e44ad,stroke-width:2px,color:#1a1a1a
classDef blue fill:#cce5ff,stroke:#004085,stroke-width:2px,color:#1a1a1a
classDef orange fill:#ffeeba,stroke:#856404,stroke-width:2px,color:#1a1a1a
Human Brain ↔ AI Agent
| Human Brain | | AI Agent |
|---|---|---|
| Neurons + Synapses | ↔ | LLM (weights + inference) |
| Working Memory (7±2 items) | ↔ | Context Window (token limit) |
| Long-term Memory | ↔ | Memory (CLAUDE.md, MEMORY.md) |
| Hands / Eyes / Mouth | ↔ | Tools (Read, Edit, Bash...) |
| Habits / Reflexes | ↔ | Hooks (auto event triggers) |
| Delegation to others | ↔ | Sub-agents (isolated workers) |
WHY CONTEXT WINDOW MATTERS
The context window is the central bottleneck of every AI agent, and every extension mechanism exists to manage it efficiently. Tools execute outside the context, but their results must come back into it. Skills load only a short description up front (budget: 1% of the context window, each skill ≤250 chars)[^10]; the full skill loads on demand. MCP defers tool schemas through ToolSearch[^8], so only tool names sit in context until a tool is needed. Sub-agents work in their own isolated context[^7] and return only a summary. Hooks run outside the context: command hooks cost no tokens, but any output they inject into the context does (≤10K chars)[^9].
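To make the "names only until needed" idea concrete, here is a rough sketch of deferred schema loading. It does not reproduce how ToolSearch works internally; `DeferredToolRegistry`, its methods, and the three example tools are all hypothetical and exist only to show the effect on context size.

```python
import json


class DeferredToolRegistry:
    """Hypothetical registry: names stay in context, schemas load on demand."""

    def __init__(self, schemas: dict):
        self._schemas = schemas      # full schemas, kept outside the context
        self._loaded = set()         # names whose schema has been pulled in

    def context_text(self) -> str:
        """Text that would occupy the context window right now."""
        lines = []
        for name in sorted(self._schemas):
            if name in self._loaded:
                lines.append(json.dumps({name: self._schemas[name]}))
            else:
                lines.append(name)   # deferred: only the name is visible
        return "\n".join(lines)

    def search(self, keyword: str) -> list:
        """Load the schema of every tool whose name contains the keyword."""
        hits = [n for n in sorted(self._schemas) if keyword in n]
        self._loaded.update(hits)
        return hits


schemas = {
    "github_create_issue": {"description": "Open an issue", "params": ["repo", "title", "body"]},
    "github_list_prs": {"description": "List pull requests", "params": ["repo", "state"]},
    "slack_post_message": {"description": "Post to a channel", "params": ["channel", "text"]},
}
registry = DeferredToolRegistry(schemas)
before = len(registry.context_text())
hits = registry.search("slack")
after = len(registry.context_text())
print(before, after, hits)
```

Only the schema that was searched for grows the context; the two GitHub tools remain a single line each.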
Context Window Allocation (~1M tokens)
---
config:
theme: base
themeVariables:
pie1: "#8e44ad"
pie2: "#d4a017"
pie3: "#2874a6"
pie4: "#5060c0"
pie5: "#148f77"
pie6: "#bdc3c7"
---
pie title What fills the context window
"System prompt" : 12
"CLAUDE.md + Memory" : 8
"Tool schemas" : 6
"Skill descriptions" : 5
"Conversation + Results" : 35
"Available space" : 34
At ~95% usage, auto-compact kicks in: older history is summarized and recent turns are preserved.[^2]
03. Evolution — How Each Mechanism Evolved
Track 1: Actions — Tool → MCP
---
config:
theme: neutral
look: classic
---
flowchart LR
A["Hardcoded
Functions"]:::dim -->|"standardize"| B["Function
Calling"]:::dim -->|"ship built-in"| C["Built-in
Tools"]:::green -->|"open protocol"| D["🌐 MCP"]:::cyan
classDef dim fill:#f0f0f0,stroke:#999,color:#666
classDef green fill:#d4edda,stroke:#28a745,color:#1a1a1a
classDef cyan fill:#d1ecf1,stroke:#0c5460,color:#1a1a1a,stroke-width:2px
Track 2: Workflows — Command → Skill
---
config:
theme: neutral
look: classic
---
flowchart LR
A["Built-in
/commands"]:::dim -->|"customizable"| B["Custom
Commands"]:::yellow -->|"structured
+ proactive"| C["📘 Skills"]:::accent
classDef dim fill:#f0f0f0,stroke:#999,color:#666
classDef yellow fill:#fff3cd,stroke:#856404,color:#1a1a1a
classDef accent fill:#d6d9f0,stroke:#5060c0,color:#1a1a1a,stroke-width:2px
Track 3: Context — Single → Multi-Agent
---
config:
theme: neutral
look: classic
---
flowchart LR
A["Single
Context"]:::dim -->|"auto-manage"| B["Auto-
Compaction"]:::orange -->|"isolate"| C["🤖 Sub-Agents"]:::red
classDef dim fill:#f0f0f0,stroke:#999,color:#666
classDef orange fill:#ffeeba,stroke:#856404,color:#1a1a1a
classDef red fill:#f8d7da,stroke:#721c24,color:#1a1a1a,stroke-width:2px
Track 4: Automation — Manual → Hooks
---
config:
theme: neutral
look: classic
---
flowchart LR
A["Manual
Only"]:::dim -->|"persistent"| B["CLAUDE.md"]:::orange -->|"deterministic"| C["⚙ Hooks"]:::purple
classDef dim fill:#f0f0f0,stroke:#999,color:#666
classDef orange fill:#ffeeba,stroke:#856404,color:#1a1a1a
classDef purple fill:#e8daef,stroke:#8e44ad,color:#1a1a1a,stroke-width:2px
All 4 Tracks Converge → Plugin
---
config:
theme: neutral
look: classic
---
flowchart TB
T["🌐 MCP"]:::cyan
S["📘 Skills"]:::accent
A["🤖 Sub-Agents"]:::red
H["⚙ Hooks"]:::purple
T & S & A & H --> PL(["📦 Plugin Package"]):::dim
classDef cyan fill:#d1ecf1,stroke:#0c5460,color:#1a1a1a,stroke-width:2px
classDef accent fill:#d6d9f0,stroke:#5060c0,color:#1a1a1a,stroke-width:2px
classDef red fill:#f8d7da,stroke:#721c24,color:#1a1a1a,stroke-width:2px
classDef purple fill:#e8daef,stroke:#8e44ad,color:#1a1a1a,stroke-width:2px
classDef dim fill:#e9ecef,stroke:#6c757d,color:#1a1a1a,stroke-width:3px
04. Sub-Agent — Context Isolation
---
config:
theme: neutral
look: classic
---
flowchart TD
P["🖥 Parent Session
context: task + summaries"]:::accent
subgraph agents["Sub-Agents — isolated context each"]
direction LR
E["🔍 Explore
haiku · read-only"]:::red
G["⚡ General
inherit · full tools"]:::red
C["🎯 Custom
plugin-defined"]:::red
W["🌲 Worktree
git isolation"]:::red
end
R(["✅ Summary
~500 tok ÷40x"]):::green
P -->|"spawn + brief"| agents
agents -->|"summary only"| R
R -->|"result"| P
classDef accent fill:#d6d9f0,stroke:#5060c0,color:#1a1a1a,stroke-width:2px
classDef red fill:#f8d7da,stroke:#721c24,color:#1a1a1a
classDef green fill:#d4edda,stroke:#28a745,color:#1a1a1a,stroke-width:2px
style agents fill:#fff5f5,stroke:#c0392b,stroke-width:1px,stroke-dasharray:5
Context savings: 20K work → 500 tok result = ~40x reduction
- Problem: Grepping 40 files puts 20,000 tokens into the context and dilutes reasoning.
- Solution: The agent works in its own context and returns only a summary. The parent grows by just 500 tokens.
- Trade-off: 1.2x-25x API cost, but context quality is preserved.
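The isolation pattern in the list above can be sketched as a function that keeps its noisy working context private and hands back only a short result. The names (`run_subagent`, `count_tokens`), the synthetic 40-file corpus, and the word-count "tokenizer" are assumptions for illustration; the summary here is far shorter than the ~500 tokens cited above, so the ratio it prints is not the article's 40x.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def run_subagent(task: str, files: dict) -> str:
    """Do the noisy work in a private context and return only a summary."""
    private_context = [task]
    for path, content in files.items():
        private_context.append(f"{path}: {content}")  # never reaches the parent
    matches = [path for path, content in files.items() if "TODO" in content]
    return f"{len(matches)} of {len(files)} files contain TODO: {', '.join(matches)}"


# 40 files of 500 words each: 20,000 tokens of raw search output.
files = {f"src/file_{i}.py": "code " * 499 + ("TODO" if i % 20 == 0 else "done")
         for i in range(40)}

parent_context = ["Find files with open TODOs."]
raw_cost = sum(count_tokens(content) for content in files.values())
summary = run_subagent(parent_context[0], files)
parent_context.append(summary)  # the parent only grows by the summary

print(raw_cost, count_tokens(summary))
```

The sketch models only the context side of the trade-off. The extra API cost comes from the sub-agent's own inference calls, which are not simulated here.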
05. Full Architecture Stack
---
config:
theme: neutral
look: classic
---
flowchart TB
subgraph L1["🖥 USER INTERFACE"]
direction LR
U1["Terminal"] ~~~ U2["VS Code"] ~~~ U3["JetBrains"] ~~~ U4["claude.ai"] ~~~ U5["Desktop"]
end
subgraph L2["⚙ AGENT HARNESS"]
direction LR
H1["ReAct Loop"] ~~~ H2["Permissions"] ~~~ H3["Hook Engine"] ~~~ H4["Compaction"] ~~~ H5["Plugin Loader"]
end
subgraph L3["📋 CONTEXT WINDOW"]
direction LR
C1["System Prompt"] ~~~ C2["CLAUDE.md"] ~~~ C3["Tool Schemas"] ~~~ C4["Skills Meta"] ~~~ C5["Conv + Results"]
end
subgraph L4["🛠 CAPABILITIES"]
direction LR
T1["Read/Edit/Write"] ~~~ T2["Bash/Glob/Grep"] ~~~ T3["MCP Servers"] ~~~ T4["Skill tool"] ~~~ T5["Agent tool"]
end
subgraph L5["💾 PERSISTENCE"]
direction LR
P1["Context"] ~~~ P2["CLAUDE.md"] ~~~ P3["Memory files"] ~~~ P4["Git history"] ~~~ P5["External"]
end
subgraph L6["🤖 MULTI-AGENT"]
direction LR
A1["Explore"] ~~~ A2["General"] ~~~ A3["Plan"] ~~~ A4["Custom"] ~~~ A5["Worktree"] ~~~ A6["Teams"]
end
subgraph L7["🧠 LLM FOUNDATION"]
direction LR
M1["Opus 4.6"] ~~~ M2["Sonnet 4.6"] ~~~ M3["Haiku 4.5"] ~~~ M4["Prompt Cache"]
end
L1 --- L2 --- L3 --- L4 --- L5 --- L6 --- L7
style L1 fill:#d1ecf1,stroke:#0c5460,stroke-width:2px
style L2 fill:#e8daef,stroke:#8e44ad,stroke-width:2px
style L3 fill:#cce5ff,stroke:#004085,stroke-width:2px
style L4 fill:#d4edda,stroke:#28a745,stroke-width:2px
style L5 fill:#ffeeba,stroke:#856404,stroke-width:2px
style L6 fill:#f8d7da,stroke:#721c24,stroke-width:2px
style L7 fill:#e8daef,stroke:#8e44ad,stroke-width:2px
06. Plugin — The Distribution Unit
---
config:
theme: neutral
look: classic
---
flowchart TB
subgraph PL["📦 PLUGIN PACKAGE"]
direction LR
SK["📘 skills/
SKILL.md + refs"]:::accent
AG["🤖 agents/
agent-name.md"]:::red
CM["📝 commands/
legacy → skills"]:::yellow
HK["⚙ hooks/
hooks.json"]:::purple
MC["🌐 .mcp.json
server config"]:::cyan
end
classDef accent fill:#d6d9f0,stroke:#5060c0,color:#1a1a1a
classDef red fill:#f8d7da,stroke:#721c24,color:#1a1a1a
classDef yellow fill:#fff3cd,stroke:#856404,color:#1a1a1a
classDef purple fill:#e8daef,stroke:#8e44ad,color:#1a1a1a
classDef cyan fill:#d1ecf1,stroke:#0c5460,color:#1a1a1a
style PL fill:#f8f9fa,stroke:#6c757d,stroke-dasharray:5,stroke-width:2px
NATURAL EVOLUTION
- Tool → MCP: a standardized protocol that solves the N×M integration problem
- Command → Skill: complex workflows that Claude can activate on its own
- Single context → Sub-agent: context isolation and parallel execution
- Instructions → Hook: deterministic, runs 100% of the time when the event fires
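The "deterministic" property of hooks in the last bullet can be illustrated with a small event dispatcher: a registered hook runs every time its event fires, regardless of what the model decides. This is a sketch, not Claude Code's hook engine. `HookEngine`, `block_force_push`, and the payload shape are hypothetical; only the `PreToolUse` event name and the 10,000-character cap on injected output are taken from the text.

```python
from collections import defaultdict

MAX_HOOK_OUTPUT = 10_000  # character cap on injected hook output, per the text


class HookEngine:
    """Hypothetical event dispatcher: every registered hook runs on every fire."""

    def __init__(self):
        self._hooks = defaultdict(list)

    def on(self, event: str, hook) -> None:
        self._hooks[event].append(hook)

    def fire(self, event: str, payload: dict) -> list:
        """Run all hooks for the event; return their (capped) outputs."""
        outputs = []
        for hook in self._hooks[event]:
            result = hook(payload)
            if result:  # only non-empty output would be injected into context
                outputs.append(str(result)[:MAX_HOOK_OUTPUT])
        return outputs


def block_force_push(payload: dict):
    """Example PreToolUse hook: flag a dangerous shell command."""
    if payload.get("tool") == "Bash" and "push --force" in payload.get("command", ""):
        return "blocked: force push is not allowed"
    return None


engine = HookEngine()
engine.on("PreToolUse", block_force_push)
engine.on("PreToolUse", lambda payload: "x" * 25_000)  # oversized output

outputs = engine.fire("PreToolUse", {"tool": "Bash", "command": "git push --force"})
print([len(o) for o in outputs])
```

A hook that returns nothing adds nothing to the context, which matches the point that command hooks themselves cost no tokens; only injected output does.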
07. Summary — Everything Manages Context
| Mechanism | Problem Solved | Context Cost | Lineage |
|---|---|---|---|
| Built-in Tool | LLM needs to act beyond text | Static schema + accumulating results | core primitive |
| MCP[^3] | N×M integration, portability | ToolSearch defer[^8], load on-demand | Tool → standardized |
| Command | Reusable prompts | Full content injected when typed | core primitive |
| Skill[^10] | Complex workflows + auto-activate | 1% context / 8K chars, on-demand | Command → proactive |
| Hook[^9] | Deterministic automation | Cmd hooks: zero. Inject: ≤10K | CLAUDE.md → deterministic |
| Sub-Agent[^7] | Context pollution, parallel exec | ~500 tok / 20K work | Context → isolated |
| Plugin | Distribution + packaging | Bundles all above | meta-unit |
How Everything Connects
---
config:
theme: base
themeVariables:
primaryColor: "#d6d9f0"
primaryTextColor: "#1a1a1a"
primaryBorderColor: "#5060c0"
lineColor: "#666"
secondaryColor: "#d4edda"
tertiaryColor: "#d1ecf1"
clusterBkg: "#f8f9fa"
clusterBorder: "#5060c0"
---
flowchart TB
USER(["👤 User"])
USER -->|"/slash"| CMD["📝 Command"]:::yellow
USER -->|"prompt"| CW
subgraph SESSION["🖥 Claude Code Session"]
LLM["🧠 LLM"]:::purple
CW["📋 Context Window"]:::blue
LLM <--> CW
end
CMD -->|"inject prompt"| CW
CW -->|"Skill tool"| SKILL["📘 Skill"]:::accent
SKILL -->|"inject workflow"| CW
CW -->|"tool call"| TOOL["🛠 Built-in Tool"]:::green
TOOL -->|"result"| CW
CW -->|"MCP call"| MCP["🌐 MCP Server"]:::cyan
MCP -->|"result"| CW
CW -->|"Agent tool"| SUB["🤖 Sub-Agent"]:::red
SUB -->|"summary only"| CW
HOOK["⚙ Hook"]:::purple2
HOOK -->|"auto-fire"| SESSION
PLUGIN["📦 Plugin"]
PLUGIN -.->|"provides"| SKILL & SUB & HOOK & MCP
classDef yellow fill:#fff3cd,stroke:#856404,color:#1a1a1a
classDef purple fill:#e8daef,stroke:#8e44ad,color:#1a1a1a
classDef blue fill:#cce5ff,stroke:#004085,color:#1a1a1a,stroke-width:2px
classDef green fill:#d4edda,stroke:#28a745,color:#1a1a1a
classDef cyan fill:#d1ecf1,stroke:#0c5460,color:#1a1a1a
classDef accent fill:#d6d9f0,stroke:#5060c0,color:#1a1a1a
classDef red fill:#f8d7da,stroke:#721c24,color:#1a1a1a
classDef purple2 fill:#e8daef,stroke:#8e44ad,color:#1a1a1a
style SESSION fill:#f8f9fa,stroke:#5060c0,stroke-width:2px
style PLUGIN fill:#e9ecef,stroke:#6c757d,stroke-dasharray:5
style USER fill:#fff,stroke:#666
THE UNIFIED PRINCIPLE
Every extension mechanism in Claude Code serves one goal: maximizing reasoning quality within the limits of the context window.[^11]
The context window is the most precious resource. Every design decision revolves around it.
08. References
Academic Papers
[^1]: Yao, S. et al. (2023). ReAct: Synergizing Reasoning and Acting in Language Models. ICLR 2023. arXiv:2210.03629
Official Documentation
[^2]: Claude Code — Auto-compaction threshold ~95%. anthropics/claude-code#15719
[^3]: Model Context Protocol — Architecture: JSON-RPC 2.0, stdio + Streamable HTTP, 3 server primitives (Tools, Resources, Prompts). Launched November 25, 2024. modelcontextprotocol.io
[^6]: Anthropic (2025). Claude Opus 4.6 & Sonnet 4.6 — 1M token context window. platform.claude.com
[^7]: Claude Code — Sub-agents: isolated context, no nesting. code.claude.com/sub-agents
[^8]: Claude Code — MCP tool definitions deferred by default, loaded on-demand via ToolSearch. code.claude.com
[^9]: Claude Code — Hooks: output injected into context capped at 10,000 characters. code.claude.com/hooks
[^10]: Claude Code — Skills: budget 1% context window (fallback 8K chars), each entry ≤250 chars. code.claude.com/skills
[^11]: Claude Code — Agent loop: gather context → take action → verify results. code.claude.com
[^12]: Anthropic (2025). Prompt Caching — cache read tokens at 0.1x base input price. platform.claude.com
CT AI Blog, updated 04/2026