docs: mark Phase 7 Headroom integration plan as completed

2026-06-07 18:21:27 +08:00 · 2026-06-07 18:21:27 +08:00 · 3645c7a080
parent bad66445ff
commit 3645c7a080
1 changed files with 344 additions and 0 deletions
--- a/docs/plans/2026-06-07-013-feat-agentkit-phase7-headroom-plan.md
+++ b/docs/plans/2026-06-07-013-feat-agentkit-phase7-headroom-plan.md
@ -0,0 +1,344 @@
+---
+title: "feat: AgentKit Phase 7 — Headroom 上下文压缩集成"
+status: completed
+created: 2026-06-07
+plan_type: feat
+depth: standard
+origin: Phase 6 完成后 Headroom 集成评估 + GEO Pipeline token 成本优化需求
+branch: feat/agentkit-phase7-headroom
+---
+
+# AgentKit Phase 7 — Headroom 上下文压缩集成
+
+## Summary
+
+在 ReAct 引擎中集成 Headroom 作为上下文压缩层，在工具输出拼装到对话历史前进行智能压缩，减少 60-90% token 消耗。采用 Library 模式集成，作为可选依赖默认关闭，通过 YAML 配置开关启用。定义 CompressionStrategy Protocol 使现有 ContextCompressor 和新 HeadroomCompressor 可互换，扩展 ReAct 循环内压缩点实现增量压缩。
+
+## Problem Frame
+
+Phase 6 完成后，AgentKit 的工具生态（WebCrawl、BaiduSearch、Schema 工具）产生大量工具输出，这些输出是 GEO Pipeline token 消耗的主要来源。当前 ContextCompressor 仅在初始消息构建时做一次 LLM 摘要式压缩，ReAct 循环内工具结果累积后不再压缩，导致长对话 token 膨胀严重。
+
+Headroom 提供 6 种压缩算法（SmartCrusher/CodeCompressor/Kompress/CacheAligner/IntelligentContext/ImageCompressor），按内容类型智能路由，CCR 可逆压缩保证原始数据不丢失。集成后可在不改变 Agent 行为的前提下大幅降低 API 成本。
+
+## Requirements
+
+- R1: Headroom 集成后，ReAct 循环内工具输出在拼装到对话历史前被压缩
+- R2: 压缩是可选的，默认关闭，通过 YAML 配置启用
+- R3: Headroom 未安装时系统正常工作，自动降级到现有 ContextCompressor
+- R4: CCR 可逆压缩：LLM 可通过 headroom_retrieve 工具取回原始数据
+- R5: 压缩策略可配置：全局开关、内容类型路由、压缩强度
+- R6: 不引入 PyTorch 等重型依赖，headroom-ai[code] 为最大可选安装范围
+- R7: 增量压缩：ReAct 循环内每步工具结果独立压缩，而非仅初始一次
+
+## Key Technical Decisions
+
+### KTD-1: CompressionStrategy Protocol 替代继承
+
+**决策**: 定义 `CompressionStrategy` Protocol（`async def compress(messages) -> list[dict]`），而非让 HeadroomCompressor 继承 ContextCompressor。
+
+**理由**: ContextCompressor 是具体类，内部硬编码了 LLM 摘要逻辑，不适合作为基类。Protocol 允许两种压缩策略独立演化，ReActEngine 只依赖 Protocol 接口。
+
+**替代方案**: 让 HeadroomCompressor 继承 ContextCompressor 并 override compress() — 耦合度高，ContextCompressor 内部状态（llm_gateway, max_tokens）对子类无意义。
+
+### KTD-2: Library 模式集成，不用 Proxy/MCP Server
+
+**决策**: 使用 `from headroom import compress` Library 模式在进程内调用。
+
+**理由**: AgentKit 是框架不是终端工具，需要在 ReAct 循环内精确控制压缩时机（工具结果构建后、LLM 调用前）。Proxy 模式无法区分哪些消息需要压缩，MCP Server 模式增加了网络开销和额外进程管理。
+
+### KTD-3: 不引入 Kompress-base 模型
+
+**决策**: 仅使用 SmartCrusher（JSON）和 CodeCompressor（代码），不使用 Kompress-base（文本压缩模型）。
+
+**理由**: Kompress-base 依赖 HuggingFace Transformers + PyTorch，安装体积约 2GB。AgentKit 的文本压缩需求（对话历史摘要）由现有 ContextCompressor 的 LLM 摘要模式覆盖。Headroom 的 SmartCrusher 对 JSON 工具输出效果最佳（92% 压缩率）。
+
+### KTD-4: 工具结果压缩 + 对话历史压缩双层架构
+
+**决策**: 新增 `compress_tool_result()` 方法处理单个工具输出（SmartCrusher/CodeCompressor），保留 `compress()` 处理整段对话历史（现有 ContextCompressor 逻辑）。
+
+**理由**: 工具输出和对话历史的压缩策略不同 — 工具输出是结构化数据（JSON/代码），适合 Headroom 的统计压缩；对话历史是混合内容，适合 LLM 摘要。双层架构让两种策略各司其职。
+
+### KTD-5: CCR 检索工具自动注册
+
+**决策**: 当 HeadroomCompressor 启用时，自动注册 `headroom_retrieve` 工具到 ToolRegistry，LLM 可通过 Function Calling 取回原始数据。
+
+**理由**: CCR 的核心价值是可逆性 — 压缩后 LLM 仍可按需取回原始数据。将 retrieve 暴露为工具是最自然的集成方式，LLM 在需要详细信息时会自动调用。
+
+---
+
+## Implementation Units
+
+### U1. CompressionStrategy Protocol 与工厂函数
+
+**Goal**: 定义压缩策略 Protocol 接口，实现工厂函数根据配置创建压缩器实例。
+
+**Dependencies**: 无
+
+**Files**:
+- `src/agentkit/core/compressor.py` — 修改：新增 CompressionStrategy Protocol，新增 create_compressor() 工厂函数
+- `tests/unit/test_compression_strategy.py` — 新增：Protocol 合规性测试 + 工厂函数测试
+
+**Approach**:
+1. 在 compressor.py 中定义 `CompressionStrategy` Protocol：
+   - `async def compress(self, messages: list[dict]) -> list[dict]`
+   - `async def compress_tool_result(self, tool_name: str, result: Any) -> str`
+   - `def is_available(self) -> bool`
+2. 让现有 `ContextCompressor` 实现该 Protocol（添加 `compress_tool_result` 方法，默认返回 `str(result)`）
+3. 新增 `create_compressor(config: dict | None = None) -> CompressionStrategy | None` 工厂函数：
+   - config 为 None 或空 → 返回 None（不压缩）
+   - config.provider == "headroom" 且 headroom-ai 已安装 → 返回 HeadroomCompressor
+   - config.provider == "headroom" 但未安装 → 警告并降级到 ContextCompressor
+   - config.provider == "summary" 或默认 → 返回 ContextCompressor
+
+**Patterns to follow**: `src/agentkit/telemetry/setup.py` 的 setup_telemetry() 模式 — 配置驱动 + ImportError 降级
+
+**Test scenarios**:
+- ContextCompressor 满足 CompressionStrategy Protocol（isinstance 检查）
+- create_compressor(None) 返回 None
+- create_compressor({"provider": "summary"}) 返回 ContextCompressor 实例
+- create_compressor({"provider": "headroom"}) 在 headroom-ai 未安装时降级到 ContextCompressor 并记录警告
+- create_compressor({"provider": "headroom"}) 在 headroom-ai 已安装时返回 HeadroomCompressor 实例
+- ContextCompressor.compress_tool_result() 默认返回 str(result)
+
+**Verification**: 所有测试通过，Protocol 接口可被 mypy 检查
+
+---
+
+### U2. HeadroomCompressor 实现
+
+**Goal**: 实现 HeadroomCompressor 类，封装 headroom-ai Library 模式 API，支持工具输出压缩和 CCR 检索。
+
+**Dependencies**: U1
+
+**Files**:
+- `src/agentkit/core/headroom_compressor.py` — 新增：HeadroomCompressor 类
+- `src/agentkit/core/__init__.py` — 修改：导出 CompressionStrategy, create_compressor, HeadroomCompressor
+- `tests/unit/test_headroom_compressor.py` — 新增：HeadroomCompressor 完整测试
+
+**Approach**:
+1. 模块级 `_HEADROOM_AVAILABLE` 标志（参照 Crawl4AI 模式）
+2. `HeadroomCompressor` 类实现 CompressionStrategy Protocol：
+   - `__init__(config: dict)` — 接收压缩配置（compressors 列表、ccr_ttl、model 等）
+   - `compress(messages)` — 对 messages 中 role=tool 的消息调用 headroom.compress()，其他消息原样保留
+   - `compress_tool_result(tool_name, result)` — 根据内容类型路由到 SmartCrusher/CodeCompressor，返回压缩文本 + CCR 哈希
+   - `is_available()` → `_HEADROOM_AVAILABLE`
+   - `retrieve(ccr_hash: str, query: str)` → 从 CCR 缓存取回原始数据
+3. 内容类型路由逻辑：
+   - 检测 result 是否为 JSON（try json.loads）→ SmartCrusher
+   - 检测是否为代码（常见代码模式匹配）→ CodeCompressor
+   - 其他 → 不压缩，原样返回
+4. CCR 哈希附加格式：`[compressed content]\n<!-- CCR:hash=abc123 -->`
+5. 配置项：
+   - `enabled: bool` — 开关
+   - `provider: "headroom"` — 标识
+   - `compressors: ["smart_crusher", "code_compressor"]` — 启用的压缩器
+   - `ccr_ttl: int` — CCR 缓存 TTL（秒），默认 300
+   - `min_length: int` — 最小压缩长度（字符），短于此不压缩，默认 500
+   - `model: str` — 传给 headroom 的模型名，用于 token 估算
+
+**Patterns to follow**: `src/agentkit/tools/web_crawl.py` 的 _CRAWL4AI_AVAILABLE 降级模式
+
+**Test scenarios**:
+- HeadroomCompressor 未安装 headroom-ai 时 is_available() 返回 False
+- compress() 对 role=tool 消息压缩，其他消息原样保留
+- compress_tool_result() 对 JSON 内容使用 SmartCrusher
+- compress_tool_result() 对代码内容使用 CodeCompressor
+- compress_tool_result() 对短内容（< min_length）不压缩
+- compress_tool_result() 返回的压缩文本包含 CCR 哈希
+- retrieve() 可通过 CCR 哈希取回原始数据
+- compress() 在 headroom-ai 未安装时静默返回原消息（不抛异常）
+- 配置项正确传递给 headroom API
+
+**Verification**: 所有测试通过，headroom-ai 未安装时测试也能通过（mock 或跳过）
+
+---
+
+### U3. ReAct 引擎压缩点扩展
+
+**Goal**: 在 ReAct 循环内新增工具结果压缩和增量压缩调用点。
+
+**Dependencies**: U1
+
+**Files**:
+- `src/agentkit/core/react.py` — 修改：扩展 compressor 使用点
+- `tests/unit/test_react_compression.py` — 新增：ReAct 循环内压缩测试
+
+**Approach**:
+1. `_build_tool_result_message` 方法增加 compressor 参数：
+   - 有 compressor 时调用 `compressor.compress_tool_result(tool_name, result)` 获取压缩内容
+   - 无 compressor 时保持原逻辑 `str(result)`
+2. `_execute_loop` 和 `execute_stream` 中传递 compressor 到 `_build_tool_result_message`
+3. while 循环内每步 LLM 调用前，检查 conversation 是否超过 token 预算，超过则调用 `compressor.compress(conversation)` 增量压缩
+4. 新增 `_should_compress(conversation, compressor)` 辅助方法：估算当前 conversation token 数，超过阈值时返回 True
+
+**Patterns to follow**: 现有 `compressor.compress(conversation)` 调用模式（L218-222）
+
+**Test scenarios**:
+- _build_tool_result_message 无 compressor 时行为不变
+- _build_tool_result_message 有 compressor 时调用 compress_tool_result
+- ReAct 循环内工具结果被压缩后拼入 conversation
+- 长对话触发增量压缩（conversation 超过 token 预算时）
+- 短对话不触发增量压缩
+- execute_stream 模式下压缩正常工作
+- compressor.compress() 异常时不影响 ReAct 循环（try/except 保护）
+
+**Verification**: ReAct 循环内压缩测试通过，现有 ReAct 测试不受影响
+
+---
+
+### U4. 配置集成与 Agent 注入
+
+**Goal**: 在 ServerConfig 中新增 compression 配置，在 ConfigDrivenAgent 中实例化并注入 compressor。
+
+**Dependencies**: U1, U2, U3
+
+**Files**:
+- `src/agentkit/server/config.py` — 修改：ServerConfig 新增 compression 字段
+- `src/agentkit/server/app.py` — 修改：create_app 中创建 compressor 并注入
+- `src/agentkit/core/config_driven.py` — 修改：ConfigDrivenAgent 传递 compressor 给 ReActEngine
+- `configs/agentkit.example.yaml` — 修改：新增 compression 配置示例
+- `tests/unit/test_compression_config.py` — 新增：配置集成测试
+
+**Approach**:
+1. ServerConfig.__init__ 新增 `compression: dict[str, Any] | None = None`
+2. from_dict 中提取 `data.get("compression", {})`
+3. _try_reload_config 中同步更新 compression 字段
+4. create_app 中：
+   - 调用 `create_compressor(server_config.compression)` 创建压缩器
+   - 存入 `app.state.compressor`
+   - 传递给 AgentPool
+5. ConfigDrivenAgent.__init__ 接收 compressor 参数
+6. ConfigDrivenAgent._handle_react 传递 compressor 给 ReActEngine.execute()
+
+**YAML 配置示例**:
+```yaml
+compression:
+  enabled: true
+  provider: headroom       # "headroom" | "summary" | None
+  compressors:
+    - smart_crusher
+    - code_compressor
+  ccr_ttl: 300
+  min_length: 500
+  model: default
+```
+
+**Patterns to follow**: `src/agentkit/server/config.py` 中 telemetry 配置模式
+
+**Test scenarios**:
+- ServerConfig 解析 compression 配置
+- compression 为空时 create_compressor 返回 None
+- compression.provider=headroom 且已安装时创建 HeadroomCompressor
+- compression.provider=headroom 且未安装时降级到 ContextCompressor
+- create_app 正确注入 compressor 到 app.state
+- ConfigDrivenAgent 传递 compressor 给 ReActEngine
+- 配置热重载时 compression 字段同步更新
+- agentkit.yaml 中无 compression 段时系统正常工作
+
+**Verification**: 端到端配置测试通过，无 compression 配置时向后兼容
+
+---
+
+### U5. CCR 检索工具注册
+
+**Goal**: 当 HeadroomCompressor 启用时，自动注册 headroom_retrieve 工具到 ToolRegistry。
+
+**Dependencies**: U2, U4
+
+**Files**:
+- `src/agentkit/tools/headroom_retrieve.py` — 新增：HeadroomRetrieveTool
+- `src/agentkit/tools/__init__.py` — 修改：条件导出
+- `src/agentkit/server/app.py` — 修改：条件注册 headroom_retrieve 工具
+- `tests/unit/test_headroom_retrieve_tool.py` — 新增：检索工具测试
+
+**Approach**:
+1. 新增 `HeadroomRetrieveTool(Tool)` 类：
+   - name: "headroom_retrieve"
+   - description: "Retrieve original uncompressed data from CCR cache by hash or query"
+   - input_schema: `{ccr_hash: str, query: str}`（至少一个）
+   - execute: 调用 `compressor.retrieve(ccr_hash, query)` 返回原始数据
+2. 在 create_app 中，当 compressor 是 HeadroomCompressor 实例时，创建并注册 HeadroomRetrieveTool
+3. HeadroomRetrieveTool 持有 compressor 引用，execute 时调用 compressor.retrieve()
+4. headroom-ai 未安装时不注册此工具
+
+**Patterns to follow**: `src/agentkit/tools/baidu_search.py` 的 Tool 实现模式
+
+**Test scenarios**:
+- HeadroomRetrieveTool 构造和属性
+- execute 传入 ccr_hash 返回原始数据
+- execute 传入 query 返回匹配数据
+- execute 传入无效 hash 返回错误信息
+- headroom-ai 未安装时工具不注册
+- 非 HeadroomCompressor 时工具不注册
+- 工具 schema 正确（name, description, input_schema）
+
+**Verification**: 工具注册和检索功能测试通过
+
+---
+
+### U6. GEO Pipeline 压缩验证与文档
+
+**Goal**: 验证 GEO Pipeline 在 Headroom 压缩下的端到端工作，更新配置文档。
+
+**Dependencies**: U1, U2, U3, U4, U5
+
+**Files**:
+- `tests/integration/test_geo_compression.py` — 新增：GEO Pipeline 压缩集成测试
+- `configs/agentkit.example.yaml` — 修改：完整 compression 配置示例
+
+**Approach**:
+1. 编写 GEO Pipeline 端到端压缩测试：
+   - 启用 Headroom 压缩执行完整 7 步 GEO Pipeline
+   - 验证每步工具输出被压缩
+   - 验证 CCR 检索可取回原始数据
+   - 验证最终输出质量不受压缩影响
+2. 对比测试：同一任务压缩 vs 不压缩的 token 消耗
+3. 更新 agentkit.example.yaml 添加完整 compression 配置段和注释
+
+**Test scenarios**:
+- GEO Pipeline 启用压缩后端到端执行成功
+- 工具输出（baidu_search, web_crawl, schema_extract, schema_generate）被压缩
+- headroom_retrieve 可取回原始搜索结果
+- 压缩后 Pipeline 输出与不压缩时语义一致
+- compression.enabled=false 时 Pipeline 行为与之前完全一致
+
+**Verification**: 集成测试通过，配置文档完整
+
+---
+
+## Scope Boundaries
+
+### In Scope
+- CompressionStrategy Protocol 定义和工厂函数
+- HeadroomCompressor 实现（SmartCrusher + CodeCompressor）
+- ReAct 循环内工具结果压缩和增量压缩
+- ServerConfig compression 配置
+- CCR headroom_retrieve 工具
+- GEO Pipeline 压缩验证
+
+### Deferred to Follow-Up Work
+- Kompress-base 文本压缩模型集成（需 PyTorch，体积过大）
+- CacheAligner KV Cache 前缀稳定化（需深入理解各 LLM Provider 的缓存机制）
+- 压缩效果 A/B 测试框架（需真实 API 调用对比，属于产品验证范畴）
+- 跨 Agent 共享压缩上下文（Headroom SharedContext，需多 Agent 架构先就绪）
+- 压缩指标 Dashboard（需 Grafana/Prometheus 集成，属于运维范畴）
+- headroom learn 自学习优化（需长期运行数据积累）
+
+---
+
+## Risks & Dependencies
+
+| 风险 | 影响 | 缓解 |
+|------|------|------|
+| headroom-ai Beta 版本 API 可能 break | 压缩功能失效 | 锁定 minor 版本 `>=0.22,<0.23`；try/except 保护所有调用 |
+| SmartCrusher 对 GEO 结构化数据过度压缩 | 引用检测丢失关键字段 | min_length 阈值 + CCR 可逆 + 默认关闭 |
+| 压缩增加延迟 | ReAct 循环变慢 | Headroom 本地运行毫秒级延迟；异步调用 |
+| ConfigDrivenAgent 修改影响现有 Agent | 回归 | compressor 默认 None，向后兼容测试 |
+| CCR 缓存内存占用 | 长时间运行内存膨胀 | ccr_ttl 默认 300 秒，LRU 淘汰 |
+
+---
+
+## Open Questions
+
+- headroom-ai 的 compress() 是否为 async？若为 sync，需用 asyncio.to_thread() 包装 — 实现时验证
+- SmartCrusher 对中文 JSON 的压缩效果如何？需实际测试 — 延迟到 U6 集成验证