28 KiB

Raw Blame History

date	plan_id	type	status	title	origin
2026-07-02	2026-07-02-003	feat	draft	Context Compressor CJK Token 估算 + Prefix 对齐 + 压缩简化	stash@{0} (feat/context-compressor-enhancement, 2026-06-23)

Context Compressor CJK Token 估算 + Prefix 对齐 + 压缩简化

Summary

当前 ContextCompressor 在中文会话场景下存在三个实际问题： (1) token 估算用 len(content) // 4（4 字符=1 token），但 CJK 字符实际 1 字符 ≈ 1 token，导致中文会话 token 被低估 4 倍，压缩触发过晚，可能超出模型 context window； (2) system prompt 混有动态内容（时间戳、UUID、session_id），破坏 LLM provider（Anthropic/OpenAI）的 KV cache prefix 稳定性，导致 cache miss → 延迟增加 + 成本上升； (3) compress() 用递归 _compression_depth 参数，逻辑复杂且缺乏结构化日志，难以监控压缩行为。

本 plan 基于 stash@{0} 的 WIP 实现，重建为可执行的 implementation plan。

Problem Frame

当前状态（main 6826ceb 上的 src/agentkit/core/compressor.py）：

estimate_tokens(): len(str(content)) // 4 — 对中文严重低估
should_compress(): 基于 headroom ratio，依赖 estimate_tokens 的准确性
compress(): 递归 _compression_depth 参数，最多 2 层递归后强制 truncate
_compress_aggressive(): 也接受 _compression_depth 参数
无 prefix 对齐逻辑 — system prompt 中动态行位置每次变化
无结构化日志 — 压缩发生时无 tokens_before/after/ratio 记录

应用点：

src/agentkit/core/react.py _should_compress() (line 1705) — 已委托给 compressor.should_compress() (line 1716-1718)，// 4 fallback (line 1721) 是 live code（HeadroomCompressor 不实现 should_compress()，fallback 为其服务）
src/agentkit/core/config_driven.py _handle_direct() (line 1066) — stash 新增了 LLM 调用前压缩（len(rendered_messages) > 6 触发，但 rendered_messages 实际 ≤ 2 条）

影响：中文用户的长会话场景下，压缩触发时机过晚，可能触发模型 context limit 错误；KV cache 命中率低导致每次请求延迟和成本上升。

Requirements

R1: CJK 字符（中日韩统一表意文字 + 假名 + 韩文谚文）按 1:1 估算 token，ASCII 仍按 4:1
R2: estimate_text_tokens() 作为模块级函数导出，供 react.py 等外部调用
R3: ~~align_prefix() 将 system prompt 中的动态行（时间戳、UUID、session_id、当前时间）移到末尾 [Dynamic Context] 段，保持前缀稳定~~ （已取消 — OQ17/OQ18：_build_system_message() 已覆盖，且动态内容不存在）
R4: ~~compress() 首先调用 align_prefix()，提升 KV cache 命中率~~ （已取消 — 随 U2 取消）
R5: 移除递归 _compression_depth 参数，改线性流程：compress → aggressive → truncate（修正：移除 align 步骤，U2 已取消）
R6: 新增 _log_compression() 输出结构化日志（tokens_before/after/ratio/msg_count）
R7: react.py._should_compress() 的 // 4 fallback（line 1721）改用 estimate_text_tokens()（第三轮修正：fallback 不是死代码 — HeadroomCompressor 不实现 should_compress()，fallback 是 live code。改为用 estimate_text_tokens() 替换 // 4 而非移除，使 CJK 估算惠及 HeadroomCompressor 用户）
R8: ~~config_driven.py 在 LLM 调用前对长 system prompt 执行压缩~~ （已取消 — 第三轮 F-009：compress() 守卫使单条消息压缩无效；system prompt 是设计时 artifact）
R9: 测试覆盖 CJK 估算、~~prefix 对齐~~、压缩流程、~~工具结果压缩~~（修正：prefix 对齐和工具结果压缩随 U2/U4 缩减取消）

R9 mapping: R9 是横切需求，不映射到单一 Unit。由 U1（CJK 估算测试）、~~U2（TestAlignPrefix）~~（已取消）、U3（压缩流程测试）、~~U4（test_config_driven.py 压缩测试）~~（已取消）的 Test scenarios 与 Verification 共同满足。

Key Technical Decisions

KTD1: CJK 估算用启发式，不引入 tiktoken

决策：用 _is_cjk(char) + estimate_text_tokens(text) 启发式（CJK 1:1，ASCII 4:1），不引入 tiktoken 依赖，也不使用 litellm.token_counter。

理由（修正 OQ3：原"项目无 tiktoken 依赖"前提虚假 — litellm>=1.50 已是直接依赖 pyproject.toml:30，提供 litellm.token_counter(model, messages) API）：

~~项目无 tiktoken 依赖（pyproject.toml 未列出），引入会增加安装负担~~ 修正：litellm>=1.50 已是直接依赖，但 litellm.token_counter 需要 model 参数且需加载 tokenizer，开销大于纯字符遍历（第三轮修正：litellm.token_counter 是纯本地函数，不调用网络 API）
启发式对触发时机的判断足够准确（压缩阈值有 headroom 缓冲）
符合 ponytail 原则：用最小可行方案，避免在热路径上引入 litellm 调用开销
ponytail ceiling: 启发式对纯 CJK 文本可能仍低估 ~10-20%，但 headroom_threshold=0.8 的缓冲足以吸收；升级路径是引入 litellm.token_counter 或 provider 特定的 tokenizer

KTD2: Prefix 对齐策略 — 动态行移到末尾（已取消 — OQ17/OQ18）

~~决策：align_prefix() 识别 system prompt 中的动态行（匹配时间戳/UUID/session_id/当前时间 模式），将其移到 system prompt 末尾的 [Dynamic Context] 段落。~~

~~理由：~~

~~LLM provider 的 KV cache 基于 prefix 匹配，前缀越稳定 cache 命中率越高~~
~~动态内容（时间戳等）每次请求都变，放在前缀中会破坏 cache~~
~~移到末尾后，静态部分（agent identity、技能说明等）构成稳定前缀~~

取消原因：_build_system_message() 已实现 stable/volatile 分离；system prompt 无动态内容。详见 U2 取消说明。

KTD3: 压缩逻辑改线性，移除递归

决策：compress() 改为线性流程（~~align →~~ compress → aggressive → truncate），移除 _compression_depth 递归参数。

理由（修正 OQ20：原"递归难以追踪"理由薄弱 — 递归实际是 pseudo-linear，最多 2 层后强制 truncate）：

~~递归逻辑难以追踪和调试~~ 修正：递归虽是 pseudo-linear（max 2 层），但线性流程仍有日志可读性收益
线性流程更易添加结构化日志（_log_compression() 在每步插入日志点）
stash 已验证线性流程等价覆盖原递归的所有路径

KTD4: config_driven.py 压缩触发用消息条数（已取消 — 随 U4 取消）

~~决策：config_driven.py 用 len(rendered_messages) > 6 作为压缩触发条件（消息条数，非 token 估算）。~~

重新决策：config_driven.py 用 estimate_text_tokens(system_prompt) > max_tokens * 0.8 作为压缩触发条件（token 估算）。

取消原因：U4 已在第三轮取消（F-009 — compress() 守卫使单条消息压缩无效）。KTD4 随之失效。

Implementation Units

U1. CJK-aware token 估算

Goal: 新增 estimate_text_tokens() 模块级函数，CJK 1:1 / ASCII 4:1 估算；react.py 改用此函数。

Requirements: R1, R2, R7

Dependencies: 无

Files:

src/agentkit/core/compressor.py — 新增 _is_cjk() + estimate_text_tokens() 模块级函数；estimate_tokens() 方法内部改用 estimate_text_tokens；_summarize() line 152 的内联 // 4 也改用 estimate_text_tokens()（第三轮 NEW-F10：避免 CJK 摘要路径仍低估 4 倍导致 context overflow）
src/agentkit/core/react.py — _should_compress() 的 // 4 fallback（line 1721）改用 estimate_text_tokens()（不移除 — HeadroomCompressor 不实现 should_compress()，fallback 是 live code）
tests/unit/test_context_compressor.py — 新增 CJK 估算测试

Approach:

_is_cjk(char): 检查字符是否在 CJK 统一表意文字（U+4E00-U+9FFF）、假名（U+3040-U+30FF）、韩文谚文（U+AC00-U+D7AF）范围内
estimate_text_tokens(text): 遍历字符，CJK 计 1 token，其他按 4:1 估算
estimate_tokens(messages): 内部调用 estimate_text_tokens(str(content)) 求和
_summarize() line 152: estimated_tokens = len(conversation_text) // 4 改为 estimate_text_tokens(conversation_text)（第三轮 NEW-F10）
react.py._should_compress(): 已委托给 compressor.should_compress()（line 1716-1718）；// 4 fallback（line 1721）不是死代码 — HeadroomCompressor 不实现 should_compress()，fallback 是 live code。改为用 estimate_text_tokens() 替换 // 4，使 CJK 估算惠及 HeadroomCompressor 用户

Patterns to follow: 现有 estimate_tokens() 方法的签名和返回值约定

Test scenarios:

纯 CJK 文本：estimate_text_tokens("你好世界") == 4（1:1）
纯 ASCII：estimate_text_tokens("hello world") == 2（11 字符 / 4 = 2.75，向下取整）
混合文本：CJK 部分 1:1 + ASCII 部分 4:1
日文假名：estimate_text_tokens("こんにちは") == 5
韩文谚文：estimate_text_tokens("안녕하세요") == 5
estimate_tokens() 对包含 CJK 的消息列表给出合理估值（不再低估 4 倍）
react.py._should_compress() 对中文长会话正确触发（通过 compressor.should_compress() 委托）
_summarize() 对 CJK 文本的预截断正确触发（第三轮 NEW-F10：line 152 改用 estimate_text_tokens 后，CJK 摘要路径不再低估 4 倍）

Verification: pytest tests/unit/test_context_compressor.py -k "cjk or mixed or kana or hangul" -v 通过

U2. Prefix 对齐（已取消）

~Goal~: ~~新增 align_prefix() 方法，将 system prompt 中的动态行移到末尾，保持前缀稳定，提升 KV cache 命中率。~~

取消原因（ce-doc-review 复核第二轮 OQ17/OQ18）：

react.py:1511-1561 的 _build_system_message(stable, volatile) 已实现 stable/volatile 双块分离（Anthropic 用 cache_control: {"type": "ephemeral"}，非 Anthropic 用字符串拼接保持 stable 前缀），在 compress 调用前（line 670-674）已执行。U2 提议的 align_prefix() 与此高度重叠。

grep 当前时间/session_id/timestamp 模式在 system prompt 构造点（react.py、config_driven.py、prompts/template.py）零匹配 — PromptTemplate.render() 只做 ${var} 替换，不注入时间戳/UUID/session_id。U2 要对齐的动态内容在当前代码库中不存在。

Requirements: ~~R3, R4~~ (已随 U2 取消)

Dependencies: ~~U1（同在 compressor.py，但逻辑独立）~~

Files:

src/agentkit/core/compressor.py — 新增 align_prefix(messages) 方法
tests/unit/test_context_compressor.py — 新增 TestAlignPrefix 测试类

Approach:

align_prefix(messages): 遍历 messages，对 role == "system" 的消息：
- 识别动态行：匹配 当前时间、2026-07-02 格式时间戳、UUID（[0-9a-f]{8}-...）、session_id 等模式
- 将动态行从原位置移除，追加到 content 末尾的 [Dynamic Context] 段落
- 静态行保持原序构成稳定前缀
compress() 在执行压缩前首先调用 align_prefix()

Patterns to follow: 现有 compress() 中 system message 处理逻辑

Test scenarios:

时间戳行移到末尾：含 当前时间: 2026-07-02 的 system prompt，对齐后该行在 [Dynamic Context] 段
UUID 行移到末尾：含 session_id: abc12345-... 的 system prompt，对齐后该行在 [Dynamic Context] 段
静态 system prompt 不变：无动态行的 system prompt，align_prefix() 后内容不变
多条 system message：每条都正确对齐
非 system message 不受影响：user/assistant 消息不变

Verification: pytest tests/unit/test_context_compressor.py::TestAlignPrefix -v 通过

U3. 压缩逻辑简化 + 结构化日志

Goal: 重写 compress() 为线性流程，移除递归 _compression_depth；新增 _log_compression() 结构化日志。

Requirements: R5, R6

Dependencies: ~~U2（compress() 调用 align_prefix()）~~ （U2 已取消，U3 独立）

Files:

src/agentkit/core/compressor.py — 重写 compress()，新增 _log_compression()，重写 _compress_aggressive() 移除递归参数
tests/unit/test_context_compressor.py — 新增压缩流程测试

Approach:

compress(messages) 线性流程：
1. ~~align_prefix(messages) — 对齐 prefix~~ （U2 已取消，移除此步骤）
2. 检查 token 量，若未超阈值直接返回
3. 分离 system/old/recent，_summarize(old) 生成摘要
4. 若仍超阈值，调用 _compress_aggressive(**original messages**) — 第三轮修正 F-010：必须传入 original messages 列表（非已压缩的 compressed），避免 summary-of-summary 行为变更
5. 若仍超阈值，_truncate()
6. _log_compression() 记录压缩结果
_log_compression(tokens_before, tokens_after, msg_count, strategy) — 输出 INFO 级日志，包含压缩比
_compress_aggressive(messages) — 移除 _compression_depth 参数，改为只保留最后 1 条消息 + 摘要

Patterns to follow: 现有 compress() 的 system/old/recent 分离逻辑

Test scenarios:

短消息不压缩：estimate_tokens <= max_tokens 时返回原消息
长消息触发压缩：超过阈值时生成摘要 + 保留 recent
压缩后仍超阈值 → aggressive 压缩：只保留最后 1 条 + 摘要
aggressive 后仍超阈值 → truncate：强制截断
_log_compression 输出结构化日志（可通过 caplog 验证）
compress() 不再接受 _compression_depth 参数（签名变更）
_compress_aggressive 接收 original messages（非已压缩的 compressed），避免 summary-of-summary（第三轮 F-010）

Verification: pytest tests/unit/test_context_compressor.py -k "compress or aggressive or truncate" -v 通过

U4. config_driven.py LLM 调用前压缩 + 工具结果压缩辅助（已取消 — 第三轮 F-009）

Goal: config_driven.py 在 LLM 调用前对长 system prompt 执行压缩。

取消原因（ce-doc-review 复核第三轮 F-009/NEW-F9）：

ContextCompressor.compress() 有守卫 if len(non_system) <= keep_recent: return messages（compressor.py:98-99）。对单条 system 消息，non_system 为空列表（len=0 ≤ 默认 keep_recent=3），compress() 立即返回不压缩。U4 的核心机制根本不会触发。

rendered_messages 来自 PromptTemplate.render()，总是 ≤ 2 条（1 system + 1 user），即使压缩完整列表也无法触发 compress()。

system prompt 是 agent 设计者精心构造的设计时 artifact，运行时 LLM 摘要会降低指令质量。超长 system prompt 应在设计时修正，而非运行时压缩。

历史：第二轮已取消工具结果压缩辅助（OQ19 — HeadroomCompressor 已覆盖）。第三轮取消整个 U4（F-009 — compress() 守卫使 U4 无效）。

Approach:

config_driven.py: 在 _handle_direct() 中，LLM 调用前检查 estimate_text_tokens(system_prompt) 是否超过阈值（如 max_tokens * 0.8），若是则调用 compressor.compress([{"role": "system", "content": system_prompt}])，异常时 logger.warning 并继续
~~compressor.py 工具结果压缩~~（已取消 — 委托 HeadroomCompressor）

触发条件重新设计（OQ2/OQ5/OQ16）：原 len(rendered_messages) > 6 永远不可能为 true（PromptTemplate.render() 返回 ≤ 2 条消息）。改为基于 token 估算的单条 system prompt 压缩检查。

Patterns to follow: 现有 config_driven.py 中 self._compressor 的使用模式

Test scenarios:

config_driven.py 直接模式下长 system prompt（token 超阈值）触发压缩
config_driven.py 直接模式下短 system prompt 不压缩
压缩失败时 warning 且不阻断执行

Verification: pytest tests/unit/test_config_driven.py -k "compress" -v 通过

Scope Boundaries

In Scope

src/agentkit/core/compressor.py 的 CJK 估算 + ~~prefix 对齐~~（已取消）+ 压缩简化 + 日志 + ~~工具结果压缩辅助~~（已取消）
src/agentkit/core/react.py 的 _should_compress fallback // 4 改用 estimate_text_tokens()（fallback 是 live code，服务 HeadroomCompressor）
~~src/agentkit/core/config_driven.py 的 LLM 调用前压缩（基于 token 估算，非消息条数）~~ （已取消 — U4 取消）
tests/unit/test_context_compressor.py 测试覆盖
~~tests/unit/test_config_driven.py 压缩测试~~ （已取消 — U4 取消）

Out of Scope

stash 中混入的无关改动：portal.py、chat.ts、index.html、tauri.conf.json、skills/base.py、skills-lock.json、test_portal_routes.py、test_execution_modes.py
test_compression_strategy.py 的小幅调整（如签名变更导致的 fixture 更新，随 U3 自然处理）
引入 tiktoken 或 provider 特定 tokenizer（见 KTD1）
跨会话的持久化压缩状态

Deferred to Follow-Up Work

压缩比的运行时监控/告警（需接入 metrics 系统）
基于 provider 的 tokenizer 精确估算（KTD1 的升级路径）

Risks & Dependencies

风险 1: ~~align_prefix() 的动态行识别模式可能遗漏某些动态内容格式~~ （已取消 — U2 取消）
风险 2：compress() 签名移除 _compression_depth 参数可能破坏外部调用 → 缓解：_compression_depth 是内部参数（下划线前缀），无外部调用者
风险 3: ~~config_driven.py 压缩异常时不应阻断主流程~~ （已取消 — U4 取消）
依赖：~~U2 依赖 U1（同文件），U3 依赖 U2（compress 调用 align_prefix），U4 依赖 U1+U3~~ 修正：U2 已取消；U3 独立（无依赖）；~~U4 依赖 U1+U3~~ U4 已取消

Acceptance Examples

AE1: 中文长会话（100 条 CJK 消息）的 estimate_tokens 返回值 ≥ 旧实现的 4 倍（修正低估）
AE2: ~~含时间戳的 system prompt 经 align_prefix() 后，时间戳行在 [Dynamic Context] 段，静态行位置不变~~ （已取消 — U2 取消）
AE3: compress() 不再接受 _compression_depth 参数，对超长消息线性执行 compress → aggressive → truncate（修正：移除 align 步骤，U2 已取消）
AE4: react.py._should_compress() 对中文会话在合理时机触发（第三轮修正：// 4 fallback 是 live code 服务 HeadroomCompressor，改为用 estimate_text_tokens() 替换 // 4 而非移除）
AE5: ~~config_driven.py 直接模式下长 system prompt（token 超阈值）时自动调用压缩，压缩失败不阻断~~ （已取消 — U4 取消）

Open Questions

以下 findings 来自 ce-doc-review（coherence / feasibility / adversarial 三 persona 审查），于实现阶段处理。4 个 safe_auto fixes 已应用（U4 Goal 补 _build_sampled_output、R9 横切声明、R8 变量名修正、OQ14 ASCII 估算值修正）。

复核第二轮决策结果（自动用最佳判断处理）：

已解决（16 项）：OQ1/OQ4/OQ15（函数名修正为 _should_compress）、OQ2/OQ5/OQ16（触发条件改为 token 估算）、OQ3/OQ8（KTD1 理由修正）、OQ6/OQ7/OQ9/OQ12/OQ17/OQ18（U2 取消，premise 证伪）、OQ11/OQ19（U4 工具结果压缩辅助取消）、OQ14（safe_auto）、OQ20（KTD3 理由修正，维持 U3）

仍开放（4 项）：OQ10（aggressive 压缩质量退化）、OQ13（CJK 1:1 模型差异，FYI）

复核第三轮决策结果（自动用最佳判断处理）：

已应用（5 项）：

NEW-F13（safe_auto）：KTD1 修正 — litellm.token_counter 是纯本地函数，不调用网络 API

F-008（P0 反转）：// 4 fallback 是 live code（HeadroomCompressor 不实现 should_compress()）→ R7 改为"用 estimate_text_tokens() 替换 // 4"而非"移除死代码"

F-009（P0）：U4 全部取消 — compress() 守卫使单条 system 消息压缩无效；R8/KTD4/AE5/Scope Boundaries 同步取消

F-010（gated_auto）：U3 Approach step 4 明确 _compress_aggressive 接收 original messages（非已压缩的 compressed）

NEW-F10（gated_auto）：U1 scope 扩展 — _summarize() line 152 的内联 // 4 也改用 estimate_text_tokens()

仍开放（3 项）：OQ10（aggressive 压缩质量退化）、OQ13（CJK 1:1 模型差异，FYI）、OQ21（_truncate() * 4 一致性，P2 manual）

P0 — 阻断性（实现前必须解决）

OQ1 (feasibility, P0): U1/R7 引用的 react.py._needs_incremental_compression() 不存在。实际函数是 _should_compress()（line 1705），已委托给 compressor.should_compress()（line 1716-1718），// 4 fallback（line 1721）是死代码。

需决策: 修改 R7/U1 改为 _should_compress() 并移除死代码 fallback？还是从 scope 移除 R7（react.py 已委托给 compressor，U1 改 compressor 即可覆盖）？

OQ2 (feasibility, P0): U4/R8 引用的 config_driven.py._execute_direct() 不存在。实际方法是 _handle_direct()（line 1066）。且 rendered_messages 来自 PromptTemplate.render()，总是 ≤ 2 条（1 system + 1 user），len(rendered_messages) > 6 永远不可能为 true。

需决策: U4 的压缩触发条件需重新设计。是在 _handle_direct() 内对单条 system prompt 做基于 token 的压缩？还是压缩逻辑应放在其他位置（如 PromptTemplate.render() 之前）？

P1 — 重要（实现时需处理）

OQ3 (adversarial, P1): KTD1 前提虚假。声称"项目无 tiktoken 依赖"，但 litellm>=1.50 已是直接依赖（pyproject.toml line 30），提供 litellm.token_counter(model, messages) API 可精确估算 token。

需决策: 是否改用 litellm.token_counter 替代启发式？或维持启发式但修正 KTD1 理由（理由改为"避免 litellm 调用开销/兼容性"，而非"无依赖"）？

OQ4 (adversarial, P1): R7 引用不存在的函数（同 OQ1）。

OQ5 (adversarial, P1): KTD4 错误指标 — len(rendered_messages) > 6 永远不可能为 true（同 OQ2）。

OQ6 (feasibility, P1): KV cache benefit 只在 compress() 触发时实现。大多数请求不触发 compress（因为未超阈值），此时 align_prefix() 不会被调用 → prefix 对齐的实际收益被高估。

需决策: 是否将 align_prefix() 提前到每次请求都执行（独立于 compress）？还是接受"只在 compress 时对齐"的有限收益？

OQ7 (feasibility, P1): 动态内容 premise 未在代码库中验证。Plan 假设 system prompt 含时间戳/UUID/session_id，但需确认这些动态行是否真的存在于当前 system prompt 构造逻辑中。

需决策: 实现阶段先 grep 验证 system prompt 构造点，确认动态内容确实存在再实现 U2。

P2 — 值得修复（实现时酌情处理）

OQ8 (feasibility, P2): align_prefix() 幂等性。多次调用 align_prefix() 不应改变结果（避免 [Dynamic Context] 段重复追加）。需在测试场景中加幂等性测试。

OQ9 (adversarial, P2): align_prefix() 正则误判风险。用户消息中可能含时间戳格式文本（如"会议定于 2026-07-02"），被误移到末尾。需限定只对 system message 的特定行（如"当前时间:"前缀）匹配。

OQ10 (adversarial, P2): aggressive 压缩质量退化。_compress_aggressive() 只保留最后 1 条 + 摘要，可能丢失关键上下文。需评估是否保留更多 recent 消息（如最后 2-3 条）。

OQ11 (adversarial, P2): U4 范围蔓延。U4 同时包含 config_driven 压缩 + 4 个工具结果压缩辅助方法，scope 较大。考虑是否拆分为 U4a（config_driven 压缩）+ U4b（工具结果压缩辅助）。

OQ12 (adversarial, P2): 动态内容前提未验证（同 OQ7，adversarial 视角）。

P3 — FYI（知情即可）

OQ13 (adversarial, P3): CJK 1:1 估算的模型差异。不同模型（GPT-4/Claude/Gemini）对 CJK 的实际 token 比例略有差异（0.8-1.2 之间），1:1 是平均值。headroom_threshold=0.8 的缓冲可吸收此差异。

Coherence — 结构性（gated_auto/manual）

OQ14 (coherence, ~~gated_auto~~ safe_auto, 已解决): U1 Test scenarios 中 estimate_text_tokens("hello world") == 2 或 3 不确定 — 应明确 ASCII 11 字符 / 4 = 2.75，向下取整为 2。已修正为 == 2（11 字符 / 4 = 2.75，向下取整）。

OQ15 (coherence, manual): AE4 仍引用 _needs_incremental_compression()（同 OQ1），需随 OQ1 决策同步修正。

OQ16 (coherence, manual): AE5 仍写">6 条消息触发"，但 rendered_messages ≤ 2（同 OQ2），需随 OQ2 决策同步修正。

复核第二轮新增 — P0（阻断性）

OQ17 (feasibility+adversarial, P0, 新): U2 与现有 _build_system_message() 功能重复。react.py:1511-1561 已实现 stable/volatile 双块分离（Anthropic 用 cache_control: {"type": "ephemeral"}，非 Anthropic 用字符串拼接保持 stable 前缀），在 compress 调用前（line 670-674）已执行。U2 提议的 align_prefix() 与此高度重叠。

需决策: 取消 U2？还是将 U2 改写为 spike（调研 _build_system_message() 是否已覆盖所有动态内容场景，若覆盖则删除 U2）？

OQ18 (feasibility+adversarial, P0, 新): U2 核心前提被证伪。grep 当前时间/session_id/timestamp 模式在 system prompt 构造点（react.py、config_driven.py、prompts/template.py）零匹配 — PromptTemplate.render() 只做 ${var} 替换，不注入时间戳/UUID/session_id。U2 要对齐的动态内容在当前代码库中不存在。

需决策: 取消 U2？或保留 U2 作为防御性设计（未来可能添加动态内容）？

OQ19 (feasibility+adversarial, P0, 新): U4 工具结果压缩辅助与 HeadroomCompressor.compress_tool_result() 重叠。headroom_compressor.py:126-157 已实现成熟版本（content type 检测、SmartCrusher/CodeCompressor 路由、CCR hash 存储、min_length=500 阈值、异常 fallback）。U4 提议的 _compress_json_array/_compress_json_object/_compress_text/_build_sampled_output 会与之竞争或重复造轮子。且 compressor.py:237-239 的 compress_tool_result 已是 no-op。

需决策: 取消 U4 工具结果压缩辅助部分？还是改为委托 HeadroomCompressor？

复核第二轮新增 — P1（重要）

OQ20 (adversarial, P1, 新): KTD3 前提薄弱。compress() 的递归 _compression_depth 实际是 pseudo-linear（最多 2 层递归后强制 truncate，line 118-129），并非真正的复杂递归。"递归难以追踪"的理由不够充分，线性化的收益被高估。

需决策: 维持 KTD3（线性化仍有日志可读性收益）？还是删除 U3 的线性化部分，仅保留 _log_compression()？

复核第三轮新增 — P2（值得修复）

OQ21 (adversarial, P2, 新): _truncate() 的 * 4 字符假设与 U1 的 CJK 估算逻辑不一致。compressor.py:232-233 中 _truncate() 用 target_tokens * 4 计算字符数（假设 4 字符=1 token），对 CJK 文本会截断过多（CJK 1 字符 ≈ 1 token，按 * 4 计算的字符数只够 1/4 的 CJK token）。

当前状态: U1 scope 已扩展到 estimate_tokens() 方法 + _summarize() line 152，但 _truncate() 的 * 4 未纳入。
需决策: 实现阶段评估是否将 _truncate() 也改用 CJK-aware 估算（基于 estimate_text_tokens 反推字符数），或保留 * 4 作为 truncate 路径的保守下界（截断过多总比超出 context window 安全）。第三轮建议：保留 * 4 作为 manual 跟进，因为它在 truncate 兜底路径上，保守截断是安全的；纳入 U1 scope 会让 U1 边界扩张到 truncate 路径，与"压缩触发时机"的核心目标偏离。

28 KiB Raw Blame History Unescape Escape

Context Compressor CJK Token 估算 + Prefix 对齐 + 压缩简化

Summary

Problem Frame

Requirements

Key Technical Decisions

KTD1: CJK 估算用启发式，不引入 tiktoken

KTD2: Prefix 对齐策略 — 动态行移到末尾 （已取消 — OQ17/OQ18）

KTD3: 压缩逻辑改线性，移除递归

KTD4: config_driven.py 压缩触发用消息条数 （已取消 — 随 U4 取消）

Implementation Units

U1. CJK-aware token 估算

U2. Prefix 对齐 （已取消）

U3. 压缩逻辑简化 + 结构化日志

U4. config_driven.py LLM 调用前压缩 + 工具结果压缩辅助 （已取消 — 第三轮 F-009）

Scope Boundaries

In Scope

Out of Scope

Deferred to Follow-Up Work

Risks & Dependencies

Acceptance Examples

Open Questions

P0 — 阻断性（实现前必须解决）

P1 — 重要（实现时需处理）

P2 — 值得修复（实现时酌情处理）

P3 — FYI（知情即可）

Coherence — 结构性（gated_auto/manual）

复核第二轮新增 — P0（阻断性）

复核第二轮新增 — P1（重要）

复核第三轮新增 — P2（值得修复）

28 KiB

Raw Blame History

KTD2: Prefix 对齐策略 — 动态行移到末尾（已取消 — OQ17/OQ18）

KTD4: config_driven.py 压缩触发用消息条数（已取消 — 随 U4 取消）

U2. Prefix 对齐（已取消）

U4. config_driven.py LLM 调用前压缩 + 工具结果压缩辅助（已取消 — 第三轮 F-009）