fischer-agentkit/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md at 36b0296730141048a1637a4d9b11e68dd7d50a2c


title	type	date	origin
feat: Agent Wave 1 quick wins (verify re-injection / prompt cache / schema validation / delta_flush)	feat	2026-06-29	docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md

Summary

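Deliver the four self-contained quick wins from brainstorm Wave 1: G1 re-inject verify failures into ReAct, G2 cross-provider prompt cache structure (the origin's three layers, realized here as two system-prompt blocks), G3 schema validation for tool calls, and G8 delta_flush_interval throttling. All four act on the ReAct engine layer. Within this plan G2 and G8 run in that order because both modify execute_stream, and every item ships with a minimal self-check test.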

Problem Frame

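A comparison of agentkit against Qoder, Codex, Hermes, and Trae surfaced 9 genuinely new gaps, each with pain already observed in production or testing; the brainstorm splits delivery into 3 waves. This plan implements Wave 1: four self-contained, low-risk quick wins covering feedback stability (verify failures are not re-injected, tools have no schema validation) and response efficiency (low prompt cache hit rate, unthrottled token chunks).

The verifier has checked the current repository state (see origin Sources & Research): at verification_loop.py:111-145 a verify failure only calls fix_callback and is not re-injected into ReAct; at react.py:1042-1059 memory injection is appended to the end of the system prompt, which breaks the cache prefix; at react.py:1118-1134 every token chunk is yielded with no throttling; at tools/base.py:50-77 safe_execute awaits execute() directly with no schema validation.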

Requirements

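G1 Re-inject verify failures into ReAct (origin R1-R3)

R1. When verify fails, the system automatically injects the errors into the conversation as a new user message and continues the ReAct loop instead of exiting.
R2. If the second verify still fails after re-injection, the system aborts execution and returns an error to the user with the verify log attached.
R3. The maximum number of re-injection retries is configurable (default 1) and bounded by max_steps.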

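G2 Prompt cache three-layer structure (origin R4-R7)

R4. Restructure the system prompt into two blocks: stable (skill configuration and system instructions) plus volatile (memory retrieval and timestamp). The original context layer (session context) is carried by the conversation messages and does not enter the system prompt.
R5. Move memory-retrieval injection from the end of the system prompt into the volatile layer; the stable layer stays unchanged.
R6. Unify the cache strategy across providers: for Anthropic, insert explicit cache_control breakpoints; for OpenAI and others, rely on automatic prefix caching.
R7. Reduce input token cost in multi-turn conversations (target ~50%).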

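G3 Tool-call schema validation (origin R8-R10)

R8. Before safe_execute calls execute(), validate the arguments (types and required fields) against tool.input_schema.
R9. On validation failure, return a typed error code (tool_call_invalid or schema_mismatch) and do not run the tool.
R10. Feed the error back into the conversation as a tool-role message so the LLM can correct itself.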

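G8 delta_flush_interval throttling (origin R11-R12)

R11. Add configurable throttling to the token chunk yield in execute_stream (setting: flush_interval_ms; KTD6 defaults it to 0).
R12. Make the throttle configurable so clients can raise it to reduce rendering overhead.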

Cross-cutting (origin R26-R27)

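R13. All optimizations are configurable (agentkit.yaml gains the corresponding config sections).
R14. Each optimization ships with a minimal self-check test (ponytail rule).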

Key Technical Decisions

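KTD1: G3 schema validation lives in Tool.safe_execute (base.py), not in _execute_tool (react.py)

Validating at the tool base class means every caller (ReActEngine, ExpertTeam, StandaloneAgent) benefits uniformly. tool.input_schema (a JSON Schema dict) is the source of the contract; when input_schema=None, validation is skipped to stay backward compatible. Use the jsonschema library (the standard choice in the Python ecosystem and already a dependency, so nothing new is added).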

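KTD2: G2 two-block structure uses content blocks with cache_control markers; the Anthropic provider needs a change to _convert_messages

The system message content changes from a string to a list of content blocks ([{"type":"text","text":stable,"cache_control":{"type":"ephemeral"}},{"type":"text","text":volatile}]). Note: src/agentkit/llm/providers/anthropic.py is a direct httpx implementation (not LiteLLM), and its _convert_messages (:102-197) assumes system content is a string (:116), so it must be modified to accept list-type system content and pass the cache_control blocks through. The chat completions APIs of OpenAI and other providers are not relied on to accept list-type system content carrying cache_control markers, so _build_system_message must detect provider capability: Anthropic gets blocks, everyone else gets a concatenated string (stable+volatile) and relies on the stable prefix hitting automatic prefix caching. Gateway method signatures do not change.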
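The _convert_messages change described in KTD2 can be sketched as follows. This is a minimal illustration, not the repository's code: split_system and build_payload are hypothetical names, the model string is a placeholder, and the real method does much more (tool calls, images, and so on). It only shows system content passing through unchanged when it is already a list of blocks.

```python
from typing import Any


def split_system(messages: list[dict[str, Any]]) -> tuple[Any, list[dict[str, Any]]]:
    """Separate the system message from the rest (hypothetical helper).

    Accepts system content as either a plain string or a list of
    content blocks, which is the case KTD2 adds.
    """
    system: Any = None
    rest: list[dict[str, Any]] = []
    for msg in messages:
        if msg["role"] != "system":
            rest.append(msg)
            continue
        content = msg["content"]
        if isinstance(content, list):
            # Content blocks pass through untouched so cache_control survives.
            system = content
        else:
            system = str(content)
    return system, rest


def build_payload(messages: list[dict[str, Any]], model: str) -> dict[str, Any]:
    """Build a request body whose "system" field is a string or a block list."""
    system, rest = split_system(messages)
    payload: dict[str, Any] = {"model": model, "messages": rest}
    if system:
        payload["system"] = system
    return payload


if __name__ == "__main__":
    blocks = [
        {"type": "text", "text": "stable part", "cache_control": {"type": "ephemeral"}},
        {"type": "text", "text": "volatile part"},
    ]
    demo = build_payload(
        [{"role": "system", "content": blocks}, {"role": "user", "content": "hi"}],
        model="example-model",  # placeholder, not a real model id
    )
    print(demo["system"][0]["cache_control"])
```

Keeping the string branch intact means callers that still send a plain string system prompt are unaffected.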

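KTD3: G1 verify re-injection is folded into the main ReAct loop, not an outer wrapper

Verify currently runs outside the loop, after the final answer (react.py:887 in execute, :1603 in execute_stream). With re-injection the flow becomes: detect a final answer (no tool_calls) → run verify → on failure, append the errors to the conversation as a user message → continue the main loop (the LLM self-corrects) → verify again on the second final answer → if it still fails, break with the verify log. The existing VerificationLoop class and its verify_and_retry method stay untouched (backward compatible); the re-injection logic lives in ReActEngine.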

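KTD4: G8 delta_flush throttles with time.monotonic, not a counter

Inside the execute_stream chunk loop, accumulate chunks and yield them in batches every flush_interval_ms. With flush_interval_ms=0 this degrades to per-chunk yield (backward compatible). If the stream ends mid-interval, do a final flush. Use time.monotonic(), which is unaffected by system clock jumps.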

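KTD5: G2→G8 run sequentially within this plan because both modify execute_stream

G2 changes system prompt construction (before the loop) and G8 changes the chunk yield logic (inside the loop). The two do not conflict, but both touch execute_stream. G2 lands first so the stable prefix structure settles, then G8 adds throttling, which avoids layering changes on an unsettled structure.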

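KTD6: ServerConfig is wired into ReActEngine through separate constructor parameters (with defaults)

ReActEngine.__init__ (react.py:154-198) does not accept a ServerConfig object, so it takes separate constructor parameters: prompt_cache_enable: bool = True, flush_interval_ms: int = 0, max_reinjections: int = 1. The defaults keep existing call sites working unchanged (backward compatible); per the test scenarios, the settings that reproduce pre-change behavior are flush_interval_ms=0, max_reinjections=0, and prompt_cache.enable=False. ServerConfig is read at the agent factory/handler layer (chat/handler.py and the other places that construct ReActEngine) and passed in. Implementation must enumerate and update every call site that constructs ReActEngine.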
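A rough sketch of the wiring KTD6 describes, under stated assumptions: the nested dataclass shape of ServerConfig and the build_engine factory are hypothetical, and ReActEngine here is a stub (the real constructor takes many more arguments). Only the three setting names and their defaults come from the plan.

```python
from dataclasses import dataclass, field


@dataclass
class PromptCacheConfig:
    enable: bool = True


@dataclass
class StreamingConfig:
    flush_interval_ms: int = 0


@dataclass
class VerificationConfig:
    max_reinjections: int = 1


@dataclass
class ServerConfig:
    # Hypothetical shape; only the leaf setting names come from the plan.
    prompt_cache: PromptCacheConfig = field(default_factory=PromptCacheConfig)
    streaming: StreamingConfig = field(default_factory=StreamingConfig)
    verification: VerificationConfig = field(default_factory=VerificationConfig)


class ReActEngine:
    """Stub engine: keeps only the three new keyword parameters."""

    def __init__(
        self,
        *,
        prompt_cache_enable: bool = True,
        flush_interval_ms: int = 0,
        max_reinjections: int = 1,
    ) -> None:
        self.prompt_cache_enable = prompt_cache_enable
        self.flush_interval_ms = flush_interval_ms
        self.max_reinjections = max_reinjections


def build_engine(config: ServerConfig) -> ReActEngine:
    """Factory/handler layer: read ServerConfig and pass plain values in."""
    return ReActEngine(
        prompt_cache_enable=config.prompt_cache.enable,
        flush_interval_ms=config.streaming.flush_interval_ms,
        max_reinjections=config.verification.max_reinjections,
    )
```

Because the engine never sees ServerConfig itself, call sites that do not pass the new arguments keep constructing it exactly as before.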

Implementation Units

U1. G3 Tool-call schema validation

Goal: Validate arguments in Tool.safe_execute before execute() is called; return a typed error on failure.

Requirements: R8, R9, R10, R14

Dependencies: None (independent and foundational: it defines the error re-injection pattern)

Files:

Modify: src/agentkit/tools/base.py (add validation to safe_execute, plus ToolValidationError)
Modify: src/agentkit/core/react.py (_execute_tool :1897-1916 catches ToolValidationError and appends a tool-role message)
Test: tests/unit/test_tool_schema_validation.py

Approach:

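Add ToolValidationError(Exception) carrying error_code (tool_call_invalid or schema_mismatch) and details.
In safe_execute, after before_execute and before execute: if self.input_schema is not None, call jsonschema.validate(kwargs, self.input_schema); on validation failure raise ToolValidationError.
input_schema=None → skip (backward compatible; legacy tools have no schema).
In _execute_tool (react.py:1897-1916), add except ToolValidationError as e: ahead of the existing except Exception so it is caught first, and return {"error": str(e), "error_code": e.error_code, "details": e.details} (this keeps the typed error code instead of letting the generic except flatten it into a string). The existing call path through _build_tool_result_message turns the returned dict into a tool-role message appended to the conversation, giving the LLM a chance to self-correct.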
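The flow above can be sketched as follows. To stay standard-library only, this sketch swaps jsonschema.validate for a tiny hand-written check (check_args, a hypothetical helper that understands only "required" and per-property "type"); the plan itself calls for jsonschema. The Tool base class, CountTool, and execute_tool are simplified stand-ins for the real classes.

```python
import asyncio
from typing import Any, Optional


class ToolValidationError(Exception):
    """Typed validation error carrying an error_code and details."""

    def __init__(self, error_code: str, details: str) -> None:
        super().__init__(f"{error_code}: {details}")
        self.error_code = error_code
        self.details = details


_JSON_TYPES = {"string": str, "integer": int, "number": (int, float),
               "boolean": bool, "object": dict, "array": list}


def check_args(kwargs: dict[str, Any], schema: dict[str, Any]) -> None:
    """Stand-in for jsonschema.validate: checks required fields and types only."""
    for name in schema.get("required", []):
        if name not in kwargs:
            raise ToolValidationError("schema_mismatch", f"missing required field '{name}'")
    for name, spec in schema.get("properties", {}).items():
        type_name = spec.get("type")
        expected = _JSON_TYPES.get(type_name) if isinstance(type_name, str) else None
        if name not in kwargs or expected is None:
            continue
        value = kwargs[name]
        # bool is a subclass of int in Python but is not a JSON integer/number.
        bool_mismatch = isinstance(value, bool) and type_name != "boolean"
        if bool_mismatch or not isinstance(value, expected):
            raise ToolValidationError("tool_call_invalid", f"field '{name}' must be {type_name}")


class Tool:
    input_schema: Optional[dict[str, Any]] = None

    async def execute(self, **kwargs: Any) -> Any:
        raise NotImplementedError

    async def safe_execute(self, **kwargs: Any) -> Any:
        if self.input_schema is not None:  # None → skip, legacy tools keep working
            check_args(kwargs, self.input_schema)
        return await self.execute(**kwargs)


class CountTool(Tool):
    input_schema = {"type": "object",
                    "properties": {"count": {"type": "integer"}},
                    "required": ["count"]}
    calls = 0

    async def execute(self, **kwargs: Any) -> Any:
        self.calls += 1
        return {"count": kwargs["count"]}


async def execute_tool(tool: Tool, arguments: dict[str, Any]) -> Any:
    try:
        return await tool.safe_execute(**arguments)
    except ToolValidationError as e:  # must come before the generic handler
        return {"error": str(e), "error_code": e.error_code, "details": e.details}
    except Exception as e:
        return {"error": str(e)}


if __name__ == "__main__":
    print(asyncio.run(execute_tool(CountTool(), {"count": "abc"})))
```

Ordering the except clauses this way is what preserves error_code in the dict that later becomes the tool-role message.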

Patterns to follow: the typed-error pattern of VerificationResult (verification_loop.py:18-24); standard jsonschema usage.

Test scenarios:

Covers R8. Happy path: the tool has input_schema={"type":"object","properties":{"count":{"type":"integer"}},"required":["count"]}; passing count=5 → validation passes and execute runs normally.
Covers R9. Edge: input_schema=None → validation is skipped and execute runs normally (backward compatible).
Covers R9. Error: passing count="abc" (wrong type) → raises ToolValidationError(error_code="tool_call_invalid"); execute is not called.
Covers R9. Error: count missing (required) → raises ToolValidationError(error_code="schema_mismatch").
Covers R10. Integration: _execute_tool catches ToolValidationError → a tool-role message is appended to the conversation → on the next turn the LLM sees the error and fixes the arguments → the retry succeeds.

Verification: python3 -m pytest tests/unit/test_tool_schema_validation.py -x -q passes; existing tool tests do not regress.

U2. G2 Prompt cache two-block structure

Goal: Restructure the system prompt into stable/volatile blocks, move memory injection into volatile, and add a cache_control breakpoint.

Requirements: R4, R5, R6, R7, R13, R14

Dependencies: None (independent; does not touch the same code regions as U1)

Files:

Modify: src/agentkit/core/react.py (execute_stream :1042-1059 memory injection and system message construction; the same path in execute, if present, is optional, see Scope Boundaries)
Modify: src/agentkit/llm/providers/anthropic.py (_convert_messages :102-197 accepts list-type system content and passes cache_control blocks through)
Config: src/agentkit/config.py or ServerConfig (prompt_cache.enable: bool)
Test: tests/unit/test_prompt_cache_layers.py

Approach:

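Add a helper _build_system_message(base_prompt, memory_context, enable_cache, provider): for the Anthropic provider it returns a list of content blocks, stable first (with cache_control: {"type":"ephemeral"}) and volatile (memory plus timestamp) after; for non-Anthropic providers it returns a concatenated string (stable+volatile) and relies on the stable prefix hitting automatic prefix caching.
execute_stream :1042-1059: memory injection changes from system_prompt += "## 参考信息" (the "Reference information" heading) to collecting memory_context and passing it to _build_system_message.
The system message content in the conversation: a block list for Anthropic, a string for everyone else; gateway chat_stream passes it through via **kwargs.
_convert_messages in anthropic.py (:102-197) must change: at :116, replace system_prompt = content with logic that passes list-type content through as is (payload["system"] accepts either a string or content blocks).
Config prompt_cache.enable: bool (default True). The number of breakpoints is hard-coded to 1 (the stable layer) and not exposed as config (YAGNI: with a two-block structure, more than 1 has no meaning).
enable_cache=False or a non-Anthropic provider → degrade to string concatenation (backward compatible).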
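A minimal sketch of the helper described above, assuming the provider is identified by the string "anthropic" and that stable and volatile text are joined with a blank line in the fallback path (both are assumptions, not confirmed by the plan). The function is written without the leading underscore so it reads as a free-standing illustration.

```python
from typing import Any


def build_system_message(
    base_prompt: str,
    memory_context: str,
    enable_cache: bool,
    provider: str,
) -> dict[str, Any]:
    """Return a system message whose content is blocks or a plain string.

    base_prompt is the stable block; memory_context is the volatile block
    (retrieved memory, and in the plan also the timestamp).
    """
    if enable_cache and provider == "anthropic":
        blocks: list[dict[str, Any]] = [
            {
                "type": "text",
                "text": base_prompt,
                # Single breakpoint: everything up to here is cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ]
        if memory_context:
            # Volatile block goes after the breakpoint so it never
            # invalidates the cached stable prefix.
            blocks.append({"type": "text", "text": memory_context})
        return {"role": "system", "content": blocks}

    # Fallback: plain string with the stable text first, so providers with
    # automatic prefix caching still see an unchanged prefix.
    if memory_context:
        return {"role": "system", "content": f"{base_prompt}\n\n{memory_context}"}
    return {"role": "system", "content": base_prompt}


if __name__ == "__main__":
    msg = build_system_message("You are a helpful agent.", "memory: user likes Go",
                               enable_cache=True, provider="anthropic")
    print(len(msg["content"]))
```

Putting the volatile text last in both branches is the point of the design: the part that changes every turn never sits in front of the part that should stay cacheable.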

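Patterns to follow: the LiteLLM/Anthropic cache_control content block convention; the existing memory_retriever.get_context_string call is unchanged.

Test scenarios:

Covers R4, R5. Happy path: in a multi-turn conversation the stable layer (skill configuration) is identical across turns while the volatile layer (memory) changes with the query → the system message content blocks are well-formed, stable first and volatile second.
Covers R6. Integration: the Anthropic provider receives content blocks carrying cache_control → _convert_messages passes them through → the cache hits on the stable prefix.
Covers R6. Edge: OpenAI provider (provider != anthropic) → _build_system_message returns a concatenated string without error; the stable prefix hits automatic prefix caching.
Covers R5. Edge: memory_retriever returns empty → no volatile block; the system message contains only stable.
Covers R13. Config: prompt_cache.enable=False → degrade to string concatenation; behavior matches the pre-change state.

Verification: python3 -m pytest tests/unit/test_prompt_cache_layers.py -x -q passes; the system message structure in multi-turn conversations matches expectations.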

U3. G8 delta_flush_interval throttling

Goal: Add time-based throttling to the token chunk yield in execute_stream.

Requirements: R11, R12, R13, R14

Dependencies: U2 (shared execute_stream; G2 changes the system prompt structure first, then G8 changes the yield logic)

Files:

Modify: src/agentkit/core/react.py (execute_stream chunk loop :1118-1134; the same path in execute, if present)
Config: ServerConfig (streaming.flush_interval_ms)
Test: tests/unit/test_delta_flush.py

Approach:

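Inside the chunk loop: keep accumulating stream_content_chunks and also accumulate into _flush_buffer; record _last_flush_ts with time.monotonic().
When now - _last_flush_ts >= flush_interval_ms/1000: yield the merged buffer, clear it, and update the timestamp.
flush_interval_ms=0 → per-chunk yield (backward compatible; current behavior).
Stream ends (the for loop exits) → final flush of whatever remains in the buffer.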
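The four steps above can be sketched as a small generator. This is a simplified, synchronous illustration (the real execute_stream is a streaming loop that yields ReActEvent objects); throttled_chunks and its injectable clock parameter are hypothetical names added so the behavior can be tested deterministically.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled_chunks(
    chunks: Iterable[str],
    flush_interval_ms: int,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Batch token chunks and yield them at most once per interval."""
    if flush_interval_ms <= 0:
        # Backward-compatible path: one yield per incoming chunk.
        yield from chunks
        return

    interval = flush_interval_ms / 1000
    buffer: list[str] = []
    last_flush = clock()
    for chunk in chunks:
        buffer.append(chunk)
        now = clock()
        if now - last_flush >= interval:
            yield "".join(buffer)
            buffer.clear()
            last_flush = now
    if buffer:
        # Stream ended mid-interval: flush the remainder so nothing is lost.
        yield "".join(buffer)


if __name__ == "__main__":
    def slow_source() -> Iterator[str]:
        for token in ["Hel", "lo", ", ", "wor", "ld"]:
            time.sleep(0.02)
            yield token

    batches = list(throttled_chunks(slow_source(), flush_interval_ms=50))
    # Whatever the batching, the concatenation matches the input.
    print("".join(batches))
```

Passing a fake clock in tests avoids sleeping and makes the batch boundaries exact, which fits the self-check that the merged output equals the concatenation of the original chunks.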

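Patterns to follow: time.monotonic() usage (already used at :1080 for _stream_start); the existing ReActEvent(event_type="token") structure is unchanged.

Test scenarios:

Covers R11. Happy path: flush_interval_ms=50 with simulated back-to-back chunks → yields in batches at 50 ms intervals with merged content.
Covers R12. Config: flush_interval_ms=0 → per-chunk yield (backward compatible).
Edge: stream ends mid-interval → final flush of the remaining buffer; no content is lost.
Edge: stream ends after a single chunk → flush immediately.
Covers R14. Self-check: assert that the yielded merged content equals the concatenation of the original chunks (no characters dropped).

Verification: python3 -m pytest tests/unit/test_delta_flush.py -x -q passes; the token stream drops no characters.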

U4. G1 Re-inject verify failures into ReAct

Goal: On verify failure, inject the errors into the conversation and continue ReAct; on a second failure, abort with the log.

Requirements: R1, R2, R3, R13, R14

Dependencies: U1 (reuses the error re-injection pattern; U1's ToolValidationError re-injection and G1's verify re-injection follow the same pattern)

Files:

Modify: src/agentkit/core/react.py (execute :886-907 verify block; execute_stream :1601-1629 verify block)
Config: ServerConfig (verification.max_reinjections: int = 1)
Test: tests/unit/test_verify_reinjection.py

Approach:

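Change verify from "run once after the loop" to "final-answer checkpoint plus re-injection retry".
When the main loop detects a final answer (no tool_calls): if self._verification_enabled → run verify.
Verify passes → break; finish normally.
Verify fails and reinjections < max_reinjections: append {"role":"user","content":f"验证失败,错误如下:\n{vresult.errors}"} (the message reads "Verification failed, errors follow") to the conversation and continue the main loop (the LLM sees the errors and self-corrects).
Verify fails and reinjections >= max_reinjections: record the verify log in the trajectory and break, returning the failure result.
Keep the existing VerificationLoop class and verify_and_retry untouched (backward compatible; external callers can still use them).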
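The control flow above can be sketched as a plain loop. This is an illustration under assumptions: VerifyResult stands in for the repository's VerificationResult, llm and verify are hypothetical callables, every reply is treated as a final answer (no tool calls), and the returned dict shape is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class VerifyResult:
    """Hypothetical stand-in for VerificationResult."""
    passed: bool
    errors: list[str] = field(default_factory=list)


def run_with_reinjection(
    llm: Callable[[list[dict[str, str]]], str],
    verify: Callable[[str], VerifyResult],
    conversation: list[dict[str, str]],
    max_steps: int = 10,
    max_reinjections: int = 1,
) -> dict[str, Any]:
    """Main-loop sketch: verify each final answer, re-inject errors on failure."""
    reinjections = 0
    verify_log: list[VerifyResult] = []
    for _ in range(max_steps):  # max_steps bounds the loop, re-injections included
        reply = llm(conversation)
        conversation.append({"role": "assistant", "content": reply})

        result = verify(reply)
        verify_log.append(result)
        if result.passed:
            return {"ok": True, "answer": reply, "verify_log": verify_log}
        if reinjections >= max_reinjections:
            # Retry budget exhausted: stop and hand back the log.
            return {"ok": False, "answer": reply, "verify_log": verify_log}

        reinjections += 1
        conversation.append({
            "role": "user",
            "content": "Verification failed with the following errors:\n"
                       + "\n".join(result.errors),
        })
    # max_steps reached before a verdict: abort rather than loop forever.
    return {"ok": False, "answer": None, "verify_log": verify_log}


if __name__ == "__main__":
    replies = iter(["draft with a bug", "fixed answer"])
    outcome = run_with_reinjection(
        llm=lambda conv: next(replies),
        verify=lambda r: VerifyResult(r == "fixed answer",
                                      [] if r == "fixed answer" else ["test_x failed"]),
        conversation=[{"role": "user", "content": "do the task"}],
    )
    print(outcome["ok"], len(outcome["verify_log"]))
```

Using continue-style iteration inside one loop, rather than calling the engine recursively, keeps the step budget shared between normal turns and re-injection turns.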

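Patterns to follow: the trajectory/event recording pattern of the existing verify block; U1's ToolValidationError re-injection pattern.

Execution note: First add characterization tests covering the existing verify behavior (failure is only recorded), then change the implementation, to make sure nothing regresses.

Test scenarios:

Covers R1. Happy path: the first verify fails → errors are injected into the conversation → the LLM self-corrects → the second verify passes → the task completes.
Covers R2. Error: the second verify fails → abort and return the error with the verify log (test output plus the errors list).
Covers R3. Config: max_reinjections=0 → equivalent to no re-injection (current behavior); a verify failure exits directly.
Covers R3. Edge: max_steps is reached during re-injection → abort (no infinite loop).
Covers R1. Integration: the re-injected user message appears in the conversation, and the LLM's next-turn input contains the errors text.
Covers R14. Self-check: the default value of max_reinjections is 1.

Verification: python3 -m pytest tests/unit/test_verify_reinjection.py -x -q passes; pytest tests/unit/ -x -q shows no regressions across the full suite.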

Scope Boundaries

Deferred for later

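Wave 2 (G4 auxiliary LLM routing / G7 three-tier degradation chain / G9 atomic rollback): see the origin's Wave 2 section; separate plan.
Wave 3 (G5 function-level code chunking / G6 SOLO state machine): see the origin's Wave 3 section; separate plan.
G7 Emergency-layer rule templates and the G5 tree-sitter integration approach: listed under the origin's Deferred to Planning.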

Deferred to Follow-Up Work

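G2/G8 changes to execute() (non-streaming): this plan prioritizes execute_stream (the main WebSocket path). The matching execute() changes can be done along the way but are not required; if split into a separate PR they count as follow-up.
Turning prompt cache hit rate into a metric (the ~50% target in R7): this plan lands the structure; instrumentation and measurement are follow-up.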

Outside this product's identity

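Rewriting orchestration logic / topological sorting / Board debate: inherited from 2026-06-24-004.
Node-level checkpoints: inherited from 2026-06-24-004 KTD3.
Wholesale migration to LangGraph: inherited from 2026-06-24-004.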

Risks & Dependencies

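Risk: G2 content blocks break existing message serialization. gateway/messages assume content: str, so switching to a block list requires checking every serialization path (WebSocket, logs, trace). Mitigation: only the system message uses blocks while user/assistant stay str; the enable_cache=False fallback path is the safety net.
Risk: G1 re-injection increases token consumption. A second verify cycle costs one more LLM round. Mitigation: max_reinjections defaults to 1 and is bounded by max_steps; KTD3 uses continue inside the loop rather than recursion, so there is no stack overflow risk.
Dependency: U3 depends on U2 being complete. Both share execute_stream; G2 settles the system prompt structure first, then G8 adds throttling (KTD5).
Dependency: G2 requires changing _convert_messages in anthropic.py to support list-type system content. anthropic.py is a direct httpx implementation (not LiteLLM), so :116 must be changed by hand from system_prompt = content to passing content blocks through. Non-Anthropic providers take the string-concatenation fallback path without error.
Dependency: the jsonschema library for G3. jsonschema>=4.0 is already a core dependency in pyproject.toml (line 22); no new dependency is needed.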

Acceptance Examples

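AE1 (Covers R1, R2, R3): a ReAct task's verify fails after the final answer → errors are injected into the conversation and ReAct continues → the LLM corrects and the second verify passes → done. If the second verify fails → abort and return the error with the verify log.
AE2 (Covers R4, R5, R6, R7): a 50-turn long conversation → the stable layer is unchanged while volatile varies with the query → Anthropic's cache hits on the stable prefix and OpenAI hits automatic prefix caching → input tokens drop.
AE3 (Covers R8, R9, R10): the LLM passes a wrong argument type → schema validation fails → tool_call_invalid → the error is fed back into the conversation → the LLM corrects it and the retry succeeds.
AE4 (Covers R11, R12): a client on a weak network → token chunks are yielded in 50 ms batches → the frontend renders without stutter and no characters are lost.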

Sources & Research

Origin: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md (Wave 1 requirements document; KTD1-KTD7, R1-R27)
Upstream plan: docs/plans/2026-06-24-004-feat-long-horizon-reliability-optimization-plan.md (U1-U7 long-horizon reliability guardrails; this plan supplements it)
Learnings: docs/solutions/logic-errors/long-horizon-reliability-code-review-fixes.md (lessons from 14 findings: defaults for new fields must preserve the contract, cross-module contracts must be explicit, cleanup methods must be wired into the lifecycle)
Code grounding (checked by the verifier):
- src/agentkit/core/verification_loop.py:111-145 - verify_and_retry takes fix_callback and does not re-inject into ReAct
- src/agentkit/core/react.py:1042-1059 - memory injection is appended to the end of the system prompt
- src/agentkit/core/react.py:1118-1134 - the chunk loop yields token by token with no throttling
- src/agentkit/core/react.py:886-907, 1601-1629 - verify runs after the loop; failure is only recorded in the trajectory
- src/agentkit/core/react.py:1897-1916 - _execute_tool has no schema validation
- src/agentkit/tools/base.py:21,28,50-77 - Tool.input_schema (JSON Schema dict, optional); safe_execute does no validation
- src/agentkit/llm/gateway.py:268-281 - chat_stream(**kwargs) can pass cache_control through
- src/agentkit/llm/providers/anthropic.py:316 - Anthropic provider chat_stream
External research (brainstorm phase): Qoder (Spec→Verify closed loop, delta_flush_interval), Hermes (cache_control system_and_3, validate_function_call_schema), Codex (prompt caching prefix matching)
