feat(agent): Wave 1 quick wins (G1/G2/G3/G8) + review fixes #4

Merged
fischer merged 6 commits from feat/agent-wave1-quick-wins into main 2026-06-29 22:08:56 +08:00
Owner

Summary

Wave 1 of advanced-agent gap optimization - 4 self-contained quick fixes addressing verify reinjection, prompt cache control, tool-call schema validation, and stream flush throttling.

Units

Unit Gap File Summary
U1 G3 tools/base.py, core/react.py Tool.safe_execute runs jsonschema.validate before execute(); raises ToolValidationError(error_code=tool_call_invalid
U2 G2 llm/anthropic.py, llm/gateway.py cache_control blocks on system message + last user/tool pair. Other providers keep string-prefix concatenation (auto prefix cache).
U3 G8 core/react.py flush_interval_ms (default 0 = passthrough) throttles token events via time.monotonic() with final flush in finally.
U4 G1 core/react.py Verify block moved inside ReAct loop at final-answer detection. Verify fail -> inject errors as user message -> continue (LLM self-corrects). Second fail or max_steps hit -> record trajectory + trace_outcome=verify_failed + break.

Config

New optional YAML keys (all default to safe values, fully backward compatible):

prompt_cache:
  enable: true
streaming:
  flush_interval_ms: 0
verification:
  max_reinjections: 1

Review Fixes (commit d7ca6e8)

  • W1: ServerConfig.from_dict now actually wires prompt_cache / streaming / verification sections from YAML (previously these fields existed in the constructor but were never read from YAML).
  • W3: Tool._validate_input filters _-prefixed kwargs (e.g. _skip_dangerous_check) before jsonschema.validate, preventing additionalProperties:false schemas from rejecting internal control parameters. Test added.
  • N3: ReActResult.status docstring now lists empty_fallback and verify_failed.

Test Results

67 passed in 2.12s:

  • test_tool_schema_validation.py (9 tests, +1 W3)
  • test_verify_reinjection.py (10 tests)
  • test_prompt_cache_layers.py (10 tests)
  • test_react_engine.py (38 tests)

ruff check: All checks passed!

Pre-existing failure test_react_compression.py::test_execute_stream_with_compressor confirmed unrelated via git stash baseline run.

Commits

  • c66a777 feat(U1): G3 schema validation
  • c4aaef0 feat(U2): G2 prompt cache double-block
  • 0f3f0a7 feat(U3): G8 delta_flush_interval throttling
  • cd211c6 feat(U4): G1 verify failure reinjection
  • d7ca6e8 fix(review): W1 ServerConfig from_dict wiring, W3 internal kwargs filter, N3 status docstring

Out of Scope (Future Waves)

  • Wave 2: G4 (parallel tool planner), G7 (context compression strategy), G9 (memory retrieval ranking)
  • Wave 3: G5 (planner separation), G6 (reflexion integration)

Risk Assessment

All changes are opt-in (defaults preserve existing behavior). U1 (schema validation) is the only always-on change, but only affects tools that declare input_schema - tools without schemas skip validation. Schema errors return structured dicts rather than raising, so the ReAct loop is not broken.

## Summary Wave 1 of advanced-agent gap optimization - 4 self-contained quick fixes addressing verify reinjection, prompt cache control, tool-call schema validation, and stream flush throttling. ## Units | Unit | Gap | File | Summary | |------|-----|------|---------| | U1 | G3 | tools/base.py, core/react.py | Tool.safe_execute runs jsonschema.validate before execute(); raises ToolValidationError(error_code=tool_call_invalid|schema_mismatch). _execute_tool catches and returns structured dict preserving error_code for LLM self-correction. | | U2 | G2 | llm/anthropic.py, llm/gateway.py | cache_control blocks on system message + last user/tool pair. Other providers keep string-prefix concatenation (auto prefix cache). | | U3 | G8 | core/react.py | flush_interval_ms (default 0 = passthrough) throttles token events via time.monotonic() with final flush in finally. | | U4 | G1 | core/react.py | Verify block moved inside ReAct loop at final-answer detection. Verify fail -> inject errors as user message -> continue (LLM self-corrects). Second fail or max_steps hit -> record trajectory + trace_outcome=verify_failed + break. | ## Config New optional YAML keys (all default to safe values, fully backward compatible): ```yaml prompt_cache: enable: true streaming: flush_interval_ms: 0 verification: max_reinjections: 1 ``` ## Review Fixes (commit d7ca6e8) - W1: ServerConfig.from_dict now actually wires prompt_cache / streaming / verification sections from YAML (previously these fields existed in the constructor but were never read from YAML). - W3: Tool._validate_input filters _-prefixed kwargs (e.g. _skip_dangerous_check) before jsonschema.validate, preventing additionalProperties:false schemas from rejecting internal control parameters. Test added. - N3: ReActResult.status docstring now lists empty_fallback and verify_failed. ## Test Results 67 passed in 2.12s: - test_tool_schema_validation.py (9 tests, +1 W3) - test_verify_reinjection.py (10 tests) - test_prompt_cache_layers.py (10 tests) - test_react_engine.py (38 tests) ruff check: All checks passed! Pre-existing failure test_react_compression.py::test_execute_stream_with_compressor confirmed unrelated via git stash baseline run. ## Commits - c66a777 feat(U1): G3 schema validation - c4aaef0 feat(U2): G2 prompt cache double-block - 0f3f0a7 feat(U3): G8 delta_flush_interval throttling - cd211c6 feat(U4): G1 verify failure reinjection - d7ca6e8 fix(review): W1 ServerConfig from_dict wiring, W3 internal kwargs filter, N3 status docstring ## Out of Scope (Future Waves) - Wave 2: G4 (parallel tool planner), G7 (context compression strategy), G9 (memory retrieval ranking) - Wave 3: G5 (planner separation), G6 (reflexion integration) ## Risk Assessment All changes are opt-in (defaults preserve existing behavior). U1 (schema validation) is the only always-on change, but only affects tools that declare input_schema - tools without schemas skip validation. Schema errors return structured dicts rather than raising, so the ReAct loop is not broken.
fischer added 6 commits 2026-06-29 22:00:42 +08:00
c66a7773b5 feat(U1): G3 工具调用 schema 校验
- base.py 新增 ToolValidationError(error_code/details)与 _validate_input
- safe_execute 在 execute 前用 jsonschema.validate 校验 kwargs
- input_schema=None 跳过校验保持向后兼容
- _execute_tool 优先捕获 ToolValidationError 保留 error_code
- function_tool._infer_schema 修复 VAR_KEYWORD/VAR_POSITIONAL 误入 schema
- test_tool_schema_validation.py 覆盖 R8-R10
c4aaef05aa feat(U2): G2 prompt cache 双块结构
- ReActEngine 新增 _build_system_message(stable+volatile) 双块构造
- Anthropic provider 返回 content blocks,stable 块带 cache_control
- 非 Anthropic provider 返回字符串拼接,依赖 stable 前缀命中自动前缀缓存
- execute_stream/execute 记忆注入从 system_prompt 末尾移到 volatile 层
- LLMGateway.get_provider_name_for_model 暴露 provider 检测能力
- anthropic.py _convert_messages 支持 list-type system content 透传
- ServerConfig.prompt_cache 配置项(默认 enable=True)
- ReActEngine.prompt_cache_enable 构造参数(默认 True 保当前行为)
- test_prompt_cache_layers.py 覆盖 R4-R7/R13
0f3f0a7550 feat(U3): G8 delta_flush_interval 调速
- ReActEngine 新增 flush_interval_ms 构造参数(默认 0 = 逐 chunk yield 向后兼容)
- execute_stream chunk 循环用 time.monotonic 节流,累积 _flush_buffer 批量 yield
- flush_interval_ms=0 条件短路为 True 逐 chunk yield 保当前行为
- 流结束 mid-interval 最终 flush 剩余 buffer 不丢字符
- ServerConfig.streaming 配置项(flush_interval_ms)
- test_delta_flush.py 覆盖 R11/R12/R14
cd211c6cd9 feat(U4): G1 verify 失败回灌 ReAct
- ReActEngine 新增 max_reinjections 构造参数(默认 1,=0 等价原行为)
- execute()/execute_stream() verify 块从循环后移到循环内 final-answer 检测点:
  - verify 通过 → 正常 break
  - verify 失败 + reinjections < max + step < max_steps → errors 作为 user 消息回灌 conversation, continue 让 LLM 自纠正
  - verify 失败 + 达到 max_reinjections 或 max_steps → 记录 verify log 到 trajectory, trace_outcome="verify_failed", break
- execute_stream 的 final_answer 事件在 verify 通过后才 yield,避免客户端过早收到完成信号
- ReActResult.status 现在传递 trace_outcome(原默认 "success")
- ServerConfig.verification 配置项(max_reinjections)
- test_verify_reinjection.py 10 测试:characterization(max=0)+ 新行为(R1/R2/R3/R14)
Test / backend-test (pull_request) Has been cancelled Details
Test / frontend-unit (pull_request) Has been cancelled Details
Test / api-e2e (pull_request) Has been cancelled Details
Test / frontend-e2e (pull_request) Has been cancelled Details
d7ca6e8065
fix(review): W1 ServerConfig from_dict wiring, W3 internal kwargs filter, N3 status docstring
Code review fixes for Wave 1:
- W1: ServerConfig.from_dict now wires prompt_cache/streaming/verification sections
  from YAML to constructor (previously these params existed but were never read)
- W3: Tool._validate_input filters _-prefixed kwargs (e.g. _skip_dangerous_check)
  before jsonschema.validate, preventing additionalProperties:false schemas from
  rejecting internal control parameters
- N3: ReActResult.status docstring now lists "empty_fallback" and "verify_failed"

Added test test_internal_kwargs_underscore_prefixed_skipped_by_validation for W3.
fischer merged commit 78ed93fc81 into main 2026-06-29 22:08:56 +08:00
Sign in to join this conversation.
No reviewers
No Label
No Milestone
No project
No Assignees
1 Participants
Notifications
Due Date
The due date is invalid or out of range. Please use the format 'yyyy-mm-dd'.

No due date set.

Dependencies

No dependencies set.

Reference: fischer/fischer-agentkit#4
No description provided.