======================================================================
Fischer AgentKit Comprehensive Capability Backtest Report
======================================================================
Generated at: 2026-06-17T05:29:48.993554+00:00
Overall score: 100.0%
Total cases: 50  Passed: 50  Failed: 0

----------------------------------------------------------------------
Scores by Dimension
----------------------------------------------------------------------
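  ✓ Preprocessing accuracy: 100.0% (17/17)
  ✓ Skill recall: 100.0% (8/8)
  ✓ Overfitting detection: 100.0% (5/5)
  ✓ Execution efficiency: 100.0% (5/5)
  ✓ Tool search accuracy: 100.0% (8/8)
  ✓ Event model completeness: 100.0% (3/3)
  ✓ Spec management: 100.0% (2/2)
  ✓ Verification loop: 100.0% (2/2)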

----------------------------------------------------------------------
Detailed Case Results
----------------------------------------------------------------------

[Preprocessing accuracy]
  ✓ greeting_cn
  ✓ greeting_en
  ✓ greeting_hi
  ✓ chitchat_thanks
  ✓ chitchat_ok
  ✓ identity_who
  ✓ identity_name
  ✓ tool_ip
  ✓ tool_search
  ✓ tool_shell
  ✓ tool_file
  ✓ tool_monitor
  ✓ complex_analysis
  ✓ complex_code
  ✓ complex_multi
  ✓ skill_prefix_react
  ✓ skill_prefix_coder

[Skill recall]
  ✓ recall_valid_react
  ✓ recall_valid_coder
  ✓ recall_invalid_skill
  ✓ recall_no_prefix_react
  ✓ recall_no_prefix_greeting
  ✓ recall_no_prefix_complex
  ✓ recall_skill_only_prefix
  ✓ recall_skill_with_long_content

[Overfitting detection]
  ✓ overfit_ip_check
  ✓ overfit_search
  ✓ overfit_greeting
  ✓ overfit_file_read
  ✓ overfit_identity

[Execution efficiency]
  ✓ efficiency_greeting
  ✓ efficiency_chitchat
  ✓ efficiency_identity
  ✓ efficiency_react_tool
  ✓ efficiency_react_complex

[Tool search accuracy]
  ✓ tool_search_read
  ✓ tool_search_write
  ✓ tool_search_web
  ✓ tool_search_shell
  ✓ tool_search_tests
  ✓ tool_search_file_multiple
  ✓ tool_search_no_match
  ✓ tool_search_empty_query

[Event model completeness]
  ✓ sq_submit_and_drain
  ✓ eq_emit_and_subscribe
  ✓ event_type_classification

[Spec management]
  ✓ spec_create_and_get
  ✓ spec_confirm

[Verification loop]
  ✓ verify_success
  ✓ verify_failure

----------------------------------------------------------------------
Recommendations
----------------------------------------------------------------------
  • All dimensions scored 100% (50/50 cases passed); the architecture is in good shape

======================================================================