======================================================================
Fischer AgentKit Comprehensive Capability Backtest Report
======================================================================
Generated at: 2026-06-17T05:29:48.993554+00:00
Overall score: 100.0%
Total cases: 50  Passed: 50  Failed: 0

----------------------------------------------------------------------
Scores by Dimension
----------------------------------------------------------------------
✓ Preprocessing accuracy: 100.0% (17/17)
✓ Skill recall: 100.0% (8/8)
✓ Overfitting detection: 100.0% (5/5)
✓ Execution efficiency: 100.0% (5/5)
✓ Tool search accuracy: 100.0% (8/8)
✓ Event model integrity: 100.0% (3/3)
✓ Spec management: 100.0% (2/2)
✓ Verification loop: 100.0% (2/2)

----------------------------------------------------------------------
Detailed Case Results
----------------------------------------------------------------------

[Preprocessing accuracy]
✓ greeting_cn
✓ greeting_en
✓ greeting_hi
✓ chitchat_thanks
✓ chitchat_ok
✓ identity_who
✓ identity_name
✓ tool_ip
✓ tool_search
✓ tool_shell
✓ tool_file
✓ tool_monitor
✓ complex_analysis
✓ complex_code
✓ complex_multi
✓ skill_prefix_react
✓ skill_prefix_coder

[Skill recall]
✓ recall_valid_react
✓ recall_valid_coder
✓ recall_invalid_skill
✓ recall_no_prefix_react
✓ recall_no_prefix_greeting
✓ recall_no_prefix_complex
✓ recall_skill_only_prefix
✓ recall_skill_with_long_content

[Overfitting detection]
✓ overfit_ip_check
✓ overfit_search
✓ overfit_greeting
✓ overfit_file_read
✓ overfit_identity

[Execution efficiency]
✓ efficiency_greeting
✓ efficiency_chitchat
✓ efficiency_identity
✓ efficiency_react_tool
✓ efficiency_react_complex

[Tool search accuracy]
✓ tool_search_read
✓ tool_search_write
✓ tool_search_web
✓ tool_search_shell
✓ tool_search_tests
✓ tool_search_file_multiple
✓ tool_search_no_match
✓ tool_search_empty_query

[Event model integrity]
✓ sq_submit_and_drain
✓ eq_emit_and_subscribe
✓ event_type_classification

[Spec management]
✓ spec_create_and_get
✓ spec_confirm

[Verification loop]
✓ verify_success
✓ verify_failure

----------------------------------------------------------------------
Improvement Suggestions
----------------------------------------------------------------------
• All dimensions reached 100%; the architecture is in good shape.

======================================================================
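
The per-dimension counts in this report can be cross-checked against its overall figures. The sketch below is illustrative only and is not AgentKit's actual scoring code: the names `DIMENSIONS`, `percent`, and `summarize` are hypothetical, and it assumes the overall score is the pooled pass rate (passed cases divided by total cases) rather than an average of the per-dimension percentages. With every dimension at 100%, both readings give the same result.

```python
# Pass/total counts per dimension, copied from the report.
# Keys are descriptive labels for the report's eight dimensions.
DIMENSIONS = {
    "Preprocessing accuracy": (17, 17),
    "Skill recall": (8, 8),
    "Overfitting detection": (5, 5),
    "Execution efficiency": (5, 5),
    "Tool search accuracy": (8, 8),
    "Event model integrity": (3, 3),
    "Spec management": (2, 2),
    "Verification loop": (2, 2),
}


def percent(passed: int, total: int) -> float:
    """Return the pass rate as a percentage, or 0.0 for an empty group."""
    return 100.0 * passed / total if total else 0.0


def summarize(dimensions: dict) -> dict:
    """Pool per-dimension counts into report-level totals (hypothetical helper)."""
    passed = sum(p for p, _ in dimensions.values())
    total = sum(t for _, t in dimensions.values())
    return {
        "total": total,
        "passed": passed,
        "failed": total - passed,
        # Assumption: overall score = pooled pass rate, not a mean of percentages.
        "score": percent(passed, total),
    }


summary = summarize(DIMENSIONS)
for name, (passed, total) in DIMENSIONS.items():
    print(f"{name}: {percent(passed, total):.1f}% ({passed}/{total})")
print(f"Overall score: {summary['score']:.1f}%")
print(
    f"Total cases: {summary['total']}  "
    f"Passed: {summary['passed']}  Failed: {summary['failed']}"
)
```

Under these assumptions the eight groups add up to the 50 cases the report lists, with none failing.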