fischer-agentkit/test-results/e2e/comprehensive_report.txt

======================================================================
Fischer AgentKit Comprehensive Capability Backtest Report
======================================================================
Generated at: 2026-06-17T05:29:48.993554+00:00
Overall score: 100.0%
Total cases: 50 Passed: 50 Failed: 0
----------------------------------------------------------------------
Scores by dimension
----------------------------------------------------------------------
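✓ Preprocessing accuracy: 100.0% (17/17)
✓ Skill recall: 100.0% (8/8)
✓ Overfitting detection: 100.0% (5/5)
✓ Execution efficiency: 100.0% (5/5)
✓ Tool search accuracy: 100.0% (8/8)
✓ Event model integrity: 100.0% (3/3)
✓ Spec management: 100.0% (2/2)
✓ Verification loop: 100.0% (2/2)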
----------------------------------------------------------------------
Detailed case results
----------------------------------------------------------------------
[Preprocessing accuracy]
✓ greeting_cn
✓ greeting_en
✓ greeting_hi
✓ chitchat_thanks
✓ chitchat_ok
✓ identity_who
✓ identity_name
✓ tool_ip
✓ tool_search
✓ tool_shell
✓ tool_file
✓ tool_monitor
✓ complex_analysis
✓ complex_code
✓ complex_multi
✓ skill_prefix_react
✓ skill_prefix_coder
[Skill recall]
✓ recall_valid_react
✓ recall_valid_coder
✓ recall_invalid_skill
✓ recall_no_prefix_react
✓ recall_no_prefix_greeting
✓ recall_no_prefix_complex
✓ recall_skill_only_prefix
✓ recall_skill_with_long_content
[Overfitting detection]
✓ overfit_ip_check
✓ overfit_search
✓ overfit_greeting
✓ overfit_file_read
✓ overfit_identity
[Execution efficiency]
✓ efficiency_greeting
✓ efficiency_chitchat
✓ efficiency_identity
✓ efficiency_react_tool
✓ efficiency_react_complex
[Tool search accuracy]
✓ tool_search_read
✓ tool_search_write
✓ tool_search_web
✓ tool_search_shell
✓ tool_search_tests
✓ tool_search_file_multiple
✓ tool_search_no_match
✓ tool_search_empty_query
[Event model integrity]
✓ sq_submit_and_drain
✓ eq_emit_and_subscribe
✓ event_type_classification
[Spec management]
✓ spec_create_and_get
✓ spec_confirm
[Verification loop]
✓ verify_success
✓ verify_failure
----------------------------------------------------------------------
Improvement suggestions
----------------------------------------------------------------------
• All dimensions reached 100%; the architecture is in good shape
======================================================================