fischer-agentkit/test-results/benchmark/benchmark_report.txt


======================================================================
AgentKit Benchmark Report
======================================================================
Timestamp: 2026-06-17T03:31:00.118497+00:00
Version: 0.1.0
Overall Score: 98.0%
Summary: 50/51 tests passed (1 failed) across 7 dimensions.
----------------------------------------------------------------------
Dimension              Total    Pass    Fail     Score
----------------------------------------------------------------------
preprocessing             15      14       1     93.3%
overfitting                3       3       0    100.0%
efficiency                 5       5       0    100.0%
tool_search               10      10       0    100.0%
event_model                6       6       0    100.0%
spec_management            7       7       0    100.0%
verification               5       5       0    100.0%
----------------------------------------------------------------------
OVERALL                   51      50       1     98.0%
======================================================================
Failed Cases:
----------------------------------------------------------------------
[preprocessing] skill_prefix_direct
  expected: skill_react
  actual:   direct_chat
  detail:   input='@skill:chat_only 你好' method=skill_prefix
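
The per-dimension and overall scores in the table above appear to be plain pass rates (passed / total, shown to one decimal place). The sketch below recomputes them from the reported counts as a sanity check. It assumes that scoring rule, which the report does not state explicitly, and the names ROWS, score, and summarize are hypothetical, not part of AgentKit.

```python
# Assumption: score = passed / total, as a percentage with one decimal.
# Rows are copied from the report table as (dimension, total, passed).
ROWS = [
    ("preprocessing", 15, 14),
    ("overfitting", 3, 3),
    ("efficiency", 5, 5),
    ("tool_search", 10, 10),
    ("event_model", 6, 6),
    ("spec_management", 7, 7),
    ("verification", 5, 5),
]


def score(passed: int, total: int) -> str:
    """Format a pass rate the way the report prints it, e.g. '93.3%'."""
    return f"{100.0 * passed / total:.1f}%"


def summarize(rows):
    """Return (total, passed, failed, score) aggregated over all rows."""
    total = sum(t for _, t, _ in rows)
    passed = sum(p for _, _, p in rows)
    return total, passed, total - passed, score(passed, total)


if __name__ == "__main__":
    for name, total, passed in ROWS:
        print(f"{name:<20}{total:>8}{passed:>8}{total - passed:>8}{score(passed, total):>10}")
    total, passed, failed, overall = summarize(ROWS)
    print(f"{'OVERALL':<20}{total:>8}{passed:>8}{failed:>8}{overall:>10}")
```

Under that assumption the recomputed totals agree with the OVERALL row: 51 tests, 50 passed, 1 failed.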