======================================================================
AgentKit Benchmark Report
======================================================================
Timestamp: 2026-06-17T03:26:25.072956+00:00
Version: 0.1.0
Overall Score: 98.0%
Summary: 50/51 tests passed (1 failed) across 7 dimensions.

----------------------------------------------------------------------
Dimension            Total    Pass    Fail     Score
----------------------------------------------------------------------
preprocessing           15      14       1     93.3%
overfitting              3       3       0    100.0%
efficiency               5       5       0    100.0%
tool_search             10      10       0    100.0%
event_model              6       6       0    100.0%
spec_management          7       7       0    100.0%
verification             5       5       0    100.0%
----------------------------------------------------------------------
OVERALL                 51      50       1     98.0%
======================================================================

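The scores in the table can be cross-checked with a few lines of arithmetic. The sketch below is not part of AgentKit; the `rows` mapping and the `score` helper are hypothetical names introduced here, and the counts are copied from the table. It assumes each score is passed/total as a percentage rounded to one decimal place, and that the overall score pools all tests rather than averaging the per-dimension scores, which is consistent with the reported 98.0%.

```python
# Hypothetical cross-check of the report's arithmetic (not AgentKit code).
# Each entry is (total tests, passed tests), copied from the table.
rows = {
    "preprocessing": (15, 14),
    "overfitting": (3, 3),
    "efficiency": (5, 5),
    "tool_search": (10, 10),
    "event_model": (6, 6),
    "spec_management": (7, 7),
    "verification": (5, 5),
}


def score(total: int, passed: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place (assumed)."""
    return round(100.0 * passed / total, 1)


# Per-dimension scores.
per_dimension = {name: score(t, p) for name, (t, p) in rows.items()}

# Overall score pooled over every test, not a mean of dimension scores.
total = sum(t for t, _ in rows.values())
passed = sum(p for _, p in rows.values())
overall = score(total, passed)

for name, pct in per_dimension.items():
    print(f"{name:<18}{pct:>6.1f}%")
print(f"{'OVERALL':<18}{overall:>6.1f}%  ({passed}/{total})")
```

Pooling matters here: a plain mean of the seven dimension scores would come out near 99.0%, because it would give the 3-test overfitting dimension the same weight as the 15-test preprocessing dimension.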
Failed Cases:
----------------------------------------------------------------------
[preprocessing] skill_prefix_direct
  expected: skill_react
  actual: direct_chat
  detail: input='@skill:chat_only 你好' method=skill_prefix