======================================================================
AgentKit Benchmark Report
======================================================================
Timestamp:      2026-06-17T03:31:00.118497+00:00
Version:        0.1.0
Overall Score:  98.0%
Summary:        50/51 tests passed (1 failed) across 7 dimensions.

----------------------------------------------------------------------
Dimension             Total   Pass   Fail    Score
----------------------------------------------------------------------
preprocessing            15     14      1   93.3%
overfitting               3      3      0  100.0%
efficiency                5      5      0  100.0%
tool_search              10     10      0  100.0%
event_model               6      6      0  100.0%
spec_management           7      7      0  100.0%
verification              5      5      0  100.0%
----------------------------------------------------------------------
OVERALL                  51     50      1   98.0%
======================================================================
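
Note on the OVERALL row: the 98.0% figure matches the pooled pass rate (50 passed out of 51 tests), not the unweighted mean of the seven dimension scores. The sketch below reproduces both numbers from the counts in the table. The function names are hypothetical and are not taken from AgentKit; how AgentKit actually aggregates is an assumption inferred from the reported figures.

```python
# Per-dimension (total, passed) counts copied from the table above.
results = {
    "preprocessing": (15, 14),
    "overfitting": (3, 3),
    "efficiency": (5, 5),
    "tool_search": (10, 10),
    "event_model": (6, 6),
    "spec_management": (7, 7),
    "verification": (5, 5),
}


def pooled_score(results):
    """Pass rate over all tests pooled together (hypothetical helper)."""
    total = sum(t for t, _ in results.values())
    passed = sum(p for _, p in results.values())
    return 100.0 * passed / total


def mean_of_dimension_scores(results):
    """Unweighted mean of the per-dimension pass rates (hypothetical helper)."""
    return sum(100.0 * p / t for t, p in results.values()) / len(results)


print(f"pooled:             {pooled_score(results):.1f}%")
print(f"mean of dimensions: {mean_of_dimension_scores(results):.1f}%")
```

With pooling, the single preprocessing failure counts as 1 of 51 tests. An unweighted mean would treat each dimension equally regardless of its size and would give a slightly higher figure.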

Failed Cases:
----------------------------------------------------------------------
  [preprocessing] skill_prefix_direct
    expected: skill_react
    actual:   direct_chat
    detail:   input='@skill:chat_only 你好' method=skill_prefix
