======================================================================
AgentKit Benchmark Report
======================================================================
Timestamp:     2026-06-17T03:31:00.118497+00:00
Version:       0.1.0
Overall Score: 98.0%

Summary: 50/51 tests passed (1 failed) across 7 dimensions.

----------------------------------------------------------------------
Dimension            Total   Pass   Fail    Score
----------------------------------------------------------------------
preprocessing           15     14      1    93.3%
overfitting              3      3      0   100.0%
efficiency               5      5      0   100.0%
tool_search             10     10      0   100.0%
event_model              6      6      0   100.0%
spec_management          7      7      0   100.0%
verification             5      5      0   100.0%
----------------------------------------------------------------------
OVERALL                 51     50      1    98.0%
======================================================================

Failed Cases:
----------------------------------------------------------------------
[preprocessing] skill_prefix_direct
    expected: skill_react
    actual:   direct_chat
    detail:   input='@skill:chat_only 你好' method=skill_prefix