fischer-agentkit/test-results/benchmark
chiguyong 1fbfd9d132 refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline) 2026-06-17 12:01:34 +08:00
baseline.json refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline) 2026-06-17 12:01:34 +08:00
benchmark_report.html feat: comprehensive capability benchmark and agentkit benchmark CLI 2026-06-17 11:28:09 +08:00
benchmark_report.json refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline) 2026-06-17 12:01:34 +08:00
benchmark_report.md refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline) 2026-06-17 12:01:34 +08:00
benchmark_report.txt feat: comprehensive capability benchmark and agentkit benchmark CLI 2026-06-17 11:28:09 +08:00
benchmark_report_cn.md docs: add detailed Chinese benchmark report with industry comparison 2026-06-17 11:34:56 +08:00