fischer-agentkit

Author	SHA1	Message	Date
chiguyong	840d1afd6a	fix: resolve benchmark failures from root cause (LLM timeout, WebSocket, latency stats). U1: LLM reasoning - difficulty-based timeout (easy=20s, medium=40s, hard=60s) plus streaming keyword detection for hard tasks, with non-stream fallback. U2: GUI WebSocket - remove unreliable HTTP pre-check (FastAPI returns 404 for HTTP GET to WS endpoints), test the WS connection directly, treat {"type":"connected"} as pass (ping/pong is bonus info). U3: Verification latency - exclude timeout-tagged cases from P95/P99 percentile calculation (accuracy stats unaffected). U4: LLM Gateway - add timeout field to LLMRequest, gateway.chat()/chat_stream() passthrough for provider-level timeout support. Test results: 62/63 pass (98.4%), gui-004 fixed, no regressions. pytest: 64 passed; ruff: clean.	2026-06-17 13:32:54 +08:00
chiguyong	a1318df420	feat: add LLM and GUI benchmark modes with real agent testing	2026-06-17 12:55:19 +08:00
chiguyong	1fbfd9d132	refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline)	2026-06-17 12:01:34 +08:00
chiguyong	d361177cc7	docs: add detailed Chinese benchmark report with industry comparison	2026-06-17 11:34:56 +08:00
chiguyong	d00995504d	feat: comprehensive capability benchmark and agentkit benchmark CLI	2026-06-17 11:28:09 +08:00
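
Commit 840d1afd6a describes two changes that are easy to misread in prose: the difficulty-based timeout (U1) and the exclusion of timeout-tagged cases from latency percentiles (U3). The following is a minimal sketch of both ideas, not the repository's actual code. The names `TIMEOUT_BY_DIFFICULTY`, `timeout_for`, `CaseResult`, `percentile`, and `summarize` are hypothetical, as are the nearest-rank percentile method and the fallback to the "hard" budget for unknown difficulties. Only the 20s/40s/60s values and the rule that accuracy is unaffected come from the commit message.

```python
import math
from dataclasses import dataclass

# Timeout budget per difficulty in seconds (values from the commit message).
TIMEOUT_BY_DIFFICULTY = {"easy": 20.0, "medium": 40.0, "hard": 60.0}


def timeout_for(difficulty: str) -> float:
    """Return the timeout for a task difficulty.

    Assumption: unknown difficulties fall back to the 'hard' budget.
    """
    return TIMEOUT_BY_DIFFICULTY.get(difficulty, TIMEOUT_BY_DIFFICULTY["hard"])


@dataclass
class CaseResult:
    """Hypothetical record for one benchmark case."""
    case_id: str
    latency_s: float
    passed: bool
    timed_out: bool = False


def percentile(values, pct):
    """Nearest-rank percentile; returns None for empty input."""
    if not values:
        return None
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(results):
    """Accuracy uses every case; latency percentiles skip timed-out cases."""
    latencies = [r.latency_s for r in results if not r.timed_out]
    accuracy = sum(r.passed for r in results) / len(results) if results else None
    return {
        "accuracy": accuracy,
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }
```

With this shape, a case that hits its timeout still counts as a failure in the accuracy figure, but its latency, which only reflects the timeout budget, no longer inflates P95/P99.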