Manual smoke test verifying U4 L0 prompt rule rearrangement under real LLM calls (bailian-coding/qwen3.7-plus).

5 probe queries covering external_info / realtime_data / multi_step / realtime_simple / no_tool.

Results:
- Probe #1 external_info: PASS (8 web_search calls, 99.9s)
- Probe #2 realtime_data: ERROR (120s timeout, not LLM refusal)
- Probe #3 multi_step: PASS (8 web_search calls, 62.6s)
- Probe #4 realtime_data_simple: PASS (3 web_search calls, 23.8s)
- Probe #5 no_tool_escape_hatch: PASS (0 tool calls, direct answer, 4.2s)

Verdict: 3/4 tool-call pass (>=3/4 threshold) + 1/1 direct pass.

Bug 2 status upgraded to 'L4 verified'.
Plan Progress table updated: U6 done, U7 done.
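The verdict arithmetic above can be sketched as follows. This is a minimal illustration, not the project's actual test harness: `ProbeResult`, `RESULTS`, and `verdict` are hypothetical names, and the rule that every direct-answer probe must pass is an assumption (the message only states the >=3/4 threshold for tool-call probes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    """Hypothetical record for one probe outcome (not the repo's real type)."""
    name: str
    expects_tool: bool  # True if the probe should trigger tool calls
    status: str         # "PASS" or "ERROR", as reported in the commit message


# Outcomes copied from the results list in the commit message.
RESULTS = [
    ProbeResult("external_info", True, "PASS"),
    ProbeResult("realtime_data", True, "ERROR"),  # 120s timeout
    ProbeResult("multi_step", True, "PASS"),
    ProbeResult("realtime_data_simple", True, "PASS"),
    ProbeResult("no_tool_escape_hatch", False, "PASS"),
]


def verdict(results, min_tool_passes=3):
    """Return (tool_passes, tool_total, direct_passes, direct_total, ok).

    Assumption: the run is accepted when at least `min_tool_passes`
    tool-call probes pass and every direct-answer probe passes.
    """
    tool = [r for r in results if r.expects_tool]
    direct = [r for r in results if not r.expects_tool]
    tool_passes = sum(r.status == "PASS" for r in tool)
    direct_passes = sum(r.status == "PASS" for r in direct)
    ok = tool_passes >= min_tool_passes and direct_passes == len(direct)
    return tool_passes, len(tool), direct_passes, len(direct), ok


print(verdict(RESULTS))
```

With the five outcomes listed, the timeout in Probe #2 counts as a failed tool-call probe, which still leaves the run at the 3-of-4 threshold.
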
| Name |
|---|
| documents |
| e2e |
| integration |
| manual |
| routes |
| tools |
| unit |
| __init__.py |
| conftest.py |
| test_routing_chain.py |