11 KiB

Raw Blame History

refactor: 路由架构简化 — 统一 REACT Agent Loop

status: active created: 2026-06-16 depth: Standard

Summary

将当前 4 层路由架构（HeuristicClassifier → LLM classify → SemanticRouter → IntentRouter）简化为极简路由层 + 统一 REACT Agent Loop（Hermes 模式 Prompt-based XML tool calling）。删除意图预测层，让 LLM 在 agent loop 中看到完整工具描述后自主决策。

Problem Frame

当前 CostAwareRouter 的 4 层路由架构存在根本性设计缺陷：

路由层预测意图是反模式 — LLM 在路由层看不到工具上下文，必然误判（如"查下ip"被分为 direct_agent）
枚举永远覆盖不完 — HeuristicClassifier 的关键词列表无法覆盖所有口语化说法
多层路由增加延迟 — 每次查询 3 次 LLM 调用（路由1 + REACT2），响应 3-5s
双链路不一致 — Portal REST 走 IntentRouter，WebSocket 走 CostAwareRouter
工具格式不兼容 — 百炼 Coding 不支持原生 function calling，模型输出 <tool_use> 文本但引擎无法解析

行业验证：Codex、Trae、Hermes、OpenClaw 均无独立路由层，统一 agent loop 是业界标准。

Requirements

R1: 删除 HeuristicClassifier、IntentRouter、SemanticRouter 的路由决策功能
R2: 保留极简路由层（@skill 前缀 + 问候/闲聊检测）
R3: 统一 REACT Agent Loop，System Prompt 注入完整工具描述
R4: Prompt-based XML tool calling（<tool_use> 格式），后端解析执行
R5: Portal REST 和 WebSocket 统一路由路径
R6: 聊天记录持久化（Portal ConversationStore → SessionManager）
R7: 回测验证：执行模式准确率 >85%，工具调用成功率 >95%，口语化查询成功率 >90%
R8: 性能指标：响应时间 <3s（简单查询），LLM 调用次数 ≤2 次/查询

Key Technical Decisions

KTD-1: 采用 Hermes 模式 Prompt-based XML Tool Calling

决策：System Prompt 中定义 <tool_use> 格式，LLM 输出 XML 标签，后端解析执行。

理由：

百炼 Coding（qwen3.7-plus）不支持原生 function calling
截图验证模型已理解 <tool_use> 格式
与 Hermes 架构一致，模型无关

替代方案：

原生 function calling：百炼 Coding 不兼容
Action: 格式：不如 XML 结构化

KTD-2: 删除路由层意图预测，保留极简规则层

决策：只保留 @skill 前缀路由和问候/闲聊检测，其他所有查询默认走 REACT。

理由：

路由层预测意图的准确率远低于 LLM 在 agent loop 中的决策
删除路由层节省 1 次 LLM 调用（~500ms，~1000 tokens）
问候/闲聊检测是确定性规则，零误判

KTD-3: 工具全量加载（第一阶段）

决策：默认加载所有 21 个工具到 System Prompt，通过 @skill 前缀实现按需加载。

理由：

21 个工具的描述约 2000 tokens，成本可接受
全量加载保证 LLM 能看到所有工具，零误判
按需加载（Regex 筛选）留作第二阶段优化

KTD-4: 保留其他 Agent 架构作为 skill 配置可选模式

决策：ReWOOAgent、ReflexionAgent 等保留，通过 skill YAML 的 execution_mode 字段切换。

理由：

不同场景需要不同执行模式（代码生成用 ReWOO，失败重试用 Reflexion）
已有投入不应浪费
只是路由方式变了，执行模式不变

Scope Boundaries

In Scope

简化 CostAwareRouter 为极简路由层
ReActEngine 改为 prompt-based tool calling
Portal REST/WebSocket 统一路由
聊天记录持久化
E2E 回测和指标验证

Out of Scope

Embedding API 集成（待用户提供 API key）
前端 GUI 改造
Expert Team 模式重构
工具按需加载的 Regex 筛选层（第二阶段）

Deferred to Follow-Up Work

SemanticRouter 降级为可选插件
工具数量 >30 时的分组加载策略
响应流式优化（SSE chunk 细化）

High-Level Technical Design

目标架构

用户输入
  ↓
SimpleRouter（极简路由层，<1ms）
  ├─ @skill:xxx → 加载指定 skill 工具 → REACT Agent
  ├─ 问候/闲聊（regex）→ DIRECT_CHAT（无工具，快速路径）
  └─ 其他 → 加载所有默认工具 → REACT Agent
       ↓
  REACT Agent Loop
  ├─ System Prompt: 工具描述 + <tool_use> 格式说明
  ├─ LLM 决策: 需要 → 输出 <tool_use> → 解析执行 → Observation → 继续
  └─ LLM 决策: 不需要 → 直接回答 → final_answer

路由简化对比

组件	当前	目标
CostAwareRouter.route()	1688 行，4 层	~200 行，1 层
HeuristicClassifier	310 行	删除
IntentRouter	206 行	删除路由功能
SemanticRouter	224 行	删除路由功能
_classify_merged	200 行	删除
_route_layer2	210 行	删除

Implementation Units

U1. 创建 SimpleRouter 替代 CostAwareRouter

Goal: 实现极简路由层，只保留 @skill 前缀和问候/闲聊检测

Requirements: R1, R2

Dependencies: 无

Files:

src/agentkit/chat/simple_router.py (新建)
src/agentkit/chat/skill_routing.py (修改 — 保留 SkillRoutingResult、ExecutionMode、parse_skill_prefix)
tests/unit/chat/test_simple_router.py (新建)

Approach:

新建 SimpleRouter 类，包含 route() 方法
route() 逻辑：@skill 前缀 → 指定 skill；问候/闲聊 regex → DIRECT_CHAT；其他 → REACT
保留 SkillRoutingResult 数据类和 ExecutionMode 枚举
保留 parse_skill_prefix() 函数
保留 _GREETING_RE 和 _CHAT_MODE_RE 正则

Test scenarios:

@skill:shell 前缀正确路由到 shell skill
"你好" 路由到 DIRECT_CHAT
"查看当前ip" 路由到 REACT
"查下ip" 路由到 REACT
"翻译hello" 路由到 REACT（LLM 决定不需要工具）
无前缀无问候的复杂查询路由到 REACT

Verification: 所有测试通过，SimpleRouter.route() 返回正确的 ExecutionMode

U2. ReActEngine 改为 Prompt-based XML Tool Calling

Goal: ReActEngine 的 system prompt 注入完整工具描述和 <tool_use> 格式说明

Requirements: R3, R4

Dependencies: U1

Files:

src/agentkit/core/react.py (修改)
tests/unit/core/test_react_tool_format.py (新建)

Approach:

新增 _build_tool_use_system_prompt() 方法，生成包含工具描述和 <tool_use> 格式说明的 system prompt
在 execute_stream() 中，当 LLM 不支持原生 function calling 时，使用 prompt-based 模式
确保 _parse_text_tool_calls() 正确解析 <tool_use> XML 格式（已实现）
添加工具描述格式：每个工具包含 name、description、parameters

Test scenarios:

system prompt 包含所有工具描述
<tool_use> 格式被正确解析
LLM 不使用工具时直接返回 final_answer
LLM 使用工具时正确执行并返回 observation
多步工具调用（think → act → observe → think → answer）

Verification: curl 测试"查下ip"正确执行 shell 命令

U3. Portal REST/WebSocket 统一路由路径

Goal: Portal REST chat 和 WebSocket 使用相同的 SimpleRouter 路由逻辑

Requirements: R5

Dependencies: U1

Files:

src/agentkit/server/routes/portal.py (修改)
src/agentkit/server/app.py (修改 — 替换 cost_aware_router 为 simple_router)

Approach:

_resolve_for_chat() 改用 SimpleRouter
WebSocket portal_websocket() 改用 SimpleRouter
两条路径统一走 SimpleRouter.route() → REACT Agent Loop
保留 DIRECT_CHAT 快速路径

Test scenarios:

REST "查看当前ip" 正确执行 shell
WebSocket "查看当前ip" 正确执行 shell
REST "你好" 走 DIRECT_CHAT
WebSocket "你好" 走 DIRECT_CHAT

Verification: curl 和前端测试均通过

U4. 聊天记录持久化

Goal: Portal ConversationStore 接入后端 SessionManager，支持 file 持久化

Requirements: R6

Dependencies: U3

Files:

src/agentkit/server/routes/portal.py (修改)
src/agentkit/session/manager.py (修改 — 如需新增方法)
tests/unit/server/test_portal_persistence.py (新建)

Approach:

ConversationStore 委托 SessionManager 进行持久化
新消息写入时同步写入 SessionManager
加载会话时从 SessionManager 恢复
保持内存缓存作为热路径

Test scenarios:

新消息写入后可从 SessionManager 读取
服务重启后会话历史保留
多轮对话上下文正确

Verification: 重启服务后聊天记录仍在

U5. 更新 E2E 回测用例和指标

Goal: 更新回测用例覆盖口语化说法，定义和跟踪指标

Requirements: R7, R8

Dependencies: U1, U2, U3

Files:

tests/e2e/test_capability_router_direct.py (修改)
tests/e2e/capability_metrics.py (修改)
docs/plans/2026-06-16-005-refactor-routing-architecture-plan.md (本文档)

Approach:

更新回测用例：增加口语化说法（"查下ip"、"获取ip"、"看下ip"等）
更新指标：增加响应时间、LLM 调用次数、token 消耗
定义目标值：执行模式准确率 >85%，工具调用成功率 >95%，口语化成功率 >90%
运行回测并记录结果

Test scenarios:

口语化查询（"查下ip"）正确路由到 REACT
工具调用查询正确执行工具
问候语正确路由到 DIRECT_CHAT
响应时间 <3s
LLM 调用次数 ≤2

Verification: 回测报告显示所有指标达标

Success Metrics

指标	当前值	目标值	测量方式
执行模式准确率	40%	>85%	E2E 回测
工具调用成功率	60%	>95%	E2E 回测
口语化查询成功率	30%	>90%	E2E 回测
响应时间（简单查询）	3-5s	<3s	curl -w "%{time_total}"
响应时间（工具调用）	5-8s	<4s	curl -w "%{time_total}"
LLM 调用次数/查询	3	≤2	日志统计
Token 消耗/查询	~2400	<1800	LLM gateway 统计

Risks & Mitigations

风险	影响	缓解措施
百炼 Coding 不理解 `<tool_use>` 格式	工具调用失败	已验证模型输出 `<tool_use>`；回退到 Action: 格式
全量工具描述 token 过多	响应变慢	21 个工具约 2000 tokens，可接受；第二阶段按需加载
删除路由层后 skill 匹配丢失	特定 skill 不被选中	@skill 前缀显式指定；LLM 在 agent loop 中自然匹配
聊天记录迁移不兼容	旧数据丢失	新旧格式兼容；渐进迁移

Open Questions

Embedding API key 何时提供？（SemanticRouter 降级为可选插件依赖此 key）
是否需要保留 CostAwareRouter 作为可选模式？（向后兼容）

11 KiB Raw Blame History Unescape Escape

refactor: 路由架构简化 — 统一 REACT Agent Loop

Summary

Problem Frame

Requirements

Key Technical Decisions

KTD-1: 采用 Hermes 模式 Prompt-based XML Tool Calling

KTD-2: 删除路由层意图预测，保留极简规则层

KTD-3: 工具全量加载（第一阶段）

KTD-4: 保留其他 Agent 架构作为 skill 配置可选模式

Scope Boundaries

In Scope

Out of Scope

Deferred to Follow-Up Work

High-Level Technical Design

目标架构

路由简化对比

Implementation Units

U1. 创建 SimpleRouter 替代 CostAwareRouter

U2. ReActEngine 改为 Prompt-based XML Tool Calling

U3. Portal REST/WebSocket 统一路由路径

U4. 聊天记录持久化

U5. 更新 E2E 回测用例和指标

Success Metrics

Risks & Mitigations

Open Questions

11 KiB

Raw Blame History