# refactor: 路由架构简化 — 统一 REACT Agent Loop

status: active
created: 2026-06-16
depth: Standard

---

## Summary

将当前 4 层路由架构（HeuristicClassifier → LLM classify → SemanticRouter → IntentRouter）简化为极简路由层 + 统一 REACT Agent Loop（Hermes 模式 Prompt-based XML tool calling）。删除意图预测层，让 LLM 在 agent loop 中看到完整工具描述后自主决策。

## Problem Frame

当前 CostAwareRouter 的 4 层路由架构存在根本性设计缺陷：

1. **路由层预测意图是反模式** — LLM 在路由层看不到工具上下文，必然误判（如"查下ip"被分为 direct_agent）
2. **枚举永远覆盖不完** — HeuristicClassifier 的关键词列表无法覆盖所有口语化说法
3. **多层路由增加延迟** — 每次查询 3 次 LLM 调用（路由1 + REACT2），响应 3-5s
4. **双链路不一致** — Portal REST 走 IntentRouter，WebSocket 走 CostAwareRouter
5. **工具格式不兼容** — 百炼 Coding 不支持原生 function calling，模型输出 `<tool_use>` 文本但引擎无法解析

**行业验证**：Codex、Trae、Hermes、OpenClaw 均无独立路由层，统一 agent loop 是业界标准。

## Requirements

- R1: 删除 HeuristicClassifier、IntentRouter、SemanticRouter 的路由决策功能
- R2: 保留极简路由层（@skill 前缀 + 问候/闲聊检测）
- R3: 统一 REACT Agent Loop，System Prompt 注入完整工具描述
- R4: Prompt-based XML tool calling（`<tool_use>` 格式），后端解析执行
- R5: Portal REST 和 WebSocket 统一路由路径
- R6: 聊天记录持久化（Portal ConversationStore → SessionManager）
- R7: 回测验证：执行模式准确率 >85%，工具调用成功率 >95%，口语化查询成功率 >90%
- R8: 性能指标：响应时间 <3s（简单查询），LLM 调用次数 ≤2 次/查询

---

## Key Technical Decisions

### KTD-1: 采用 Hermes 模式 Prompt-based XML Tool Calling

**决策**：System Prompt 中定义 `<tool_use>` 格式，LLM 输出 XML 标签，后端解析执行。

**理由**：
- 百炼 Coding（qwen3.7-plus）不支持原生 function calling
- 截图验证模型已理解 `<tool_use>` 格式
- 与 Hermes 架构一致，模型无关

**替代方案**：
- 原生 function calling：百炼 Coding 不兼容
- Action: 格式：不如 XML 结构化

### KTD-2: 删除路由层意图预测，保留极简规则层

**决策**：只保留 @skill 前缀路由和问候/闲聊检测，其他所有查询默认走 REACT。

**理由**：
- 路由层预测意图的准确率远低于 LLM 在 agent loop 中的决策
- 删除路由层节省 1 次 LLM 调用（~500ms，~1000 tokens）
- 问候/闲聊检测是确定性规则，零误判

### KTD-3: 工具全量加载（第一阶段）

**决策**：默认加载所有 21 个工具到 System Prompt，通过 @skill 前缀实现按需加载。

**理由**：
- 21 个工具的描述约 2000 tokens，成本可接受
- 全量加载保证 LLM 能看到所有工具，零误判
- 按需加载（Regex 筛选）留作第二阶段优化

### KTD-4: 保留其他 Agent 架构作为 skill 配置可选模式

**决策**：ReWOOAgent、ReflexionAgent 等保留，通过 skill YAML 的 `execution_mode` 字段切换。

**理由**：
- 不同场景需要不同执行模式（代码生成用 ReWOO，失败重试用 Reflexion）
- 已有投入不应浪费
- 只是路由方式变了，执行模式不变

---

## Scope Boundaries

### In Scope
- 简化 CostAwareRouter 为极简路由层
- ReActEngine 改为 prompt-based tool calling
- Portal REST/WebSocket 统一路由
- 聊天记录持久化
- E2E 回测和指标验证

### Out of Scope
- Embedding API 集成（待用户提供 API key）
- 前端 GUI 改造
- Expert Team 模式重构
- 工具按需加载的 Regex 筛选层（第二阶段）

### Deferred to Follow-Up Work
- SemanticRouter 降级为可选插件
- 工具数量 >30 时的分组加载策略
- 响应流式优化（SSE chunk 细化）

---

## High-Level Technical Design

### 目标架构

```
用户输入
  ↓
SimpleRouter（极简路由层，<1ms）
  ├─ @skill:xxx → 加载指定 skill 工具 → REACT Agent
  ├─ 问候/闲聊（regex）→ DIRECT_CHAT（无工具，快速路径）
  └─ 其他 → 加载所有默认工具 → REACT Agent
       ↓
  REACT Agent Loop
  ├─ System Prompt: 工具描述 + <tool_use> 格式说明
  ├─ LLM 决策: 需要 → 输出 <tool_use> → 解析执行 → Observation → 继续
  └─ LLM 决策: 不需要 → 直接回答 → final_answer
```

### 路由简化对比

| 组件 | 当前 | 目标 |
|------|------|------|
| CostAwareRouter.route() | 1688 行，4 层 | ~200 行，1 层 |
| HeuristicClassifier | 310 行 | 删除 |
| IntentRouter | 206 行 | 删除路由功能 |
| SemanticRouter | 224 行 | 删除路由功能 |
| _classify_merged | 200 行 | 删除 |
| _route_layer2 | 210 行 | 删除 |

---

## Implementation Units

### U1. 创建 SimpleRouter 替代 CostAwareRouter

**Goal**: 实现极简路由层，只保留 @skill 前缀和问候/闲聊检测

**Requirements**: R1, R2

**Dependencies**: 无

**Files**:
- `src/agentkit/chat/simple_router.py` (新建)
- `src/agentkit/chat/skill_routing.py` (修改 — 保留 SkillRoutingResult、ExecutionMode、parse_skill_prefix)
- `tests/unit/chat/test_simple_router.py` (新建)

**Approach**:
1. 新建 `SimpleRouter` 类，包含 `route()` 方法
2. `route()` 逻辑：@skill 前缀 → 指定 skill；问候/闲聊 regex → DIRECT_CHAT；其他 → REACT
3. 保留 `SkillRoutingResult` 数据类和 `ExecutionMode` 枚举
4. 保留 `parse_skill_prefix()` 函数
5. 保留 `_GREETING_RE` 和 `_CHAT_MODE_RE` 正则

**Test scenarios**:
- @skill:shell 前缀正确路由到 shell skill
- "你好" 路由到 DIRECT_CHAT
- "查看当前ip" 路由到 REACT
- "查下ip" 路由到 REACT
- "翻译hello" 路由到 REACT（LLM 决定不需要工具）
- 无前缀无问候的复杂查询路由到 REACT

**Verification**: 所有测试通过，SimpleRouter.route() 返回正确的 ExecutionMode

### U2. ReActEngine 改为 Prompt-based XML Tool Calling

**Goal**: ReActEngine 的 system prompt 注入完整工具描述和 `<tool_use>` 格式说明

**Requirements**: R3, R4

**Dependencies**: U1

**Files**:
- `src/agentkit/core/react.py` (修改)
- `tests/unit/core/test_react_tool_format.py` (新建)

**Approach**:
1. 新增 `_build_tool_use_system_prompt()` 方法，生成包含工具描述和 `<tool_use>` 格式说明的 system prompt
2. 在 `execute_stream()` 中，当 LLM 不支持原生 function calling 时，使用 prompt-based 模式
3. 确保 `_parse_text_tool_calls()` 正确解析 `<tool_use>` XML 格式（已实现）
4. 添加工具描述格式：每个工具包含 name、description、parameters

**Test scenarios**:
- system prompt 包含所有工具描述
- `<tool_use>` 格式被正确解析
- LLM 不使用工具时直接返回 final_answer
- LLM 使用工具时正确执行并返回 observation
- 多步工具调用（think → act → observe → think → answer）

**Verification**: curl 测试"查下ip"正确执行 shell 命令

### U3. Portal REST/WebSocket 统一路由路径

**Goal**: Portal REST chat 和 WebSocket 使用相同的 SimpleRouter 路由逻辑

**Requirements**: R5

**Dependencies**: U1

**Files**:
- `src/agentkit/server/routes/portal.py` (修改)
- `src/agentkit/server/app.py` (修改 — 替换 cost_aware_router 为 simple_router)

**Approach**:
1. `_resolve_for_chat()` 改用 SimpleRouter
2. WebSocket `portal_websocket()` 改用 SimpleRouter
3. 两条路径统一走 SimpleRouter.route() → REACT Agent Loop
4. 保留 DIRECT_CHAT 快速路径

**Test scenarios**:
- REST "查看当前ip" 正确执行 shell
- WebSocket "查看当前ip" 正确执行 shell
- REST "你好" 走 DIRECT_CHAT
- WebSocket "你好" 走 DIRECT_CHAT

**Verification**: curl 和前端测试均通过

### U4. 聊天记录持久化

**Goal**: Portal ConversationStore 接入后端 SessionManager，支持 file 持久化

**Requirements**: R6

**Dependencies**: U3

**Files**:
- `src/agentkit/server/routes/portal.py` (修改)
- `src/agentkit/session/manager.py` (修改 — 如需新增方法)
- `tests/unit/server/test_portal_persistence.py` (新建)

**Approach**:
1. ConversationStore 委托 SessionManager 进行持久化
2. 新消息写入时同步写入 SessionManager
3. 加载会话时从 SessionManager 恢复
4. 保持内存缓存作为热路径

**Test scenarios**:
- 新消息写入后可从 SessionManager 读取
- 服务重启后会话历史保留
- 多轮对话上下文正确

**Verification**: 重启服务后聊天记录仍在

### U5. 更新 E2E 回测用例和指标

**Goal**: 更新回测用例覆盖口语化说法，定义和跟踪指标

**Requirements**: R7, R8

**Dependencies**: U1, U2, U3

**Files**:
- `tests/e2e/test_capability_router_direct.py` (修改)
- `tests/e2e/capability_metrics.py` (修改)
- `docs/plans/2026-06-16-005-refactor-routing-architecture-plan.md` (本文档)

**Approach**:
1. 更新回测用例：增加口语化说法（"查下ip"、"获取ip"、"看下ip"等）
2. 更新指标：增加响应时间、LLM 调用次数、token 消耗
3. 定义目标值：执行模式准确率 >85%，工具调用成功率 >95%，口语化成功率 >90%
4. 运行回测并记录结果

**Test scenarios**:
- 口语化查询（"查下ip"）正确路由到 REACT
- 工具调用查询正确执行工具
- 问候语正确路由到 DIRECT_CHAT
- 响应时间 <3s
- LLM 调用次数 ≤2

**Verification**: 回测报告显示所有指标达标

---

## Success Metrics

| 指标 | 当前值 | 目标值 | 测量方式 |
|------|-------|-------|---------|
| 执行模式准确率 | 40% | >85% | E2E 回测 |
| 工具调用成功率 | 60% | >95% | E2E 回测 |
| 口语化查询成功率 | 30% | >90% | E2E 回测 |
| 响应时间（简单查询）| 3-5s | <3s | curl -w "%{time_total}" |
| 响应时间（工具调用）| 5-8s | <4s | curl -w "%{time_total}" |
| LLM 调用次数/查询 | 3 | ≤2 | 日志统计 |
| Token 消耗/查询 | ~2400 | <1800 | LLM gateway 统计 |

---

## Risks & Mitigations

| 风险 | 影响 | 缓解措施 |
|------|------|---------|
| 百炼 Coding 不理解 `<tool_use>` 格式 | 工具调用失败 | 已验证模型输出 `<tool_use>`；回退到 Action: 格式 |
| 全量工具描述 token 过多 | 响应变慢 | 21 个工具约 2000 tokens，可接受；第二阶段按需加载 |
| 删除路由层后 skill 匹配丢失 | 特定 skill 不被选中 | @skill 前缀显式指定；LLM 在 agent loop 中自然匹配 |
| 聊天记录迁移不兼容 | 旧数据丢失 | 新旧格式兼容；渐进迁移 |

---

## Open Questions

1. Embedding API key 何时提供？（SemanticRouter 降级为可选插件依赖此 key）
2. 是否需要保留 CostAwareRouter 作为可选模式？（向后兼容）