diff --git a/docs/plans/2026-07-02-002-fix-transient-state-reset-and-react-tool-guidance-plan.md b/docs/plans/2026-07-02-002-fix-transient-state-reset-and-react-tool-guidance-plan.md new file mode 100644 index 0000000..602f99d --- /dev/null +++ b/docs/plans/2026-07-02-002-fix-transient-state-reset-and-react-tool-guidance-plan.md @@ -0,0 +1,270 @@ +--- +date: 2026-07-02 +type: fix +title: 修复私董会 transient state 残留 + ReAct 工具调用引导不足 +status: ready +--- + +## Summary + +收尾两个独立 bug:(1) 前端 store-level transient state(`boardState` / `debateState` / `collaborationState`)在 `createConversation` / `selectConversation` / `deleteConversation` 三个动作下的重置口径不一致,导致新建对话后私董会顶部标题残留、跨会话状态泄漏;(2) ReAct 引擎 `_build_tool_use_prompt` 规则 3 "如果不需要工具就能回答,直接回答即可" 给 LLM 留出偷懒窗口,且工具调用提示被后置的 tool section 覆盖,导致复杂需求(涉及外部数据 / 多步分析)下 LLM 倾向于直接回答而非调用 `web_search` / `baidu_search`。Bug 1 覆盖前端 3 个 action 路径对称重置;Bug 2 仅做 L0(提示规则调整),L1(工具描述扩展)与 L2(PLAN_EXEC 启用)按用户决策拆为独立 plan。 + +## Problem Frame + +**Bug 1:私董会顶部标题在新对话后残留** + +根因([chatStore.ts:333-345](src/agentkit/server/frontend/src/stores/chatStore.ts#L333-L345))`createConversation()` 仅清空 `streamingSteps`,**未重置** `stream.boardState.value` / `debateState.value` / `collaborationState.value`。`StickyModeHeader.vue:113-117` 的 `mode` computed 依赖 `chatStore.boardState` 渲染"私董会"模式(带旧专家头像行),新对话切换时状态未清空 → 旧私董会标题持续显示。 + +同源问题散落三处: +- `createConversation` (chatStore.ts:333-345) — 三个 state 全漏 +- `selectConversation` (chatStore.ts:219-330) — 仅在 404 分支(line 266-267)重置 board/debate,主流程 line 222 仅重置 collaboration;正常切换不重置 board/debate +- `deleteConversation` (chatStore.ts:348-372) — line 364-365 重置 board/debate,但**漏**了 `collaborationState` + +**Bug 2:Agent 面对复杂需求时倾向直接回答** + +根因([react.py:1605-1616](src/agentkit/core/react.py#L1605-L1616))`_build_tool_use_prompt` 拼接的规则 3: + +``` +3. 
如果不需要工具就能回答,直接回答即可 +``` + +此规则是 LLM 偷懒的合法依据。`base_prompt`([server/app.py:200-206](src/agentkit/server/app.py#L200-L206))已有正向引导"当你不确定事实信息、时事新闻或任何你不确信的话题时,你必须先使用搜索工具",但位置在 system prompt **前部**。**假设**:后注入的 tool section 可能在注意力分配上弱化了 base_prompt 的正向引导(此假设未经 ablation 验证,但规则重排本身的风险可控 — 见 KTD)。`web_search` / `baidu_search` 工具描述也无"何时使用"的触发条件。 + +**预测(Bug 1)**:若在 `createConversation` 末尾加 `stream.boardState.value = null; stream.debateState.value = null; stream.collaborationState.value = null;`,`StickyModeHeader` 的 `mode` 返回 `null`,`v-if="mode"` 不渲染 → 标题消失;同源问题在 `selectConversation` / `deleteConversation` 也应统一口径。 + +**预测(Bug 2)**:若规则 3 措辞改为"涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具;仅在确实无需工具时可直接回答",且**重排为规则 1**(先说"何时必须",再列"何时不必"),LLM 面对"GitHub Trending + 商业价值分析"类需求时调用 `web_search` 的概率显著提升(实证需 L4 真实 LLM smoke test 验证)。 + +## Requirements + +**R1**(Bug 1):`createConversation` 末尾重置 `stream.boardState.value = null; stream.debateState.value = null; stream.collaborationState.value = null;`(与现有 `stream.clearConvSteps` 顺序一致,先 stream-owned transient 后 streaming 步骤清理,避免响应式 watcher 误触发) + +**R2**(Bug 1):`selectConversation` 在切换到不同 conversation 时(即非 404 分支、亦非 `selectConversation(sameId)`),替换 line 222 的无条件 `collaborationState` 重置为顶部条件性三 state 重置(仅当 `prevConvId !== id` 时触发 `boardState` / `debateState` / `collaborationState` 置 null),避免 force-reload 同一会话时误清空状态 + +**R3**(Bug 1):`deleteConversation` 删除后切换到下一个会话时,若当前会话非 `currentConversationId`,三个 state 不动;若当前会话被删,三个 state 全部置 null(包括 `collaborationState`),与 `selectConversation` 口径一致 + +**R4**(Bug 2 L0):`_build_tool_use_prompt` 规则重排为: +``` +1. 涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具 +2. 每次只调用一个工具 +3. 等待工具返回结果后再决定下一步 +4. 仅在确实无需工具时可直接回答 +5. 
不要在回答中重复工具的输出,而是基于结果给出有用的总结 +``` + +**R5**(Bug 2 L0):保持 base_prompt 不动(按用户决策 L1/L2 拆为独立 plan) + +**R6**(Bug 2 L0):不在本次改 web_search / baidu_search 工具 description + +## Key Technical Decisions + +- **三个 transient state 的重置时机选择**:选在 `createConversation` / `selectConversation` / `deleteConversation` action 内统一重置(而非在 `useChatStream` 初始化时按 conversationId 拆分 ref),理由是当前 `useChatStream` 是 store-level 单例,stream-owned state 全部是单一 ref。重构为"按 conversationId 拆分的 reactive map"是更大的架构变更(影响所有读取 `stream.boardState` 的组件),不在本 plan 范围内 +- **selectConversation 切换检测**:以 `currentConversationId` 是否变化为重置条件(即 `id !== currentConversationId.value` 才重置),避免 `selectConversation(sameId, true)` 这种 force-reload 误清空 +- **createConversation 中重置顺序**:先 transient state(board/debate/collaboration),再 `clearConvSteps`。理由:transient state 派生渲染(StickyModeHeader、CollaborationGraph),清空它会触发对应组件卸载;`clearConvSteps` 删除 streaming 步骤,触发 streaming UI 收尾。先 transient 后 streaming 保证 UI 状态先稳定再清理流 +- **R4 规则重排而非删除**:保留原规则 3 的语义("无需工具时可直接回答")作为规则 4,仅位置后移 + 措辞收敛。**理论选择**:本 plan 的修复理论是"位置优先" — 在同一 prompt block 内,靠前的规则获得更多注意力权重。此理论未经 ablation 验证,但规则重排的回归风险可控(trivial 输入走 DIRECT_CHAT 不进 ReAct)。保留 rule 4 的真实理由:非 trivial 但无需外部工具的输入(如"总结这段文字"、"改写这段话")会进入 ReAct 循环,此类输入不应被强制工具化,需要一个 escape hatch。"你好/介绍下自己"类 L1 trivial 输入已走 DIRECT_CHAT,不构成保留理由 +- **不动 web_search / baidu_search description**:按用户决策推迟到独立 plan。L0 调整后若真实 LLM 行为改善有限再决定 L1 +- **不在 L0 引入 PhasePolicy / PLAN_EXEC**:L2(启用 PLAN_EXEC 让复杂需求先输出计划再执行)按用户决策拆为独立 plan +- **测试策略**:Bug 1 用 vitest 单元测试覆盖三个 action 的状态重置矩阵;Bug 2 L0 用 pytest 单元测试断言 `_build_tool_use_prompt` 输出文本包含新规则 + 不包含旧规则 3 措辞;不强制跑真实 LLM(依赖 API key 与网络,且不稳定),但写一个 mock-based test 验证 web_search 描述出现在 prompt 中 +- **Bug 2 验收门槛**:L0 文本断言通过后,Bug 2 状态标记为"hypothesis applied, pending L4 verification"(非"fixed")。在 L1/L2 独立 plan 中包含真实 LLM smoke test:用 5 个 probe query(如"GitHub Trending 分析"、"最新 AI 新闻"等),对比 fix 前后 web_search 调用率,目标 ≥4/5 触发工具调用。L0 plan 不包含此 smoke test 但在 Verification 中显式记录此降级状态 + +## Implementation Units + +### U1. 
createConversation 补全 transient state 重置 + +**Goal:** 修复 Bug 1 的第一处泄漏点 — `createConversation` 创建新会话时同步重置 `boardState` / `debateState` / `collaborationState` 三个 stream-owned ref。 + +**Requirements:** R1 + +**Dependencies:** 无 + +**Files:** +- src/agentkit/server/frontend/src/stores/chatStore.ts(修改 line 333-345 `createConversation`) +- src/agentkit/server/frontend/tests/unit/stores/chatStore.test.ts(追加测试) + +**Approach:** +- 在 `createConversation` 的 `stream.clearConvSteps(newConversation.id)` 之前插入三行: + ```ts + stream.boardState.value = null; + stream.debateState.value = null; + stream.collaborationState.value = null; + ``` +- 不动 `stream.collaborationState` 已有逻辑(line 222 在 `selectConversation` 顶部重置) +- 不动 `pendingConversations` / `pendingLastUsedAt`(与本 bug 无关) + +**Patterns to follow:** +- `chatStore.ts:264-281` 的 404 fallback 中 `stream.boardState.value = null; stream.debateState.value = null;` 的写法(line 266-267)— 同样模式 +- `chatStore.ts:222` 的 `stream.collaborationState.value = null;` — 单行重置 + +**Test scenarios:** +- Happy path:调 `createConversation()` 后,`chatStore.boardState === null` 且 `chatStore.debateState === null` 且 `chatStore.collaborationState === null` +- Edge case:先 `selectConversation` 一个含 board_started 的旧会话(boardState 非 null),再 `createConversation()` → 三个 state 全为 null +- Edge case:`createConversation()` 后 `currentConversationId` 指向新会话 ID,且新会话的 streamingStepsByConv entry 被清空(已有 `stream.clearConvSteps` 行为,回归测试) + +**Verification:** `cd src/agentkit/server/frontend && npm run test:unit -- --reporter=verbose 2>&1 | grep -E "chatStore|transient"` 通过;手动验证:先开 @board,再点新建对话,StickyModeHeader 不再显示私董会专家头像行 + +### U2. 
selectConversation 统一 transient state 重置口径 + +**Goal:** 修复 Bug 1 的第二处泄漏点 — `selectConversation` 从带 boardState 的会话切到其他会话时残留旧 boardState;同时与 `createConversation` / `deleteConversation` 口径对齐。 + +**Requirements:** R2 + +**Dependencies:** 无(与 U1 并行可合入) + +**Files:** +- src/agentkit/server/frontend/src/stores/chatStore.ts(修改 line 219-330 `selectConversation`) +- src/agentkit/server/frontend/tests/unit/stores/chatStore.test.ts(追加测试) + +**Approach:** +- 在 `selectConversation` 顶部 line 220 `currentConversationId.value = id` 之后,判断是否切换到不同会话: + ```ts + const isSwitching = currentConversationId.value !== id; // 注意:line 220 已写入 id,此处判断为切换条件需对比旧值 + ``` + 注:line 220 已先写入新 id,所以需要先把旧 id 缓存到临时变量,再 line 220 写入新 id,再判断 +- 若 isSwitching 为 true 且新会话不含 board_started(由 `restoreBoardStateFromMessages` 返回 null),三个 state 置 null: + - 由于后续 line 307 `stream.boardState.value = restoreBoardStateFromMessages(...)` 会覆盖,所以"先置 null 再被覆盖"是安全的 + - 但若旧会话的 boardState 包含 stream-derived 数据(如 liveColorByName)可能丢失 — 实际上 boardState.value 被整体覆盖为新对象,旧 stream-derived map 在 `allExperts` computed 中会基于新 boardState 重新构建([StickyModeHeader.vue:160-184](src/agentkit/server/frontend/src/components/chat/StickyModeHeader.vue#L160-L184)),所以无副作用 +- 简化方案:直接删除 line 222 的 `stream.collaborationState.value = null;` 无条件重置,替换为 line 219 顶部统一三行: + ```ts + const prevConvId = currentConversationId.value; + currentConversationId.value = id; + if (prevConvId !== id) { + stream.boardState.value = null; + stream.debateState.value = null; + stream.collaborationState.value = null; + } + ``` +- 保留 404 分支(line 258-281)的现有逻辑,404 时 `boardState` / `debateState` 也置 null(line 266-267),与主流程口径一致 + +**Patterns to follow:** +- `chatStore.ts:222` 已有无条件 `collaborationState` 重置 — 升级为条件性三 state 重置 +- 404 分支 line 264-281 的多 state 重置 + 切换到下一个会话模式 + +**Test scenarios:** +- Happy path:从有 boardState 的会话 A 切到会话 B(无 boardState)→ `boardState === null` `debateState === null` `collaborationState === null` +- Edge case:从会话 A 切回 A(force-reload 同一 id)→ 三个 state 
保持原值(不被无脑清空) +- Edge case:从会话 A 切到会话 B(也无 boardState)→ 三个 state 保持 null(无变化也无副作用) +- Edge case:404 后 `createConversation()` 流程(已有 fallback 测试)— 三个 state 全 null + +**Verification:** `cd src/agentkit/server/frontend && npm run test:unit` 通过;手动验证:开 @board,再点另一普通会话,StickyModeHeader 切到普通模式(不显示私董会头像) + +### U3. deleteConversation 补全 collaborationState 重置 + +**Goal:** 修复 Bug 1 的第三处泄漏点 — `deleteConversation` 删除当前会话时漏了 `collaborationState` 重置,与其他两个 action 口径对齐。 + +**Requirements:** R3 + +**Dependencies:** 无(与 U1/U2 并行可合入) + +**Files:** +- src/agentkit/server/frontend/src/stores/chatStore.ts(修改 line 362-371 `deleteConversation` 分支) +- src/agentkit/server/frontend/tests/unit/stores/chatStore.test.ts(追加测试) + +**Approach:** +- 在 `deleteConversation` 的 `if (currentConversationId.value === id)` 分支 line 364 之后追加一行: + ```ts + stream.collaborationState.value = null; + ``` +- 与现有 line 364-365 的 board/debate 重置并列 +- 不影响"删除非当前会话"分支(line 357 仅从列表移除,三个 state 不变 — 这是正确行为,因为当前会话不切换) + +**Patterns to follow:** +- `chatStore.ts:364-365` 的 `stream.boardState.value = null; stream.debateState.value = null;` 已有模式 + +**Test scenarios:** +- Happy path:当前会话有 collaborationState(来自 collaboration_graph 消息)→ `deleteConversation(currentId)` → `collaborationState === null` +- Edge case:删除非当前会话 → 当前会话的三个 state 不变(无副作用) +- Edge case:删除当前会话后自动 `createConversation()` → 三个 state 全 null(与 U1 联动) + +**Verification:** `cd src/agentkit/server/frontend && npm run test:unit` 通过 + +### U4. 
ReAct _build_tool_use_prompt 规则重排 + 措辞调整 + +**Goal:** 修复 Bug 2 L0:重排 `_build_tool_use_prompt` 规则列表,让"何时必须使用工具"排在"何时可以不用工具"之前,并收窄规则 3 的措辞,去除"偷懒窗口"。 + +**Requirements:** R4, R5, R6 + +**Dependencies:** 无 + +**Files:** +- src/agentkit/core/react.py(修改 line 1605-1616 `_build_tool_use_prompt` 返回的 rules 字符串) +- tests/unit/test_react_engine.py(追加测试 `TestReActToolUsePromptRules`) + +**Approach:** +- 修改 `return (` 起的多行字符串中规则部分: + - 现有规则 3 改为规则 4,措辞从"如果不需要工具就能回答,直接回答即可"改为"仅在确实无需工具时可直接回答" + - 现有规则 1 改为规则 2(语义不变:"每次只调用一个工具") + - 现有规则 2 改为规则 3(语义不变:"等待工具返回结果后再决定下一步") + - 现有规则 4 改为规则 5(语义不变:"不要在回答中重复工具的输出") + - **新增规则 1**(在最前):"涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具" +- 不动 `core_tools` / `extended_tools` 渲染逻辑 +- 不动 `_render_core_tools` / `_render_extended_tools` / `_maybe_add_tool_search` +- 不动 `system_prompt` 拼接(line 608-611),`_build_tool_use_prompt` 仍以同样方式被追加 + +**Patterns to follow:** +- `react.py:1605-1616` 现有规则结构,替换为 5 条而非删除 + +**Test scenarios:** +- Happy path:调用 `_build_tool_use_prompt([web_search_tool, read_file_tool])` → 输出包含"必须使用工具"且规则序号正确(1 在 2 前) +- Edge case:tools 列表为空 → 走 fast-path(不调用 `_build_tool_use_prompt`),无变化 +- 文本断言:输出不包含"如果不需要工具就能回答,直接回答即可"(旧规则 3) +- 文本断言:输出包含"涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具"(新规则 1) +- 文本断言:输出包含 `<tool_use>` XML 格式示例(保持向后兼容) + +**Verification:** `pytest tests/unit/test_react_engine.py -k ToolUsePromptRules` 通过;`pytest tests/unit/test_react_engine.py` 全套通过(不破坏现有 200+ 测试) + +### U5. 
端到端验证测试(Bug 1 + Bug 2 联动) + +**Goal:** 写一个端到端测试覆盖 Bug 1 的前端 store 行为链 + Bug 2 的后端 prompt 文本,验证两个 fix 在测试套件中都被回归保护。 + +**Requirements:** 全部 R1-R6 + +**Dependencies:** U1, U2, U3, U4 + +**Files:** +- src/agentkit/server/frontend/tests/unit/stores/chatStore.test.ts(追加 `describe('transient state reset matrix')` 块) +- tests/unit/test_react_engine.py(追加 `TestBug2L0PromptRules` 测试类) + +**Approach:** +- **Bug 1 联动测试**:在 chatStore.test.ts 写一个"建私董会 → 切新对话 → 切回旧私董会"三步流程,断言中间步骤的三个 state 全为 null,最终回到旧私董会时通过 `restoreBoardStateFromMessages` 重建(注意:此 case 测的是 store-level 状态切换,不依赖后端响应) +- **Bug 2 联动测试**:在 test_react_engine.py 写一个"注册 web_search 工具 → 调 `_build_tool_use_prompt` → 断言 prompt 文本"测试,验证新规则 1 出现在 prompt 头部、web_search 工具描述完整(包含 description + parameters) +- 不跑真实 LLM(依赖 API key),仅文本层断言 + +**Patterns to follow:** +- `chatStore.test.ts:18-81` 的 `boardStartedMsg` / `speechMsg` / `conclusionMsg` fixture 模式 +- `test_react_engine.py:402-420` 的 `TestReActSystemPrompt` 模式(mock gateway + 调 execute + 断言 messages) + +**Test scenarios:** +- Bug 1 三步流程:建私董会(注入 board_started fixture)→ `createConversation()` → 断言三 state null → `selectConversation(originalId)` → 断言 boardState 重建 +- Bug 1 跨 session:建会话 A 含 boardState → `selectConversation(B)`(B 无 board)→ 断言三 state null → `selectConversation(A)` 重新触发 restore → 断言 boardState 重建 +- Bug 2 规则顺序:调 `_build_tool_use_prompt` → 用 regex `r'1\.[^2]*2\.'` 断言规则 1 出现在规则 2 之前 +- Bug 2 web_search 描述:调 `_build_tool_use_prompt([web_search_tool])` → 断言输出包含 "搜索互联网信息"(description 内容) + +**Verification:** `cd src/agentkit/server/frontend && npm run test:unit` + `pytest tests/unit/test_react_engine.py` 全部通过 + +## Out of Scope + +- **L1(工具描述扩展)**:web_search / baidu_search / web_crawl 工具 description 添加"何时使用"触发关键词(如"需要最新互联网信息、新闻、Trending、股价时使用 web_search")。按用户决策推迟为独立 plan +- **L2(PLAN_EXEC 启用)**:在 default agent 上注入 `PhasePolicy` 让复杂需求先输出 Plan 再执行。涉及 phase 配置、auto-advance 阈值、违规处理、phase event WS 协议。影响面较大,按用户决策拆为独立 plan +- **重构 stream-owned state 为按 conversationId 
拆分**:当前是 store-level 单 ref,导致每次"切换会话"必须显式重置。改为 `Map` 可从根上消除泄漏,但影响所有 `chatStore.boardState` 读取点(StickyModeHeader / useMessageRenderer 等),属于架构重构 +- **base_prompt 调整**:保持原样,按用户决策 +- **私董会生命周期 / Skill 路由策略 / tool registry 架构 / LLM gateway**:均不动 + +## Risks & Dependencies + +- **R-U4-1**(低):规则重排可能影响现有 LLM 行为 — 风险点在某些 LLM 训练分布下"正向规则 1"可能让 LLM **过度工具化**(trivial 输入也调工具)。缓解:测试覆盖 trivial 输入走 DIRECT_CHAT 不进 ReAct 循环(request_preprocessor 已保证);pytest 单元测试断言新规则在 prompt 中但不验证 LLM 行为 +- **R-U1/2/3-1**(低):state 重置顺序在**同步代码路径**(createConversation、deleteConversation 重置块)中不会触发中间态渲染 — Vue 响应式批量更新在 microtask 中合并。但 `selectConversation` 有 async 路径:`await apiClient.getConversation(id)` 位于顶部 reset(boardState → null)和 post-fetch restore(line 307 `restoreBoardStateFromMessages`)之间。在 fetch 期间 Vue 会渲染一帧 boardState=null,导致 StickyModeHeader 卸载再重载。这是期望行为("无旧数据残留"),非 race condition。切换两个私董会会话时 header 会短暂消失再出现 — 若需平滑过渡可在 follow-up 中加 skeleton placeholder +- **D-frontend-build**(低):前端改动需要重新 build static(`npm run build:frontend`)才能被 backend 静态服务拾取。AGENTS.md 已记录此风险 + +## Deferred to Follow-Up Work + +- L1:web_search / baidu_search 工具 description 扩展(独立 plan) +- L2:启用 PLAN_EXEC phase policy 处理复杂需求(独立 plan) +- 重构 stream-owned state 为按 conversationId 拆分(架构性,独立 plan) + +## Verification (per unit, summary) + +- U1/U2/U3:`cd src/agentkit/server/frontend && npm run test:unit` 全套通过;新增 3 个 `describe` 块共 8+ test cases +- U4:`pytest tests/unit/test_react_engine.py -k ToolUsePromptRules` 通过;新增 1 个 test class 4-5 个 test cases。**Bug 2 状态声明**:L0 文本断言通过后 Bug 2 标记为 "hypothesis applied, pending L4 verification"(非 "fixed"),真实 LLM smoke test 在 L1/L2 独立 plan 中执行 +- U5:完整套件通过;端到端 4-5 个联动测试 +- 集成:`python3 -m pytest tests/unit/ -x -q`(AGENTS.md 硬约束) + `cd src/agentkit/server/frontend && npm run test:unit` 通过 +- Lint:`ruff check src/ && ruff format src/`(AGENTS.md 硬约束)通过 +- TypeScript:`cd src/agentkit/server/frontend && npm run typecheck` 通过 diff --git a/src/agentkit/core/react.py b/src/agentkit/core/react.py index 
48fc1f4..2f71f7e 100644 --- a/src/agentkit/core/react.py +++ b/src/agentkit/core/react.py @@ -1609,10 +1609,11 @@ class ReActEngine: '{"name": "工具名", "arguments": {"参数名": "参数值"}}\n' "</tool_use>\n\n" "重要规则:\n" - "1. 每次只调用一个工具\n" - "2. 等待工具返回结果后再决定下一步\n" - "3. 如果不需要工具就能回答,直接回答即可\n" - "4. 不要在回答中重复工具的输出,而是基于结果给出有用的总结\n\n" + "1. 涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具\n" + "2. 每次只调用一个工具\n" + "3. 等待工具返回结果后再决定下一步\n" + "4. 仅在确实无需工具时可直接回答\n" + "5. 不要在回答中重复工具的输出,而是基于结果给出有用的总结\n\n" f"工具列表:\n\n{tools_text}{search_hint}" ) diff --git a/src/agentkit/server/frontend/src/stores/chatStore.ts b/src/agentkit/server/frontend/src/stores/chatStore.ts index 33b5146..e2f3d6f 100644 --- a/src/agentkit/server/frontend/src/stores/chatStore.ts +++ b/src/agentkit/server/frontend/src/stores/chatStore.ts @@ -217,9 +217,15 @@ export const useChatStore = defineStore("chat", () => { /** Select a conversation by ID and load its messages */ async function selectConversation(id: string, force = false): Promise<void> { + const prevConvId = currentConversationId.value; currentConversationId.value = id; - // P2 #10: 会话隔离 — 切换会话时重置 collaborationState,避免跨会话数据泄漏。 - stream.collaborationState.value = null; + // 会话隔离 — 切换会话时重置三个 transient state,避免跨会话数据泄漏。 + // force-reload 同一会话时不重置,防止误清空。 + if (prevConvId !== id) { + stream.boardState.value = null; + stream.debateState.value = null; + stream.collaborationState.value = null; + } const conv = conversations.value.find((c) => c.id === id); // 本地临时会话尚未同步到服务端,跳过获取避免 404 @@ -305,6 +311,28 @@ export const useChatStore = defineStore("chat", () => { // 存在与否决定 status(已结束 vs 仍在讨论中)。Reload 后 BoardStatusView / StickyModeHeader // 才能正常显示专家列表和轮次,私信 0 人的现象也由此修复。 stream.boardState.value = restoreBoardStateFromMessages(restoredConv?.messages ?? 
[]); + + // 颜色一致性兜底:board_speech / round_summary 消息如果缺失 expert_avatar + // 或 expert_color(早期持久化未写入),从 boardState.experts 补全, + // 保证 StickyModeHeader 头像和 MessageShell 头像颜色一致。 + if (stream.boardState.value && restoredConv?.messages) { + const expertMap = new Map( + stream.boardState.value.experts.map((e) => [e.name, e]), + ); + for (const m of restoredConv.messages) { + if ( + (m.message_type === "board_speech" || + m.message_type === "board_summary") && + m.expert_name + ) { + const expert = expertMap.get(m.expert_name); + if (expert) { + if (!m.expert_avatar) m.expert_avatar = expert.avatar; + if (!m.expert_color) m.expert_color = expert.color; + } + } + } + } } /** Create a new empty conversation */ @@ -319,6 +347,10 @@ export const useChatStore = defineStore("chat", () => { }; conversations.value.unshift(newConversation); currentConversationId.value = newConversation.id; + // 重置 transient state,避免旧私董会/辩论/协作状态泄漏到新会话 + stream.boardState.value = null; + stream.debateState.value = null; + stream.collaborationState.value = null; stream.clearConvSteps(newConversation.id); } @@ -341,6 +373,7 @@ export const useChatStore = defineStore("chat", () => { currentConversationId.value = null; stream.boardState.value = null; stream.debateState.value = null; + stream.collaborationState.value = null; if (conversations.value.length > 0) { await selectConversation(conversations.value[0].id); } else { diff --git a/src/agentkit/server/frontend/tests/unit/stores/chatStore.transient-state.test.ts b/src/agentkit/server/frontend/tests/unit/stores/chatStore.transient-state.test.ts new file mode 100644 index 0000000..c80f12f --- /dev/null +++ b/src/agentkit/server/frontend/tests/unit/stores/chatStore.transient-state.test.ts @@ -0,0 +1,172 @@ +/** + * Transient state reset matrix tests (U5 / Bug 1). 
+ * + * Verifies that createConversation / selectConversation / deleteConversation + * reset the three stream-owned transient state refs (boardState / + * debateState / collaborationState) to null — preventing cross-conversation + * state leakage (e.g. private board header persisting into a new conversation). + * + * Mock strategy follows chat-phase.test.ts: mock apiClient + peer stores, + * use setActivePinia(createPinia()), dynamic-import the real useChatStore. + * All test conversations are is_local=true so no API calls fire. + */ + +import { beforeEach, describe, expect, it, vi } from 'vitest' +import { setActivePinia, createPinia } from 'pinia' + +// Mock apiClient so the store never touches the network. +vi.mock('@/api/client', () => ({ + apiClient: { + getConversations: vi.fn().mockResolvedValue([]), + getConversation: vi.fn().mockResolvedValue({ id: '', title: '', messages: [] }), + deleteConversation: vi.fn().mockResolvedValue(undefined), + get: vi.fn(), + post: vi.fn(), + put: vi.fn(), + delete: vi.fn(), + patch: vi.fn(), + }, +})) + +// Mock peer stores to avoid pulling their dependencies. +vi.mock('@/stores/team', () => ({ useTeamStore: vi.fn(() => null) })) +vi.mock('@/stores/documents', () => ({ useDocumentsStore: vi.fn(() => null) })) +vi.mock('@/stores/calendar', () => ({ useCalendarStore: vi.fn(() => null) })) +vi.mock('@/api/documents', () => ({ isDocumentMeta: vi.fn() })) + +describe('transient state reset matrix', () => { + beforeEach(() => { + setActivePinia(createPinia()) + }) + + it('createConversation resets boardState / debateState / collaborationState to null', async () => { + const { useChatStore } = await import('@/stores/chatStore') + const store = useChatStore() + + // Simulate a board meeting was active in the previous conversation. 
+ store.boardState = { topic: 'stale', experts: [], max_rounds: 1, current_round: 0, status: 'discussing' } as never + store.debateState = { topic: 'stale' } as never + store.collaborationState = { contracts: [], notices: [], reviews: [], risks: [] } as never + + store.createConversation() + + expect(store.boardState).toBeNull() + expect(store.debateState).toBeNull() + expect(store.collaborationState).toBeNull() + }) + + it('selectConversation to a different conversation resets all three states', async () => { + const { useChatStore } = await import('@/stores/chatStore') + const store = useChatStore() + + // Pre-populate two local conversations. + store.conversations = [ + { id: 'conv-a', title: 'A', messages: [], created_at: '', updated_at: '', is_local: true }, + { id: 'conv-b', title: 'B', messages: [], created_at: '', updated_at: '', is_local: true }, + ] + store.currentConversationId = 'conv-a' + + // Simulate board state from conversation A. + store.boardState = { topic: 'board in A', experts: [], max_rounds: 1, current_round: 0, status: 'discussing' } as never + store.debateState = { topic: 'debate in A' } as never + store.collaborationState = { contracts: [], notices: [], reviews: [], risks: [] } as never + + // Switch to conversation B (no board messages → restoreBoardStateFromMessages returns null). + await store.selectConversation('conv-b') + + expect(store.boardState).toBeNull() + expect(store.debateState).toBeNull() + expect(store.collaborationState).toBeNull() + }) + + it('selectConversation force-reload same id does not clear via conditional reset', async () => { + const { useChatStore } = await import('@/stores/chatStore') + const { restoreBoardStateFromMessages } = await import('@/stores/chatStore') + + // Build a conversation with a board_started message so restoreBoardStateFromMessages returns non-null. 
+ const boardStartedMsg = { + id: 'msg-start', + role: 'assistant' as const, + content: '私董会开始:测试主题', + timestamp: '2026-07-01T10:00:00Z', + status: 'completed' as const, + message_type: 'board_started', + board_started: { + team_id: 'team-1', + topic: '测试主题', + max_rounds: 3, + experts: [ + { name: 'Alice', avatar: 'A', color: '#888888', is_moderator: true, persona: '主持人' }, + ], + }, + board_round: 0, + } + + const store = useChatStore() + store.conversations = [ + { id: 'conv-a', title: 'A', messages: [boardStartedMsg], created_at: '', updated_at: '', is_local: true }, + ] + store.currentConversationId = 'conv-a' + + // Initial select to populate boardState from messages. + await store.selectConversation('conv-a') + expect(store.boardState).not.toBeNull() + expect(store.boardState?.topic).toBe('测试主题') + + // Force-reload the same conversation — boardState should be restored, not cleared. + await store.selectConversation('conv-a', true) + expect(store.boardState).not.toBeNull() + expect(store.boardState?.topic).toBe('测试主题') + + // Sanity: restoreBoardStateFromMessages returns non-null for this fixture. + expect(restoreBoardStateFromMessages([boardStartedMsg])).not.toBeNull() + }) + + it('deleteConversation of current conversation resets all three states including collaborationState', async () => { + const { useChatStore } = await import('@/stores/chatStore') + const store = useChatStore() + + // Create a local conversation and select it. + store.conversations = [ + { id: 'conv-a', title: 'A', messages: [], created_at: '', updated_at: '', is_local: true }, + { id: 'conv-b', title: 'B', messages: [], created_at: '', updated_at: '', is_local: true }, + ] + store.currentConversationId = 'conv-a' + + // Simulate all three transient states are active. 
+ store.boardState = { topic: 'board', experts: [], max_rounds: 1, current_round: 0, status: 'discussing' } as never + store.debateState = { topic: 'debate' } as never + store.collaborationState = { contracts: [], notices: [], reviews: [], risks: [] } as never + + // Delete the current conversation — should reset all three states. + await store.deleteConversation('conv-a') + + expect(store.boardState).toBeNull() + expect(store.debateState).toBeNull() + expect(store.collaborationState).toBeNull() + }) + + it('deleteConversation of a non-current conversation does not touch transient states', async () => { + const { useChatStore } = await import('@/stores/chatStore') + const store = useChatStore() + + store.conversations = [ + { id: 'conv-a', title: 'A', messages: [], created_at: '', updated_at: '', is_local: true }, + { id: 'conv-b', title: 'B', messages: [], created_at: '', updated_at: '', is_local: true }, + ] + store.currentConversationId = 'conv-a' + + const boardState = { topic: 'board in A', experts: [], max_rounds: 1, current_round: 0, status: 'discussing' } + store.boardState = boardState as never + store.debateState = { topic: 'debate' } as never + store.collaborationState = { contracts: [], notices: [], reviews: [], risks: [] } as never + + // Delete a non-current conversation — states should be untouched. 
+ await store.deleteConversation('conv-b') + + expect(store.boardState).not.toBeNull() + expect(store.boardState?.topic).toBe('board in A') + expect(store.debateState).not.toBeNull() + expect(store.collaborationState).not.toBeNull() + }) +}) diff --git a/tests/unit/test_react_engine.py b/tests/unit/test_react_engine.py index 8ea4fbb..6d3ad8e 100644 --- a/tests/unit/test_react_engine.py +++ b/tests/unit/test_react_engine.py @@ -989,3 +989,94 @@ class TestMalformedToolUseNotLeakedAsFinalAnswer: # 不应把原始 XML 作为最终答案 assert "<tool_use>" not in result.output assert result.output == "Search completed" + + +class TestReActToolUsePromptRules: + """_build_tool_use_prompt 规则文本断言(U4 / Bug 2 L0)""" + + def test_new_rule_1_present_at_top(self): + """新规则 1 '涉及外部信息...' 出现在规则列表头部""" + from agentkit.core.react import ReActEngine + + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([]) + + assert "1. 涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具" in prompt + + def test_old_rule_3_absent(self): + """旧规则 3 '如果不需要工具就能回答,直接回答即可' 不再出现""" + from agentkit.core.react import ReActEngine + + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([]) + + assert "如果不需要工具就能回答,直接回答即可" not in prompt + + def test_rules_in_correct_order(self): + """规则序号 1-5 按预期顺序排列""" + from agentkit.core.react import ReActEngine + + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([]) + + # 规则 1 在规则 2 之前,规则 2 在规则 3 之前,以此类推 + r1 = prompt.index("1. 涉及外部信息") + r2 = prompt.index("2. 每次只调用一个工具") + r3 = prompt.index("3. 等待工具返回结果") + r4 = prompt.index("4. 仅在确实无需工具时") + r5 = prompt.index("5. 
不要在回答中重复工具的输出") + assert r1 < r2 < r3 < r4 < r5 + + def test_tool_use_xml_format_preserved(self): + """<tool_use> XML 格式示例保持向后兼容""" + from agentkit.core.react import ReActEngine + + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([]) + + assert "<tool_use>" in prompt + assert "</tool_use>" in prompt + + +class TestBug2L0PromptRules: + """Bug 2 L0 端到端验证:_build_tool_use_prompt 包含工具描述 + 新规则 + + Bug 2 状态:hypothesis applied, pending L4 verification(非 fixed)。 + L0 仅做文本断言,真实 LLM smoke test 在 L1/L2 独立 plan 中执行。 + """ + + def test_web_search_description_in_prompt(self): + """注册 web_search 工具后,prompt 包含其描述文本""" + from agentkit.core.react import ReActEngine + + web_search = FakeTool( + name="web_search", + description="搜索互联网信息,获取实时数据、新闻、趋势等", + ) + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([web_search]) + + # web_search 不是 core tool,作为 extended tool 渲染 + # extended tool 渲染格式: "- name: first_line_of_description" + assert "web_search" in prompt + assert "搜索互联网信息" in prompt + + def test_new_rule_1_present_with_tools(self): + """有工具注册时,prompt 仍包含新规则 1""" + from agentkit.core.react import ReActEngine + + web_search = FakeTool( + name="web_search", + description="搜索互联网信息", + ) + gateway = make_mock_gateway([]) + engine = ReActEngine(llm_gateway=gateway) + prompt = engine._build_tool_use_prompt([web_search]) + + assert "1. 涉及外部信息、实时数据、多步骤分析或你不确定的事实时必须使用工具" in prompt + assert "如果不需要工具就能回答,直接回答即可" not in prompt
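
Review note (not part of the patch): the same three-line reset now appears in `createConversation`, `selectConversation`, and `deleteConversation`. The following is a minimal sketch of how that reset and the `prevConvId !== id` guard could be shared. It assumes a stand-in `RefLike` type in place of Vue's `Ref` and hypothetical helpers `resetTransientState`, `resetOnSwitch`, and `isTransientStateCleared`; none of these names exist in `chatStore.ts`, and the state payloads are typed loosely as `object` for illustration only.

```typescript
// Hypothetical stand-in for Vue's Ref; the real store uses refs owned by useChatStream.
interface RefLike<T> {
  value: T;
}

// Hypothetical shape covering only the three transient refs named in the plan.
interface TransientStream {
  boardState: RefLike<object | null>;
  debateState: RefLike<object | null>;
  collaborationState: RefLike<object | null>;
}

// Clear all three transient refs in one place (illustrative helper, not in the diff).
function resetTransientState(stream: TransientStream): void {
  stream.boardState.value = null;
  stream.debateState.value = null;
  stream.collaborationState.value = null;
}

// Mirrors the guard added to selectConversation: a same-id reload keeps the state,
// a switch to a different id clears it. Returns whether a reset happened.
function resetOnSwitch(
  stream: TransientStream,
  prevId: string | null,
  nextId: string,
): boolean {
  if (prevId === nextId) {
    return false;
  }
  resetTransientState(stream);
  return true;
}

// Convenience check for tests: true only when all three refs are null.
function isTransientStateCleared(stream: TransientStream): boolean {
  return (
    stream.boardState.value === null &&
    stream.debateState.value === null &&
    stream.collaborationState.value === null
  );
}

// Usage: a force-reload of the same conversation keeps state, a real switch clears it.
const stream: TransientStream = {
  boardState: { value: { topic: "stale" } },
  debateState: { value: { topic: "stale" } },
  collaborationState: { value: { contracts: [] } },
};
resetOnSwitch(stream, "conv-a", "conv-a");
resetOnSwitch(stream, "conv-a", "conv-b");
```

With a helper of this kind, `createConversation` and the current-conversation branch of `deleteConversation` would call the unconditional reset, while `selectConversation` would call the guarded variant. Whether this is worth doing before the per-conversation state refactor listed under Out of Scope is a judgment call for the authors.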