refactor: systematic tech debt cleanup (U1-U5) #8

Merged
fischer merged 7 commits from refactor/react-engine-unified-loop into main 2026-07-01 00:45:35 +08:00
Summary

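Systematically cleans up the five tech-debt items identified by the comprehensive review (score 3.78/5): ~800 lines duplicated between the streaming and non-streaming paths of ReActEngine; the 2080-line TeamOrchestrator god class; 345+ overly broad `except Exception` handlers (degradation on critical paths passed silently); leftover `Any` types (bitable/ and others); and the 2025-line frontend chat.ts. The refactor was carried out under the protection of 5989 unit tests plus 35 new vitest tests, with no new regressions and all critical-path tests passing.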

What Changed

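| Unit | Module | Change | Benefit |
|---|---|---|---|
| U1 | `core/react.py` | `_execute_loop` becomes an async generator; `execute`/`execute_stream` share one skeleton | Removes ~800 duplicated lines; behavioral equivalence is locked by golden trajectory snapshots |
| U2 | `experts/orchestrator.py` + 7 mixins | 2080-line god class split into `PhaseExecutor`/`DebateRunner`/`ReviewGate`/`DivergenceDetector`/`RollbackHandler`/`Synthesizer`/`InterventionHandler` | Main class 1576→~440 lines; every method ≤100 lines |
| U3 | `core/` + `experts/` except cleanup | Critical-path `except Exception` handlers classified and narrowed; new `ReviewResult` dataclass replaces the bare tuple + `[DEGRADED]` string prefix | Review degradation can be checked programmatically; `review_result` WS events carry a `degraded` field |
| U4 | `bitable/` + `pipeline_state.py` + `tools/computer_use_session.py` | `Any` → `TypeAlias` (`BitableRecord`/`FormulaResult`/`SessionState`) + `object` + `TYPE_CHECKING` Protocol | Removes 40 `Any` occurrences; restores type contracts |
| U5 | `server/frontend/src/stores/chat*.ts` | 2025-line chat.ts split into `chatStore` (498) / `chatSocket` (165) / `chatStream` (1557), plus 35 new vitest tests | The `dispatchWsEvent` pure function covers 30+ WS event types and can be tested in isolation |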

Key Design Decisions

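  1. U1: unified async-generator skeleton (KTD1): `_execute_loop` always yields `ReActEvent`; `execute` collects all events and extracts the `ReActResult`, while `execute_stream` forwards them directly via `async for`. No callback/queue bridging is needed, which makes this the simplest option. A new `'final_result'` event_type string value carries the `ReActResult`; no new enum is introduced.

  2. U2: mixins rather than composition (KTD2): 37 methods make heavy use of the shared state `self._experts`/`self._workspace`/`self._broadcast_event`, so mixins keep `self` access intact with minimal changes. Each mixin file carries the header comment `# TYPE_CHECKING: 由 TeamOrchestrator 组合` ("composed by TeamOrchestrator").

  3. U3: structured degradation (KTD3): `ReviewResult(passed, degraded, feedback)` replaces the `(bool, str)` tuple + `[DEGRADED]` string prefix. The `except` in `_review_phase_output` is narrowed to `(LLMProviderError, asyncio.TimeoutError, ConnectionError, RuntimeError)`, and the degraded path returns `passed=True, degraded=True`. `_phase_executor` includes the `degraded` field when it broadcasts `review_result` events.

  4. U4: `object` + Protocol pattern (KTD4): where a circular dependency prevents an import, use `object` (the strictest "any type", which forbids attribute access) plus Protocols defined in a `TYPE_CHECKING` block (`_RedisLike`/`_RecalcWorker`). Types that can be imported directly use TypeAlias.

  5. U5: split by responsibility layer (KTD5): `chatSocket` (connection/heartbeat/reconnect), `chatStream` (`dispatchWsEvent` pure function + dispatch for 30+ events), `chatStore` (Pinia store composing the composables). `chatStore` keeps backward-compatible export aliases.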
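
Decision 1 (KTD1) can be illustrated with a minimal, self-contained sketch. It is not the code from `core/react.py`: the field names on `ReActEvent`/`ReActResult`, the `max_steps` constructor argument, and the `collect_stream` helper are assumptions made for illustration. It only shows the shape of the pattern, with one async generator as the single source of events and two thin consumers.

```python
import asyncio
from dataclasses import dataclass
from typing import AsyncIterator, Optional


@dataclass
class ReActResult:
    status: str   # e.g. "success"; field names are assumed
    answer: str


@dataclass
class ReActEvent:
    event_type: str          # plain string value, e.g. "thought" or "final_result"
    payload: object = None


class ReActEngine:
    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps

    async def _execute_loop(self, task: str) -> AsyncIterator[ReActEvent]:
        """Single loop skeleton: always yields events, final result last."""
        for step in range(self.max_steps):
            await asyncio.sleep(0)  # stand-in for an LLM/tool call
            yield ReActEvent("thought", f"step {step}: {task}")
        yield ReActEvent("final_result", ReActResult("success", f"done: {task}"))

    async def execute_stream(self, task: str) -> AsyncIterator[ReActEvent]:
        """Streaming path: forward every event unchanged."""
        async for event in self._execute_loop(task):
            yield event

    async def execute(self, task: str) -> ReActResult:
        """Non-streaming path: drain the loop and keep only the final result."""
        result: Optional[ReActResult] = None
        async for event in self._execute_loop(task):
            if event.event_type == "final_result":
                result = event.payload
        if result is None:
            raise RuntimeError("loop ended without a final_result event")
        return result


async def collect_stream(engine: ReActEngine, task: str) -> list:
    """Hypothetical helper: gather a stream into a list for inspection."""
    return [event async for event in engine.execute_stream(task)]
```

Because both entry points consume the same generator, an equivalence test only has to compare the last streamed event with the value returned by `execute`.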

Test Plan

New tests:

  • tests/unit/test_react_golden_trajectory.py (617 lines): U1 golden trajectory snapshots that lock execute/execute_stream behavioral equivalence
  • tests/unit/server/frontend/tests/unit/stores/chatStream.test.ts (563 lines, 19 tests): U5 dispatchWsEvent coverage of all event types
  • tests/unit/server/frontend/tests/unit/stores/chatSocket.test.ts (255 lines, 13 tests): U5 useChatSocket composable + resolveIncomingConvId

Regression verification (compared against the main baseline):

  • core+experts+evolution+memory: 894, all passed
  • react+team_orchestrator+golden: 190, all passed
  • bitable+computer_use+pipeline+orchestrator: 202 passed, 116 skipped
  • auth+admin+chat+cli+mcp+quality+router+skills: 1025 passed / 2 baseline failures (router test_intent)
  • server: 215 passed / 110 failed, identical to the main baseline (110 failures, no new regressions)
  • ruff: 67 errors (baseline 77, net reduction of 10)
  • frontend typecheck: passed
  • frontend vitest: 68/69 (1 baseline failure: tauri-auth localStorage state leak)

Environment constraint: Python 3.14 with litellm missing; test_cache.py skipped

Post-Deploy Monitoring & Validation

Log queries:

  • Count of degraded: true in review_result WS events: monitors how often review degrades
  • Count of phase_violation events: shows how the phase policy is being enforced
  • Distribution of ReActEngine final_result event status (success/timeout/cancelled/empty_fallback)

Metrics:

  • TeamOrchestrator phase execution time (phase_started → phase_completed)
  • ReActEngine step-count distribution (how often max_steps is reached)
  • chatSocket reconnect frequency (WebSocket disconnect events)

Healthy signals:

  • degraded: true ratio < 5% (LLM review working normally)
  • final_result.status=success ratio > 80%
  • Phase failure rate < 10%

Failure signals:

  • Spike in degraded: true ratio → LLM gateway availability problem
  • Spike in final_result.status=empty_fallback → LLM calls failing
  • Spike in chatSocket reconnect frequency → WebSocket stability problem

Rollback trigger: any metric exceeding 2x its threshold for 15 minutes → roll back to main (0962df1)

Validation window: 24h after merge, owner: @chiguyong

Known Residuals

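  • chatStream.ts is 1557 lines versus the plan's estimate of ~300 (driven by the 30+ WS event types; the plan's Verification section only requires chatStore ≤500, which is met)
  • The standalone "review LLM timeout" test required by the U3 plan was not added separately (the existing test_synthesize_without_llm_concatenates indirectly covers the RuntimeError degradation path)
  • `except Exception` cleanup in server/routes/ (portal.py 19, chat.py 16) deferred to follow-up PR
  • Cleanup of leftover `Any` in llm/, memory/ and client/ deferred to follow-up PR
  • Leftover `Any` inside bitable/ (repository.py 5, recalc_worker.py 2, ingestion/* 3; 10 in total) deferred to follow-up PR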


fischer added 7 commits 2026-07-01 00:43:19 +08:00
a3cecd4b50 fix(review): apply P0/P2 findings from dual-agent review
- Dockerfile: split ENTRYPOINT/CMD to align with docker-compose serve
- test_termbase: guard jieba import with pytest.importorskip
- orchestrator: mark silent review-degradation with [DEGRADED] prefix
- chat.py: accurate ExecutionMode log message
- agentkit.yaml: document OTel exporter config
- skill_routing: replace 12 Any with object/typed (AGENTS.md compliance)
- AssistantText.vue: add aria-live/role for a11y
e61f98898f refactor(core): unify ReActEngine execute/execute_stream via async generator (U1)
- Convert _execute_loop to async generator yielding ReActEvent; both execute and execute_stream delegate to it, eliminating ~760 lines of duplicated loop logic (execute_stream 813 -> 53 lines).

- Add 'final_result' event_type carrying ReActResult; execute extracts result from final event, execute_stream forwards events (backward-compatible 'final_answer' retained).

- Unify _drain_phase_violations across both paths.

- Add 14 golden-trajectory characterization tests.

- Fix test_execute_stream_with_compressor mock gateway (chat_stream test-infra gap). 130 react tests pass, 762 core+experts pass, no regressions.
47ee2449df refactor(experts): split TeamOrchestrator god class into 7 mixins (U2)
- Split 2085-line orchestrator.py into main class (592 lines) + 7 responsibility-focused mixins: PhaseExecutor, DebateRunner, ReviewGate, DivergenceDetector, RollbackHandler, Synthesizer, InterventionHandler.

- Mixin pattern preserves self access to shared state (_experts/_workspace/_broadcast_event); method bodies moved verbatim to minimize regression risk. Each mixin declares TYPE_CHECKING Protocol for shared state.

- Split _execute_execution_phase (~290 lines) into _prepare_phase_context/_run_agent_steps/_finalize_phase (each <=100 lines).

- All mixins <=400 lines, main class <=600 lines. [DEGRADED] prefix annotations preserved in ReviewGateMixin.

- 60 team_orchestrator tests pass (behavior unchanged), 469 experts tests pass, ruff clean.
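
The mixin arrangement described in this commit can be sketched as follows. This is a reduced illustration, not the project's code: only two of the seven mixins appear, and the `_SharedState` Protocol, the method names `run_phase`/`review`, and the `events` list are assumptions. The point is that each mixin annotates `self` with a Protocol that exists only for the type checker, while the concrete orchestrator supplies the shared state at runtime.

```python
from typing import TYPE_CHECKING, Protocol

if TYPE_CHECKING:
    # Seen only by the type checker; describes what a mixin expects on `self`.
    class _SharedState(Protocol):
        _experts: dict[str, str]

        def _broadcast_event(self, name: str, payload: dict[str, object]) -> None: ...


class PhaseExecutorMixin:
    def run_phase(self: "_SharedState", phase: str) -> list[str]:
        # Reads shared state through `self`, exactly as the god class did.
        outputs = [f"{name}:{phase}" for name in self._experts]
        self._broadcast_event("phase_completed", {"phase": phase, "outputs": outputs})
        return outputs


class ReviewGateMixin:
    def review(self: "_SharedState", outputs: list[str]) -> bool:
        passed = bool(outputs)
        self._broadcast_event("review_result", {"passed": passed})
        return passed


class TeamOrchestrator(PhaseExecutorMixin, ReviewGateMixin):
    """Main class: owns the shared state and composes the mixins."""

    def __init__(self, experts: dict[str, str]) -> None:
        self._experts = experts
        self.events: list[tuple[str, dict[str, object]]] = []

    def _broadcast_event(self, name: str, payload: dict[str, object]) -> None:
        self.events.append((name, payload))
```

The string annotation `"_SharedState"` keeps the Protocol out of the runtime import graph, which is what lets each mixin file stay free of a dependency on the main class.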
be5c4e09f8 refactor(core,experts): classify except Exception + structured ReviewResult (U3)
ReviewResult dataclass (passed/degraded/feedback) replaces tuple+[DEGRADED] prefix in _review_phase_output; 3 review_result WS payloads now carry degraded field (AE3).

except Exception narrowed to specific types across 10 files (core/react, rewoo, base, orchestrator, dispatcher, plan_exec_engine + experts/orchestrator, _phase_executor, _review_gate + orchestrator/pipeline_engine). Baseline 140 -> 66 occurrences (>=50% reduction).

Fix RuntimeError regression: review-gate + compression paths now catch RuntimeError (LLM/provider internal errors) to preserve degradation semantics. Test side_effect switched to functional form to avoid StopIteration on list exhaustion.

ruff clean; 135 key + 469 experts + 163 core tests pass.
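
A minimal sketch of the structured-degradation idea in this commit, under stated assumptions: `LLMProviderError` is defined locally as a stand-in for the project's own exception, `review_phase_output` takes a hypothetical `reviewer` callable, and `review_result_payload` is an invented helper showing how a WS payload might carry the flag. Only the `ReviewResult` field names and the narrowed exception tuple come from the PR text.

```python
import asyncio
from dataclasses import dataclass


class LLMProviderError(Exception):
    """Local stand-in for the project's provider error type."""


@dataclass(frozen=True)
class ReviewResult:
    passed: bool
    degraded: bool = False
    feedback: str = ""


async def review_phase_output(reviewer, output: str) -> ReviewResult:
    """Run the reviewer; degrade explicitly on known infrastructure failures."""
    try:
        verdict, feedback = await reviewer(output)
    except (LLMProviderError, asyncio.TimeoutError, ConnectionError, RuntimeError) as exc:
        # Degraded path: let the phase pass, but say so in a typed field
        # instead of a "[DEGRADED]" string prefix.
        return ReviewResult(passed=True, degraded=True, feedback=f"review skipped: {exc!r}")
    return ReviewResult(passed=verdict, feedback=feedback)


def review_result_payload(phase: str, result: ReviewResult) -> dict:
    """Hypothetical WS payload builder that exposes the degraded flag."""
    return {
        "type": "review_result",
        "phase": phase,
        "passed": result.passed,
        "degraded": result.degraded,
        "feedback": result.feedback,
    }


# Demo reviewers (illustrative only).
async def healthy_reviewer(output: str):
    return True, "looks fine"


async def failing_reviewer(output: str):
    raise RuntimeError("provider internal error")


async def buggy_reviewer(output: str):
    raise KeyError("programming error, must not be swallowed")
```

With the narrowed tuple, `failing_reviewer` yields a degraded pass while `buggy_reviewer` still raises, so genuine bugs are no longer hidden behind a blanket `except Exception`.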
1033346913 refactor(bitable,tools): replace Any with concrete types + Protocol (U4)
BitableRecord/FormulaResult/SessionState TypeAlias replace dict[str, Any]; _redis/_engine/_session_factory typed as object | None with TYPE_CHECKING Protocol (_RedisLike, _RecalcWorker); Coroutine[Any, Any, Any] retained as legitimate type param.

Baseline 40 : Any occurrences -> 0 across 6 in-scope files (target <=5). Deferred: repository.py/recalc_worker.py/ingestion/* (10 occurrences, separate PR).

ruff clean; 367 passed + 116 skipped (bitable + pipeline_state + tools).
Test / backend-test (pull_request) Has been cancelled
Test / frontend-unit (pull_request) Has been cancelled
Test / api-e2e (pull_request) Has been cancelled
Test / frontend-e2e (pull_request) Has been cancelled
ec9a0a1f70
refactor(frontend): split chat.ts (2025 lines) into chatStore/chatSocket/chatStream (U5)
chatStore.ts (498 lines, <=500 target met): Pinia store entry composing useChatSocket + useChatStream; retains all actions + backward-compat export aliases.

chatSocket.ts (165 lines): resolveIncomingConvId pure fn + useChatSocket composable (connect/disconnect/heartbeat/reconnect).

chatStream.ts (1557 lines): dispatchWsEvent pure fn for 30+ WS event types + useChatStream composable. Exceeds plan ~300 estimate due to discriminated union breadth (each case 30-50 lines); core testability goal met.

8 components + chat-phase.test.ts migrated from @/stores/chat to @/stores/chatStore.

vitest: 35 new tests (chatStream 19 + chatSocket 13 + chat-phase 3) all green; typecheck passes.
fischer merged commit cc531d0663 into main 2026-07-01 00:45:35 +08:00
Reference: fischer/fischer-agentkit#8