docs(plan): Wave 3 strategic coupling plan (G5/G6)
This commit is contained in:
parent
a2dcde01b8
commit
e3f69f963c
|
|
@ -0,0 +1,436 @@
|
||||||
|
---
|
||||||
|
title: "feat: Agent Wave 3 strategic coupling (G5/G6)"
|
||||||
|
date: 2026-06-29
|
||||||
|
type: feat
|
||||||
|
status: draft
|
||||||
|
origin: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md
|
||||||
|
execution: code
|
||||||
|
---
|
||||||
|
|
||||||
|
# Wave 3 Strategic Coupling — G5 Function-level Sharding + G6 SOLO Phase Constraints
|
||||||
|
|
||||||
|
## Summary
|
||||||
|
|
||||||
|
Wave 3 of the advanced-agent gap optimization closes two strategic gaps deferred from Waves 1-2:
|
||||||
|
|
||||||
|
- **G5 — Function-level code sharding** (R22, R23): file reading gains an optional `symbol` parameter for symbol/function-granularity slicing, backward compatible with full-file reads.
|
||||||
|
- **G6 — SOLO four-stage state machine** (R24, R25): ReAct loop enforces per-phase tool whitelists (Planning → Building → Verification → Delivery). Extends existing `ExecutionMode.PLAN_EXEC` rather than introducing a new mode.
|
||||||
|
|
||||||
|
Wave 1 (G1/G2/G3/G8 — PR #4 merged) and Wave 2 (G4/G7/G9 — PR #5 open) shipped independently. Wave 3 is the **strategic-risk** wave: it introduces a new tool (G5) and touches ReAct core (G6). Per the brainstorm's KTD6/KTD7 locked decisions, G5 integration approach is decided here (not deferred further), and G6 extends PLAN_EXEC rather than adding a new mode.
|
||||||
|
|
||||||
|
## Problem Frame
|
||||||
|
|
||||||
|
The brainstorm (`docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md`) identified nine gaps across three dimensions. Waves 1-2 closed seven (G1-G4, G7-G9). The remaining two gaps are strategic:
|
||||||
|
|
||||||
|
- **G5 (Long-context cost)**: Large files (e.g. 5000-line modules) blow context budget when the agent only needs one function. Today the agent shells out to `cat file.py` or greps; both pull the whole file into context. A symbol-aware slice would cut context cost 10-50x for typical edits.
|
||||||
|
- **G6 (Unsafe tool sequencing)**: ReAct loop lets the LLM call `write_file` during early exploration before committing to a plan. This wastes tokens on premature edits, causes half-baked refactors, and breaks the "plan-then-build" discipline that production agents (Qoder, Trae Work) enforce via phase state machines.
|
||||||
|
|
||||||
|
KTD6 (brainstorm) defers the G5 integration decision to this plan. KTD7 locks G6 to extend PLAN_EXEC rather than introduce a new mode.
|
||||||
|
|
||||||
|
## Requirements
|
||||||
|
|
||||||
|
Carried forward from the brainstorm, Wave 3 section:
|
||||||
|
|
||||||
|
- **R22**: File reading supports symbol/function granularity sharding.
|
||||||
|
- **R23**: Sharding capability exposed as a tool parameter (`symbol="function_name"`), backward compatible with full-file reads.
|
||||||
|
- **R24**: ReAct loop enforces phase constraints — Planning phase only allows `think`/`search`; Building phase only allows `write_file` (and similar write tools).
|
||||||
|
- **R25**: Phase state is configurable; extends `ExecutionMode.PLAN_EXEC`, does NOT introduce a new mode.
|
||||||
|
|
||||||
|
Cross-cutting (brainstorm):
|
||||||
|
|
||||||
|
- **R26**: All optimizations configurable via `agentkit.yaml` (follow `ServerConfig.from_dict` pattern established in Waves 1-2).
|
||||||
|
- **R27**: Each optimization ships a minimal self-check test (ponytail rule).
|
||||||
|
|
||||||
|
Acceptance examples relevant to Wave 3:
|
||||||
|
|
||||||
|
- **AE5 already covered by Wave 2 (G9)** — not in Wave 3 scope.
|
||||||
|
- No explicit AE for G5/G6 in the brainstorm — the plan below specifies test scenarios as the acceptance contract.
|
||||||
|
|
||||||
|
## Key Technical Decisions
|
||||||
|
|
||||||
|
### KTD1: G5 uses Python `ast` + language-aware regex, NOT tree-sitter
|
||||||
|
|
||||||
|
**Decision**: Implement symbol extraction with the Python stdlib `ast` module for Python files, and a small regex-based extractor for TypeScript/JavaScript/Go/Rust/Java. No new dependency.
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- `tree-sitter` requires native compilation + per-language grammar files (~30MB installed) — violates the ponytail rule "no new dependency if it can be avoided" and AGENTS.md "禁止使用 any 类型" → prefers minimal stack.
|
||||||
|
- `ast` is stdlib, always available, parses Python accurately.
|
||||||
|
- Regex extractor covers 80% case for TS/JS/Go/Rust/Java (function/class/struct declarations); falls back to "no symbols found → read full file" gracefully.
|
||||||
|
- If a future Wave 4 needs more accurate multi-language parsing, `tree-sitter` can replace the regex layer behind the same `SymbolExtractor` interface.
|
||||||
|
|
||||||
|
**Upgrade path**: replace `RegexSymbolExtractor` with `TreeSitterSymbolExtractor` implementing the same `SymbolExtractor` protocol; no caller changes.
|
||||||
|
|
||||||
|
### KTD2: G5 adds a new `ReadFileTool`, does not extend `ShellTool`
|
||||||
|
|
||||||
|
**Decision**: Add a dedicated `ReadFileTool` in `src/agentkit/tools/file_read.py` with `path` + optional `symbol` + optional `start_line`/`end_line` parameters.
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- `ShellTool` is for shell execution; grafting symbol extraction onto it muddies the contract.
|
||||||
|
- A dedicated tool gives the LLM a clear schema (`{"path": "...", "symbol": "function_name"}`) and a focused system-prompt description.
|
||||||
|
- Aligns with the existing `_DEFAULT_CORE_TOOLS` list in `core/react.py:148` which already references `read_file` — the name is reserved but the implementation is missing.
|
||||||
|
|
||||||
|
### KTD3: G6 phase state machine lives in ReActEngine, not in the skill config
|
||||||
|
|
||||||
|
**Decision**: Phase state (Planning/Building/Verification/Delivery) is tracked as a mutable field on `ReActEngine` instance. Transitions are driven by LLM-detected phase-completion signals (e.g., the LLM emits `Phase: Building` in its thinking) OR by an explicit `advance_phase` tool.
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- Skill config declares the policy (which tools per phase, auto-advance vs manual); the engine enforces it per-step. This matches R24 ("ReAct 循环加阶段约束").
|
||||||
|
- Alternative considered: phase state in the agent instance (not engine). Rejected because ReActEngine already owns `max_steps`/`verification_enabled` etc.; phase state belongs with the loop that enforces it.
|
||||||
|
|
||||||
|
### KTD4: PLAN_EXEC mode is wired at chat.py WebSocket path (REST already has fallback chain from Wave 2)
|
||||||
|
|
||||||
|
**Decision**: chat.py:1084 (currently warns "not yet supported, falling back to REACT") will route `ExecutionMode.PLAN_EXEC` to a new `_execute_plan_exec_ws` handler that constructs `PhasePolicy` from `ServerConfig.plan_exec` and passes it to `ReActEngine.execute`.
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- REST `send_message` already uses the Wave 2 three-tier fallback chain; PLAN_EXEC at REST would also need that wrapper. **Out of scope for Wave 3** — only WebSocket path is wired. REST PLAN_EXEC remains "not yet supported" and explicitly raises if invoked.
|
||||||
|
- Single integration point keeps Wave 3 scope bounded; REST wiring is a one-line follow-up once WebSocket path is proven.
|
||||||
|
|
||||||
|
### KTD5: Default phase whitelist matches brainstorm R24
|
||||||
|
|
||||||
|
**Decision**: Default whitelist:
|
||||||
|
- `Planning`: `search`, `tool_search`, `read_file`, `bash` (read-only commands like `git status`, `ls`)
|
||||||
|
- `Building`: `write_file`, `bash` (write commands), `read_file`, `search`
|
||||||
|
- `Verification`: `bash` (test commands), `read_file`, `search`
|
||||||
|
- `Delivery`: all tools (final synthesis)
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- R24 explicitly names `think`/`search` for Planning and `write_file` for Building.
|
||||||
|
- `bash` is split: read-only in Planning, full in Building. Enforced by adding a `bash_command_filter` callback (regex-based, blocks `rm`/`mv`/`>`/`>>` in Planning/Verification).
|
||||||
|
- `Delivery` allows all tools to support last-mile formatting/cleanup.
|
||||||
|
|
||||||
|
### KTD6: Phase transitions are LLM-driven via `advance_phase` tool (opt-in auto-advance)
|
||||||
|
|
||||||
|
**Decision**: Add an `AdvancePhaseTool` that the LLM can call to transition Planning→Building→Verification→Delivery. Auto-advance (after N steps in current phase) is opt-in via `plan_exec.auto_advance_after_steps`.
|
||||||
|
|
||||||
|
**Rationale**:
|
||||||
|
- LLM-driven transitions match Qoder/Trae Work pattern: LLM declares "planning done" explicitly.
|
||||||
|
- Auto-advance is a safety net for LLMs that forget to call `advance_phase`; default off (ponytail: less code is better).
|
||||||
|
|
||||||
|
## Scope Boundaries
|
||||||
|
|
||||||
|
### In Scope
|
||||||
|
|
||||||
|
- New `ReadFileTool` with `symbol` parameter (G5, R22/R23).
|
||||||
|
- `SymbolExtractor` protocol + `AstSymbolExtractor` (Python) + `RegexSymbolExtractor` (TS/JS/Go/Rust/Java).
|
||||||
|
- `PhasePolicy` dataclass + `PhaseState` enum + per-step tool whitelist enforcement in `ReActEngine.execute` (G6, R24/R25).
|
||||||
|
- `AdvancePhaseTool` for LLM-driven phase transitions.
|
||||||
|
- WebSocket chat path routes `PLAN_EXEC` to new `_execute_plan_exec_ws` handler (KTD4).
|
||||||
|
- `plan_exec` config section in `agentkit.yaml` + `ServerConfig.from_dict` extension (R26).
|
||||||
|
- Tests for each new module (R27).
|
||||||
|
|
||||||
|
### Out of Scope (Deferred to Follow-Up Work)
|
||||||
|
|
||||||
|
- REST `send_message` PLAN_EXEC wiring — once WebSocket path is proven, REST wiring is a follow-up commit.
|
||||||
|
- `tree-sitter` integration for more accurate multi-language parsing (KTD1 upgrade path).
|
||||||
|
- Phase-aware prompt engineering (per-phase system prompt templates) — current plan keeps a single system prompt; phase-specific guidance is a prompt-engineering concern, not a code change.
|
||||||
|
- Phase persistence across session resume (U7 checkpoint already saves plan state; phase state restoration is a separate concern).
|
||||||
|
- Phase rollback on `Building` → `Planning` regression (Wave 2 G9 rollback handles file-level rollback; phase regression is a UX/prompt concern).
|
||||||
|
- Tool-filter UI in the frontend (Wave 3 ships backend-only; frontend surfaces phase via existing event channel if needed in a follow-up).
|
||||||
|
|
||||||
|
### Outside This Product's Identity
|
||||||
|
|
||||||
|
- Replacing the existing ReAct loop with LangGraph (inherited from brainstorm).
|
||||||
|
- Disc-based file system à la DeerFlow (inherited).
|
||||||
|
- Docker sandbox (inherited; only command-level safety via `bash_command_filter`).
|
||||||
|
|
||||||
|
## High-Level Technical Design
|
||||||
|
|
||||||
|
```mermaid
|
||||||
|
flowchart LR
|
||||||
|
subgraph G5[Function Sharding]
|
||||||
|
RF[ReadFileTool] --> SE[SymbolExtractor protocol]
|
||||||
|
SE --> AST[AstSymbolExtractor<br/>Python stdlib ast]
|
||||||
|
SE --> RX[RegexSymbolExtractor<br/>TS/JS/Go/Rust/Java]
|
||||||
|
end
|
||||||
|
|
||||||
|
subgraph G6[Phase State Machine]
|
||||||
|
PP[PhasePolicy config] --> PS[PhaseState enum<br/>Planning/Building/Verify/Delivery]
|
||||||
|
PS --> Filt[Tool filter per step]
|
||||||
|
Filt --> RE[ReActEngine.execute]
|
||||||
|
AP[AdvancePhaseTool] -->|transitions| PS
|
||||||
|
end
|
||||||
|
|
||||||
|
RF -->|file content for symbol| RE
|
||||||
|
RE -->|enforces| Filt
|
||||||
|
```
|
||||||
|
|
||||||
|
The two subsystems compose at the ReAct engine boundary: `ReadFileTool` is one of the tools the LLM can call during any phase (filtered by `PhasePolicy`); `PhaseState` is enforced at the tool-call step before dispatch.
|
||||||
|
|
||||||
|
## Implementation Units
|
||||||
|
|
||||||
|
### U1. SymbolExtractor + ReadFileTool (G5)
|
||||||
|
|
||||||
|
**Goal**: Add `ReadFileTool` with optional `symbol` parameter; implement `SymbolExtractor` protocol with `AstSymbolExtractor` (Python) and `RegexSymbolExtractor` (TS/JS/Go/Rust/Java).
|
||||||
|
|
||||||
|
**Requirements**: R22, R23, R27.
|
||||||
|
|
||||||
|
**Dependencies**: none.
|
||||||
|
|
||||||
|
**Files**:
|
||||||
|
- `src/agentkit/tools/file_read.py` (new)
|
||||||
|
- `src/agentkit/tools/symbol_extractor.py` (new)
|
||||||
|
- `src/agentkit/tools/__init__.py` (modify — register `ReadFileTool`)
|
||||||
|
- `tests/unit/test_symbol_extractor.py` (new)
|
||||||
|
- `tests/unit/test_read_file_tool.py` (new)
|
||||||
|
|
||||||
|
**Approach**:
|
||||||
|
- `SymbolExtractor` is a `Protocol` with one method: `extract_symbols(content: str, language: str) -> list[SymbolSpan]`. `SymbolSpan` carries `name`, `kind` (function/class/method/struct), `start_line`, `end_line`.
|
||||||
|
- `AstSymbolExtractor` walks `ast.parse(content)`; for each `FunctionDef`/`AsyncFunctionDef`/`ClassDef` collects name + line range. Uses `ast.get_source_segment` style (line-based, not node-based, to keep the API simple).
|
||||||
|
- `RegexSymbolExtractor` ships patterns for TS/JS (`function X`, `const X = (...) =>`, `class X`), Go (`func X`), Rust (`fn X`, `struct X`, `impl X`), Java (`public ... X(...)`). Falls back to "no symbols" if no pattern matches.
|
||||||
|
- `ReadFileTool.execute(path, symbol=None, start_line=None, end_line=None)`:
|
||||||
|
- `symbol=None` → read full file (backward compat with the existing `_FakeTool` benchmark shape).
|
||||||
|
- `symbol="foo"` → detect language from extension; call `extract_symbols`; return the line range of the first matching symbol; if no match, return an error result with available symbol names listed (so the LLM can retry).
|
||||||
|
- `start_line`/`end_line` overrides symbol; allows manual slicing.
|
||||||
|
- Tool registered as `read_file` (matches the reserved name in `core/react.py:148`).
|
||||||
|
|
||||||
|
**Execution note**: characterization-first — write a test that asserts the tool returns the full file content when `symbol=None` (matches pre-existing benchmark `_FakeTool` shape) before adding symbol-extraction behavior.
|
||||||
|
|
||||||
|
**Patterns to follow**:
|
||||||
|
- `src/agentkit/tools/document_tool.py` for tool structure (dataclass, `Tool` base class, `input_schema`).
|
||||||
|
- `src/agentkit/tools/schema_tools.py:SchemaExtractTool` for "extract-from-source" pattern.
|
||||||
|
|
||||||
|
**Test scenarios** (covers R22, R23):
|
||||||
|
- **Happy paths**:
|
||||||
|
- Python file, `symbol="MyClass"` → returns class body only (lines from `class MyClass:` through end of class).
|
||||||
|
- Python file, `symbol="my_func"` → returns function body only.
|
||||||
|
- TypeScript file, `symbol="renderComponent"` → returns arrow/function body.
|
||||||
|
- Go file, `symbol="HandleRequest"` → returns func body.
|
||||||
|
- **Edge cases**:
|
||||||
|
- `symbol=None` → returns full file content (characterization).
|
||||||
|
- `symbol="nonexistent"` → returns error result listing available symbols ("Available symbols: foo, bar, baz").
|
||||||
|
- Unsupported file extension (`.md`, `.txt`) → returns full file with `note: symbol extraction not supported for .md`.
|
||||||
|
- Empty file → returns empty content.
|
||||||
|
- File with nested classes → outer class symbol returns including inner class.
|
||||||
|
- **Error paths**:
|
||||||
|
- Path does not exist → raises `FileNotFoundError` (or returns error result matching other tools' convention).
|
||||||
|
- Path is a directory → returns error result.
|
||||||
|
- Permission denied → returns error result.
|
||||||
|
- **Integration scenarios**:
|
||||||
|
- Symbol extraction + line slicing: `symbol="foo"`, `end_line=50` truncates at line 50 even if symbol extends further.
|
||||||
|
- Round-trip: extract symbol, write back via `ShellTool` `sed` (not in scope for tool — just verify extracted range is well-formed).
|
||||||
|
|
||||||
|
**Verification**:
|
||||||
|
- `python3 -m pytest tests/unit/test_symbol_extractor.py tests/unit/test_read_file_tool.py -q` passes.
|
||||||
|
- `ruff check src/agentkit/tools/file_read.py src/agentkit/tools/symbol_extractor.py` clean.
|
||||||
|
- `ReadFileTool` appears in `ToolRegistry.list_tools()` after registration.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### U2. PhasePolicy + PhaseState + ServerConfig (G6 core)
|
||||||
|
|
||||||
|
**Goal**: Add `PhasePolicy` dataclass, `PhaseState` enum, default whitelist config. Extend `ServerConfig.from_dict` with `plan_exec` section. Wire config to `agentkit.yaml`.
|
||||||
|
|
||||||
|
**Requirements**: R25, R26.
|
||||||
|
|
||||||
|
**Dependencies**: none.
|
||||||
|
|
||||||
|
**Files**:
|
||||||
|
- `src/agentkit/core/phase.py` (new) — `PhaseState` enum, `PhasePolicy` dataclass, `default_policy()` factory.
|
||||||
|
- `src/agentkit/server/config.py` (modify — add `plan_exec` field + `from_dict` parsing).
|
||||||
|
- `agentkit.yaml` (modify — document `plan_exec:` section).
|
||||||
|
- `tests/unit/test_phase_policy.py` (new).
|
||||||
|
|
||||||
|
**Approach**:
|
||||||
|
- `PhaseState = enum("planning building verification delivery")`.
|
||||||
|
- `PhasePolicy` carries:
|
||||||
|
- `whitelist: dict[PhaseState, set[str]]` — tool names allowed per phase.
|
||||||
|
- `bash_command_filter: dict[PhaseState, re.Pattern | None]` — regex that bash args must NOT match (e.g., `r"\b(rm|mv|>|>>)\b"` in Planning).
|
||||||
|
- `auto_advance_after_steps: int | None` — None = manual (LLM calls `advance_phase`); int = auto-advance after N steps.
|
||||||
|
- `start_phase: PhaseState = PhaseState.PLANNING`.
|
||||||
|
- `default_policy()` returns the KTD5 whitelist above.
|
||||||
|
- `ServerConfig.from_dict` reads `plan_exec` section: `enabled`, `whitelist_override` (dict), `auto_advance_after_steps`.
|
||||||
|
- `agentkit.yaml` gains a commented-out `plan_exec:` block (commented to preserve default behavior — opt-in).
|
||||||
|
|
||||||
|
**Patterns to follow**:
|
||||||
|
- `src/agentkit/core/fallback.py` for dataclass + classmethod factory pattern.
|
||||||
|
- `src/agentkit/server/config.py` `from_dict` extension template (established in Wave 1 for `prompt_cache`/`streaming`/`verification`; Wave 2 added `rollback`/`fallback_chain`; Wave 3 adds `plan_exec`).
|
||||||
|
|
||||||
|
**Test scenarios** (covers R25, R26):
|
||||||
|
- **Happy paths**:
|
||||||
|
- `default_policy()` returns policy with all four phases; Planning whitelist contains `search`, `read_file`; Building contains `write_file`.
|
||||||
|
- `PhasePolicy.is_tool_allowed("search", PhaseState.PLANNING)` returns True.
|
||||||
|
- `PhasePolicy.is_tool_allowed("write_file", PhaseState.PLANNING)` returns False.
|
||||||
|
- `PhasePolicy.is_tool_allowed("write_file", PhaseState.BUILDING)` returns True.
|
||||||
|
- **Edge cases**:
|
||||||
|
- Empty whitelist for a phase → all tools rejected (raises `ValueError` at construction time — fail-fast).
|
||||||
|
- `Delivery` phase whitelist contains `"*"` (wildcard) → all tools allowed.
|
||||||
|
- Custom whitelist override merges with default (override wins on conflict).
|
||||||
|
- **Error paths**:
|
||||||
|
- Invalid phase name in config → `ValueError` with message naming the bad value.
|
||||||
|
- `bash_command_filter` regex compile failure → `ValueError`.
|
||||||
|
- **Config integration**:
|
||||||
|
- `ServerConfig.from_dict({"plan_exec": {"enabled": True, "auto_advance_after_steps": 5}})` populates fields correctly.
|
||||||
|
- `ServerConfig.from_dict({})` → `plan_exec = {}` (default).
|
||||||
|
|
||||||
|
**Verification**:
|
||||||
|
- `python3 -m pytest tests/unit/test_phase_policy.py -q` passes.
|
||||||
|
- `ruff check src/agentkit/core/phase.py` clean.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### U3. AdvancePhaseTool + ReActEngine phase enforcement (G6 wiring)
|
||||||
|
|
||||||
|
**Goal**: Add `AdvancePhaseTool`. Wire `PhasePolicy` into `ReActEngine.execute` so each tool-call step checks `is_tool_allowed(tool_name, current_phase)` before dispatch; blocked calls return a structured error to the LLM ("Tool 'write_file' not allowed in Planning phase — call advance_phase first").
|
||||||
|
|
||||||
|
**Requirements**: R24.
|
||||||
|
|
||||||
|
**Dependencies**: U2.
|
||||||
|
|
||||||
|
**Files**:
|
||||||
|
- `src/agentkit/tools/advance_phase.py` (new) — `AdvancePhaseTool` calls `react_engine.advance_phase()`.
|
||||||
|
- `src/agentkit/core/react.py` (modify — add `phase_policy` param to `__init__` + `execute`; add `_current_phase` field; add `advance_phase()` method; enforce in `_execute_loop`).
|
||||||
|
- `tests/unit/test_react_phase_enforcement.py` (new).
|
||||||
|
|
||||||
|
**Approach**:
|
||||||
|
- `ReActEngine.__init__` accepts `phase_policy: PhasePolicy | None = None`. None = no enforcement (backward compat — all existing callers unaffected).
|
||||||
|
- `_current_phase: PhaseState | None` initialized from `phase_policy.start_phase` if policy set, else None.
|
||||||
|
- `advance_phase()` advances `_current_phase` to next enum value; raises `ValueError` if already at `DELIVERY`.
|
||||||
|
- In `_execute_loop`, before dispatching a tool call:
|
||||||
|
```python
|
||||||
|
if self._phase_policy is not None and self._current_phase is not None:
|
||||||
|
if not self._phase_policy.is_tool_allowed(tool_name, self._current_phase):
|
||||||
|
# Inject structured error into conversation, do NOT dispatch tool.
|
||||||
|
# This counts as a "step" for max_steps purposes.
|
||||||
|
observation = {
|
||||||
|
"error": "phase_violation",
|
||||||
|
"message": f"Tool '{tool_name}' not allowed in {self._current_phase.value} phase",
|
||||||
|
"current_phase": self._current_phase.value,
|
||||||
|
"hint": "Call advance_phase to move to Building phase"
|
||||||
|
}
|
||||||
|
continue # next loop iteration
|
||||||
|
```
|
||||||
|
- Auto-advance: if `phase_policy.auto_advance_after_steps` is set and `_steps_in_phase >= auto_advance_after_steps`, call `advance_phase()` automatically.
|
||||||
|
- `AdvancePhaseTool.execute()` calls the bound engine's `advance_phase()` and returns the new phase name. Registered only when `phase_policy` is not None.
|
||||||
|
|
||||||
|
**Execution note**: characterization-first — test that `ReActEngine` with `phase_policy=None` behaves identically to pre-change (no enforcement, no `advance_phase` tool, no `_current_phase` mutation). Then add enforcement tests.
|
||||||
|
|
||||||
|
**Patterns to follow**:
|
||||||
|
- `src/agentkit/core/react.py` `verification_enabled` pattern (feature flag + step-level check).
|
||||||
|
- `src/agentkit/tools/ask_human.py` for tool that interacts with engine state.
|
||||||
|
|
||||||
|
**Test scenarios** (covers R24):
|
||||||
|
- **Characterization (no policy)**:
|
||||||
|
- `ReActEngine(phase_policy=None)` — all tools allowed in all steps; no `advance_phase` tool registered; behavior matches pre-change.
|
||||||
|
- **Happy paths**:
|
||||||
|
- Planning phase: LLM calls `search` → executes; LLM calls `advance_phase` → phase becomes Building.
|
||||||
|
- Building phase: LLM calls `write_file` → executes; LLM calls `advance_phase` → phase becomes Verification.
|
||||||
|
- Verification phase: LLM calls `bash` with `pytest` → executes; LLM calls `advance_phase` → phase becomes Delivery.
|
||||||
|
- Delivery phase: LLM calls any tool → executes (wildcard).
|
||||||
|
- **Edge cases**:
|
||||||
|
- `advance_phase` called at Delivery → returns error "Already at final phase".
|
||||||
|
- Auto-advance after 3 steps in Planning → phase transitions automatically on 4th step.
|
||||||
|
- `bash` command in Planning contains `rm file` → blocked by `bash_command_filter`.
|
||||||
|
- **Error paths**:
|
||||||
|
- LLM calls `write_file` in Planning → tool NOT dispatched; structured error returned to LLM; loop continues.
|
||||||
|
- LLM calls non-existent tool → existing error path (not phase-related).
|
||||||
|
- **Integration scenarios**:
|
||||||
|
- Phase transition emits a `phase_changed` event (use existing `_broadcast_event` pattern from `experts/orchestrator.py`).
|
||||||
|
- `max_steps` reached mid-phase → `ReActResult.status = "max_steps_reached"` (existing path, no change).
|
||||||
|
|
||||||
|
**Verification**:
|
||||||
|
- `python3 -m pytest tests/unit/test_react_phase_enforcement.py -q` passes.
|
||||||
|
- Existing `tests/unit/test_react_engine.py` still passes (characterization — no policy = no change).
|
||||||
|
- `ruff check src/agentkit/core/react.py src/agentkit/tools/advance_phase.py` clean.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
### U4. Wire PLAN_EXEC at chat.py WebSocket path (G6 chat integration)
|
||||||
|
|
||||||
|
**Goal**: Replace the `chat.py:1084` "not yet supported, falling back to REACT" warning with a real PLAN_EXEC handler that constructs `PhasePolicy` from `ServerConfig.plan_exec` and dispatches to `ReActEngine.execute` with the policy set.
|
||||||
|
|
||||||
|
**Requirements**: R24, R25 (end-to-end wiring).
|
||||||
|
|
||||||
|
**Dependencies**: U2, U3.
|
||||||
|
|
||||||
|
**Files**:
|
||||||
|
- `src/agentkit/server/routes/chat.py` (modify — add `_execute_plan_exec_ws` handler; branch on `ExecutionMode.PLAN_EXEC`).
|
||||||
|
- `tests/unit/test_chat_plan_exec_ws.py` (new).
|
||||||
|
|
||||||
|
**Approach**:
|
||||||
|
- New helper `_execute_plan_exec_ws(websocket, agent, routing, messages, ...)`:
|
||||||
|
1. Read `server_config.plan_exec` (may be `{}` if not configured → use `default_policy()`).
|
||||||
|
2. Build `PhasePolicy` from config (apply overrides).
|
||||||
|
3. Construct `ReActEngine(..., phase_policy=policy)`.
|
||||||
|
4. Register `AdvancePhaseTool` bound to this engine.
|
||||||
|
5. Call `engine.execute_stream(...)` — reuses existing streaming path.
|
||||||
|
6. Emit `phase_changed` events through the WebSocket (frontend can render phase indicator).
|
||||||
|
- chat.py:1084 changes from `if execution_mode not in (REACT, SKILL_REACT): warn + fall back` to:
|
||||||
|
```python
|
||||||
|
if routing.execution_mode == ExecutionMode.PLAN_EXEC:
|
||||||
|
await _execute_plan_exec_ws(websocket, agent, routing, ...)
|
||||||
|
return
|
||||||
|
if routing.execution_mode not in (ExecutionMode.REACT, ExecutionMode.SKILL_REACT):
|
||||||
|
# existing warning for REWOO/REFLEXION/TEAM_COLLAB
|
||||||
|
...
|
||||||
|
```
|
||||||
|
- REST `send_message` path: explicitly raise `HTTPException(501, "PLAN_EXEC via REST not yet supported; use WebSocket")` — Wave 3 does NOT wire REST (KTD4).
|
||||||
|
|
||||||
|
**Execution note**: characterization-first — test that existing REWOO/REFLEXION/TEAM_COLLAB modes still fall back to REACT with the warning (no regression). Then add PLAN_EXEC wiring.
|
||||||
|
|
||||||
|
**Patterns to follow**:
|
||||||
|
- `src/agentkit/server/routes/chat.py` existing WebSocket handler structure (lines 1082-1100).
|
||||||
|
- `src/agentkit/server/_fallback_chain.py` (Wave 2 U3) for "construct engine per-request with config" pattern.
|
||||||
|
|
||||||
|
**Test scenarios** (covers end-to-end):
|
||||||
|
- **Characterization**:
|
||||||
|
- `ExecutionMode.REWOO` via WebSocket → still falls back to REACT with warning (existing behavior unchanged).
|
||||||
|
- `ExecutionMode.REFLEXION` → same.
|
||||||
|
- `ExecutionMode.TEAM_COLLAB` → same.
|
||||||
|
- **Happy paths**:
|
||||||
|
- `ExecutionMode.PLAN_EXEC` via WebSocket → `_execute_plan_exec_ws` invoked; `ReActEngine` constructed with `phase_policy`; `AdvancePhaseTool` registered.
|
||||||
|
- Planning phase: LLM emits `search` tool call → executed; tool result streamed.
|
||||||
|
- LLM emits `advance_phase` → `phase_changed` event sent to WebSocket client; subsequent `write_file` call now allowed.
|
||||||
|
- **Edge cases**:
|
||||||
|
- `plan_exec` config absent → `default_policy()` used; behavior matches KTD5 whitelist.
|
||||||
|
- `plan_exec.enabled=False` → falls back to REACT (opt-out).
|
||||||
|
- Phase violation: LLM calls `write_file` in Planning → structured error returned; loop continues; `phase_violation` event emitted.
|
||||||
|
- **Error paths**:
|
||||||
|
- REST `send_message` with PLAN_EXEC → 501 error.
|
||||||
|
- Phase policy construction fails (bad config) → 500 error with message.
|
||||||
|
- **Integration scenarios**:
|
||||||
|
- Existing fallback chain (Wave 2 U3) NOT applied to PLAN_EXEC — phase policy and fallback chain are mutually exclusive (KTD5 from Wave 2 plan: chain only wraps REACT/SKILL_REACT at REST). Document this in chat.py comment.
|
||||||
|
|
||||||
|
**Verification**:
|
||||||
|
- `python3 -m pytest tests/unit/test_chat_plan_exec_ws.py -q` passes.
|
||||||
|
- `ruff check src/agentkit/server/routes/chat.py` clean.
|
||||||
|
- Manual test: `agentkit chat` with `@skill:plan_exec_demo` skill config → WebSocket stream includes `phase_changed` events.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Risks & Dependencies
|
||||||
|
|
||||||
|
### Risks
|
||||||
|
|
||||||
|
1. **ReAct core modification risk (high)**: U3 modifies `ReActEngine._execute_loop`. Mitigation: characterization-first tests (U3 Execution note); `phase_policy=None` default preserves all existing behavior; full `test_react_engine.py` regression.
|
||||||
|
2. **Symbol extraction accuracy (medium)**: Regex extractor may miss edge cases (decorated functions, nested generics, multi-line signatures). Mitigation: fall back to "no symbols found → read full file" gracefully; never raise on extraction failure.
|
||||||
|
3. **PLAN_EXEC phase deadlock (medium)**: LLM may never call `advance_phase`, leaving the agent stuck in Planning. Mitigation: `auto_advance_after_steps` config (default 5); timeout via existing `max_steps`.
|
||||||
|
4. **Tool name drift (low)**: Phase whitelist references tool names (`write_file`, `search`, etc.) that may be renamed in future. Mitigation: whitelist is config-driven; rename only requires config update.
|
||||||
|
|
||||||
|
### Dependencies
|
||||||
|
|
||||||
|
- Wave 2 PR #5 (`feat/agent-wave2-medium-coupling`) should be merged first — Wave 3 builds on the `ServerConfig.from_dict` extension pattern and the `_fallback_chain.py` integration shape established there. If PR #5 is still open, Wave 3 branches from `feat/agent-wave2-medium-coupling` rather than `main`.
|
||||||
|
- No external library dependencies (KTD1).
|
||||||
|
|
||||||
|
## System-Wide Impact
|
||||||
|
|
||||||
|
- **Agents using PLAN_EXEC mode**: gain phase enforcement. Existing REACT/SKILL_REACT/DIRECT_CHAT agents: zero change (phase_policy defaults to None).
|
||||||
|
- **Tool registry**: gains two new tools (`read_file`, `advance_phase`). Frontend tool list display may need updating to show the new icons — out of scope for Wave 3 (frontend follows up).
|
||||||
|
- **`agentkit.yaml`**: gains `plan_exec:` section (commented by default). Existing configs unaffected.
|
||||||
|
- **WebSocket clients**: gain `phase_changed` event type. Existing clients ignore unknown event types (verified in Wave 2 — `phase_rollback_*` events follow the same pattern).
|
||||||
|
|
||||||
|
## Sources & Research
|
||||||
|
|
||||||
|
- Origin brainstorm: `docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md` (Wave 3 section, KTD6/KTD7).
|
||||||
|
- Wave 1 plan: `docs/plans/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md` (PR #4 merged).
|
||||||
|
- Wave 2 plan: `docs/plans/2026-06-29-003-feat-agent-wave2-medium-coupling-plan.md` (PR #5 open).
|
||||||
|
- Trae Work architecture research (cited in brainstorm): SOLO four-stage state machine pattern.
|
||||||
|
- Qoder architecture research (cited in brainstorm): Spec→Coding→Verify closed loop.
|
||||||
|
- Codebase: `src/agentkit/core/react.py:148` reserves `read_file`/`write_file` tool names in `_DEFAULT_CORE_TOOLS` — Wave 3 U1 delivers the missing `read_file` implementation.
|
||||||
|
- Codebase: `src/agentkit/server/routes/chat.py:1084` documents that PLAN_EXEC is "not yet supported" — Wave 3 U4 closes this gap.
|
||||||
|
|
||||||
|
## Deferred to Implementation
|
||||||
|
|
||||||
|
- Exact regex patterns for non-Python symbol extraction (U1) — design above gives the shape; implementer finalizes patterns based on real-world test fixtures.
|
||||||
|
- `bash_command_filter` regex precision (U2) — defaults block `rm`/`mv`/`>`/`>>`; implementer may add more based on test scenarios.
|
||||||
|
- `phase_changed` event payload shape (U3/U4) — minimal viable shape: `{"phase": "building", "previous": "planning"}`; frontend rendering concerns are out of scope.
|
||||||
|
- Whether `AdvancePhaseTool` accepts a `target_phase` argument for skipping phases (e.g., Planning → Verification) — default no (sequential only); add if test scenarios reveal a need.
|
||||||
Loading…
Reference in New Issue