fischer-agentkit/docs/plans/2026-06-29-004-feat-agent-w...

28 KiB

title date type status origin execution
feat: Agent Wave 3 strategic coupling (G5/G6) 2026-06-29 feat draft docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md code

Wave 3 Strategic Coupling — G5 Function-level Sharding + G6 SOLO Phase Constraints

Summary

Wave 3 of the advanced-agent gap optimization closes two strategic gaps deferred from Waves 1-2:

  • G5 — Function-level code sharding (R22, R23): file reading gains an optional symbol parameter for symbol/function-granularity slicing, backward compatible with full-file reads.
  • G6 — SOLO four-stage state machine (R24, R25): ReAct loop enforces per-phase tool whitelists (Planning → Building → Verification → Delivery). Extends existing ExecutionMode.PLAN_EXEC rather than introducing a new mode.

Wave 1 (G1/G2/G3/G8 — PR #4 merged) and Wave 2 (G4/G7/G9 — PR #5 open) shipped independently. Wave 3 is the strategic-risk wave: it introduces a new tool (G5) and touches ReAct core (G6). Per the brainstorm's KTD6/KTD7 locked decisions, G5 integration approach is decided here (not deferred further), and G6 extends PLAN_EXEC rather than adding a new mode.

Problem Frame

The brainstorm (docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md) identified nine gaps across three dimensions. Waves 1-2 closed seven (G1-G4, G7-G9). The remaining two gaps are strategic:

  • G5 (Long-context cost): Large files (e.g. 5000-line modules) blow context budget when the agent only needs one function. Today the agent shells out to cat file.py or greps; both pull the whole file into context. A symbol-aware slice would cut context cost 10-50x for typical edits.
  • G6 (Unsafe tool sequencing): ReAct loop lets the LLM call write_file during early exploration before committing to a plan. This wastes tokens on premature edits, causes half-baked refactors, and breaks the "plan-then-build" discipline that production agents (Qoder, Trae Work) enforce via phase state machines.

KTD6 (brainstorm) defers the G5 integration decision to this plan. KTD7 locks G6 to extend PLAN_EXEC rather than introduce a new mode.

Requirements

Carried forward from the brainstorm, Wave 3 section:

  • R22: File reading supports symbol/function granularity sharding.
  • R23: Sharding capability exposed as a tool parameter (symbol="function_name"), backward compatible with full-file reads.
  • R24: ReAct loop enforces phase constraints — Planning phase only allows think/search; Building phase only allows write_file (and similar write tools).
  • R25: Phase state is configurable; extends ExecutionMode.PLAN_EXEC, does NOT introduce a new mode.

Cross-cutting (brainstorm):

  • R26: All optimizations configurable via agentkit.yaml (follow ServerConfig.from_dict pattern established in Waves 1-2).
  • R27: Each optimization ships a minimal self-check test (ponytail rule).

Acceptance examples relevant to Wave 3:

  • AE5 already covered by Wave 2 (G9) — not in Wave 3 scope.
  • No explicit AE for G5/G6 in the brainstorm — the plan below specifies test scenarios as the acceptance contract.

Key Technical Decisions

KTD1: G5 uses Python ast + language-aware regex, NOT tree-sitter

Decision: Implement symbol extraction with the Python stdlib ast module for Python files, and a small regex-based extractor for TypeScript/JavaScript/Go/Rust/Java. No new dependency.

Rationale:

  • tree-sitter requires native compilation + per-language grammar files (~30MB installed) — violates the ponytail rule "no new dependency if it can be avoided" and AGENTS.md "禁止使用 any 类型" → prefers minimal stack.
  • ast is stdlib, always available, parses Python accurately.
  • Regex extractor covers 80% case for TS/JS/Go/Rust/Java (function/class/struct declarations); falls back to "no symbols found → read full file" gracefully.
  • If a future Wave 4 needs more accurate multi-language parsing, tree-sitter can replace the regex layer behind the same SymbolExtractor interface.

Upgrade path: replace RegexSymbolExtractor with TreeSitterSymbolExtractor implementing the same SymbolExtractor protocol; no caller changes.

KTD2: G5 adds a new ReadFileTool, does not extend ShellTool

Decision: Add a dedicated ReadFileTool in src/agentkit/tools/file_read.py with path + optional symbol + optional start_line/end_line parameters.

Rationale:

  • ShellTool is for shell execution; grafting symbol extraction onto it muddies the contract.
  • A dedicated tool gives the LLM a clear schema ({"path": "...", "symbol": "function_name"}) and a focused system-prompt description.
  • Aligns with the existing _DEFAULT_CORE_TOOLS list in core/react.py:148 which already references read_file — the name is reserved but the implementation is missing.

KTD3: G6 phase state machine lives in ReActEngine, not in the skill config

Decision: Phase state (Planning/Building/Verification/Delivery) is tracked as a mutable field on ReActEngine instance. Transitions are driven by LLM-detected phase-completion signals (e.g., the LLM emits Phase: Building in its thinking) OR by an explicit advance_phase tool.

Rationale:

  • Skill config declares the policy (which tools per phase, auto-advance vs manual); the engine enforces it per-step. This matches R24 ("ReAct 循环加阶段约束").
  • Alternative considered: phase state in the agent instance (not engine). Rejected because ReActEngine already owns max_steps/verification_enabled etc.; phase state belongs with the loop that enforces it.

KTD4: PLAN_EXEC mode is wired at chat.py WebSocket path (REST already has fallback chain from Wave 2)

Decision: chat.py:1084 (currently warns "not yet supported, falling back to REACT") will route ExecutionMode.PLAN_EXEC to a new _execute_plan_exec_ws handler that constructs PhasePolicy from ServerConfig.plan_exec and passes it to ReActEngine.execute.

Rationale:

  • REST send_message already uses the Wave 2 three-tier fallback chain; PLAN_EXEC at REST would also need that wrapper. Out of scope for Wave 3 — only WebSocket path is wired. REST PLAN_EXEC remains "not yet supported" and explicitly raises if invoked.
  • Single integration point keeps Wave 3 scope bounded; REST wiring is a one-line follow-up once WebSocket path is proven.

KTD5: Default phase whitelist matches brainstorm R24

Decision: Default whitelist:

  • Planning: search, tool_search, read_file, bash (read-only commands like git status, ls)
  • Building: write_file, bash (write commands), read_file, search
  • Verification: bash (test commands), read_file, search
  • Delivery: all tools (final synthesis)

Rationale:

  • R24 explicitly names think/search for Planning and write_file for Building.
  • bash is split: read-only in Planning, full in Building. Enforced by adding a bash_command_filter callback (regex-based, blocks rm/mv/>/>> in Planning/Verification).
  • Delivery allows all tools to support last-mile formatting/cleanup.

KTD6: Phase transitions are LLM-driven via advance_phase tool (opt-in auto-advance)

Decision: Add an AdvancePhaseTool that the LLM can call to transition Planning→Building→Verification→Delivery. Auto-advance (after N steps in current phase) is opt-in via plan_exec.auto_advance_after_steps.

Rationale:

  • LLM-driven transitions match Qoder/Trae Work pattern: LLM declares "planning done" explicitly.
  • Auto-advance is a safety net for LLMs that forget to call advance_phase; default off (ponytail: less code is better).

Scope Boundaries

In Scope

  • New ReadFileTool with symbol parameter (G5, R22/R23).
  • SymbolExtractor protocol + AstSymbolExtractor (Python) + RegexSymbolExtractor (TS/JS/Go/Rust/Java).
  • PhasePolicy dataclass + PhaseState enum + per-step tool whitelist enforcement in ReActEngine.execute (G6, R24/R25).
  • AdvancePhaseTool for LLM-driven phase transitions.
  • WebSocket chat path routes PLAN_EXEC to new _execute_plan_exec_ws handler (KTD4).
  • plan_exec config section in agentkit.yaml + ServerConfig.from_dict extension (R26).
  • Tests for each new module (R27).

Out of Scope (Deferred to Follow-Up Work)

  • REST send_message PLAN_EXEC wiring — once WebSocket path is proven, REST wiring is a follow-up commit.
  • tree-sitter integration for more accurate multi-language parsing (KTD1 upgrade path).
  • Phase-aware prompt engineering (per-phase system prompt templates) — current plan keeps a single system prompt; phase-specific guidance is a prompt-engineering concern, not a code change.
  • Phase persistence across session resume (U7 checkpoint already saves plan state; phase state restoration is a separate concern).
  • Phase rollback on BuildingPlanning regression (Wave 2 G9 rollback handles file-level rollback; phase regression is a UX/prompt concern).
  • Tool-filter UI in the frontend (Wave 3 ships backend-only; frontend surfaces phase via existing event channel if needed in a follow-up).

Outside This Product's Identity

  • Replacing the existing ReAct loop with LangGraph (inherited from brainstorm).
  • Disc-based file system à la DeerFlow (inherited).
  • Docker sandbox (inherited; only command-level safety via bash_command_filter).

High-Level Technical Design

flowchart LR
    subgraph G5[Function Sharding]
        RF[ReadFileTool] --> SE[SymbolExtractor protocol]
        SE --> AST[AstSymbolExtractor<br/>Python stdlib ast]
        SE --> RX[RegexSymbolExtractor<br/>TS/JS/Go/Rust/Java]
    end

    subgraph G6[Phase State Machine]
        PP[PhasePolicy config] --> PS[PhaseState enum<br/>Planning/Building/Verify/Delivery]
        PS --> Filt[Tool filter per step]
        Filt --> RE[ReActEngine.execute]
        AP[AdvancePhaseTool] -->|transitions| PS
    end

    RF -->|file content for symbol| RE
    RE -->|enforces| Filt

The two subsystems compose at the ReAct engine boundary: ReadFileTool is one of the tools the LLM can call during any phase (filtered by PhasePolicy); PhaseState is enforced at the tool-call step before dispatch.

Implementation Units

U1. SymbolExtractor + ReadFileTool (G5)

Goal: Add ReadFileTool with optional symbol parameter; implement SymbolExtractor protocol with AstSymbolExtractor (Python) and RegexSymbolExtractor (TS/JS/Go/Rust/Java).

Requirements: R22, R23, R27.

Dependencies: none.

Files:

  • src/agentkit/tools/file_read.py (new)
  • src/agentkit/tools/symbol_extractor.py (new)
  • src/agentkit/tools/__init__.py (modify — register ReadFileTool)
  • tests/unit/test_symbol_extractor.py (new)
  • tests/unit/test_read_file_tool.py (new)

Approach:

  • SymbolExtractor is a Protocol with one method: extract_symbols(content: str, language: str) -> list[SymbolSpan]. SymbolSpan carries name, kind (function/class/method/struct), start_line, end_line.
  • AstSymbolExtractor walks ast.parse(content); for each FunctionDef/AsyncFunctionDef/ClassDef collects name + line range. Uses ast.get_source_segment style (line-based, not node-based, to keep the API simple).
  • RegexSymbolExtractor ships patterns for TS/JS (function X, const X = (...) =>, class X), Go (func X), Rust (fn X, struct X, impl X), Java (public ... X(...)). Falls back to "no symbols" if no pattern matches.
  • ReadFileTool.execute(path, symbol=None, start_line=None, end_line=None):
    • symbol=None → read full file (backward compat with the existing _FakeTool benchmark shape).
    • symbol="foo" → detect language from extension; call extract_symbols; return the line range of the first matching symbol; if no match, return an error result with available symbol names listed (so the LLM can retry).
    • start_line/end_line overrides symbol; allows manual slicing.
  • Tool registered as read_file (matches the reserved name in core/react.py:148).

Execution note: characterization-first — write a test that asserts the tool returns the full file content when symbol=None (matches pre-existing benchmark _FakeTool shape) before adding symbol-extraction behavior.

Patterns to follow:

  • src/agentkit/tools/document_tool.py for tool structure (dataclass, Tool base class, input_schema).
  • src/agentkit/tools/schema_tools.py:SchemaExtractTool for "extract-from-source" pattern.

Test scenarios (covers R22, R23):

  • Happy paths:
    • Python file, symbol="MyClass" → returns class body only (lines from class MyClass: through end of class).
    • Python file, symbol="my_func" → returns function body only.
    • TypeScript file, symbol="renderComponent" → returns arrow/function body.
    • Go file, symbol="HandleRequest" → returns func body.
  • Edge cases:
    • symbol=None → returns full file content (characterization).
    • symbol="nonexistent" → returns error result listing available symbols ("Available symbols: foo, bar, baz").
    • Unsupported file extension (.md, .txt) → returns full file with note: symbol extraction not supported for .md.
    • Empty file → returns empty content.
    • File with nested classes → outer class symbol returns including inner class.
  • Error paths:
    • Path does not exist → raises FileNotFoundError (or returns error result matching other tools' convention).
    • Path is a directory → returns error result.
    • Permission denied → returns error result.
  • Integration scenarios:
    • Symbol extraction + line slicing: symbol="foo", end_line=50 truncates at line 50 even if symbol extends further.
    • Round-trip: extract symbol, write back via ShellTool sed (not in scope for tool — just verify extracted range is well-formed).

Verification:

  • python3 -m pytest tests/unit/test_symbol_extractor.py tests/unit/test_read_file_tool.py -q passes.
  • ruff check src/agentkit/tools/file_read.py src/agentkit/tools/symbol_extractor.py clean.
  • ReadFileTool appears in ToolRegistry.list_tools() after registration.

U2. PhasePolicy + PhaseState + ServerConfig (G6 core)

Goal: Add PhasePolicy dataclass, PhaseState enum, default whitelist config. Extend ServerConfig.from_dict with plan_exec section. Wire config to agentkit.yaml.

Requirements: R25, R26.

Dependencies: none.

Files:

  • src/agentkit/core/phase.py (new) — PhaseState enum, PhasePolicy dataclass, default_policy() factory.
  • src/agentkit/server/config.py (modify — add plan_exec field + from_dict parsing).
  • agentkit.yaml (modify — document plan_exec: section).
  • tests/unit/test_phase_policy.py (new).

Approach:

  • PhaseState = enum("planning building verification delivery").
  • PhasePolicy carries:
    • whitelist: dict[PhaseState, set[str]] — tool names allowed per phase.
    • bash_command_filter: dict[PhaseState, re.Pattern | None] — regex that bash args must NOT match (e.g., r"\b(rm|mv|>|>>)\b" in Planning).
    • auto_advance_after_steps: int | None — None = manual (LLM calls advance_phase); int = auto-advance after N steps.
    • start_phase: PhaseState = PhaseState.PLANNING.
  • default_policy() returns the KTD5 whitelist above.
  • ServerConfig.from_dict reads plan_exec section: enabled, whitelist_override (dict), auto_advance_after_steps.
  • agentkit.yaml gains a commented-out plan_exec: block (commented to preserve default behavior — opt-in).

Patterns to follow:

  • src/agentkit/core/fallback.py for dataclass + classmethod factory pattern.
  • src/agentkit/server/config.py from_dict extension template (established in Wave 1 for prompt_cache/streaming/verification; Wave 2 added rollback/fallback_chain; Wave 3 adds plan_exec).

Test scenarios (covers R25, R26):

  • Happy paths:
    • default_policy() returns policy with all four phases; Planning whitelist contains search, read_file; Building contains write_file.
    • PhasePolicy.is_tool_allowed("search", PhaseState.PLANNING) returns True.
    • PhasePolicy.is_tool_allowed("write_file", PhaseState.PLANNING) returns False.
    • PhasePolicy.is_tool_allowed("write_file", PhaseState.BUILDING) returns True.
  • Edge cases:
    • Empty whitelist for a phase → all tools rejected (raises ValueError at construction time — fail-fast).
    • Delivery phase whitelist contains "*" (wildcard) → all tools allowed.
    • Custom whitelist override merges with default (override wins on conflict).
  • Error paths:
    • Invalid phase name in config → ValueError with message naming the bad value.
    • bash_command_filter regex compile failure → ValueError.
  • Config integration:
    • ServerConfig.from_dict({"plan_exec": {"enabled": True, "auto_advance_after_steps": 5}}) populates fields correctly.
    • ServerConfig.from_dict({})plan_exec = {} (default).

Verification:

  • python3 -m pytest tests/unit/test_phase_policy.py -q passes.
  • ruff check src/agentkit/core/phase.py clean.

U3. AdvancePhaseTool + ReActEngine phase enforcement (G6 wiring)

Goal: Add AdvancePhaseTool. Wire PhasePolicy into ReActEngine.execute so each tool-call step checks is_tool_allowed(tool_name, current_phase) before dispatch; blocked calls return a structured error to the LLM ("Tool 'write_file' not allowed in Planning phase — call advance_phase first").

Requirements: R24.

Dependencies: U2.

Files:

  • src/agentkit/tools/advance_phase.py (new) — AdvancePhaseTool calls react_engine.advance_phase().
  • src/agentkit/core/react.py (modify — add phase_policy param to __init__ + execute; add _current_phase field; add advance_phase() method; enforce in _execute_loop).
  • tests/unit/test_react_phase_enforcement.py (new).

Approach:

  • ReActEngine.__init__ accepts phase_policy: PhasePolicy | None = None. None = no enforcement (backward compat — all existing callers unaffected).
  • _current_phase: PhaseState | None initialized from phase_policy.start_phase if policy set, else None.
  • advance_phase() advances _current_phase to next enum value; raises ValueError if already at DELIVERY.
  • In _execute_loop, before dispatching a tool call:
    if self._phase_policy is not None and self._current_phase is not None:
        if not self._phase_policy.is_tool_allowed(tool_name, self._current_phase):
            # Inject structured error into conversation, do NOT dispatch tool.
            # This counts as a "step" for max_steps purposes.
            observation = {
                "error": "phase_violation",
                "message": f"Tool '{tool_name}' not allowed in {self._current_phase.value} phase",
                "current_phase": self._current_phase.value,
                "hint": "Call advance_phase to move to Building phase"
            }
            continue  # next loop iteration
    
  • Auto-advance: if phase_policy.auto_advance_after_steps is set and _steps_in_phase >= auto_advance_after_steps, call advance_phase() automatically.
  • AdvancePhaseTool.execute() calls the bound engine's advance_phase() and returns the new phase name. Registered only when phase_policy is not None.

Execution note: characterization-first — test that ReActEngine with phase_policy=None behaves identically to pre-change (no enforcement, no advance_phase tool, no _current_phase mutation). Then add enforcement tests.

Patterns to follow:

  • src/agentkit/core/react.py verification_enabled pattern (feature flag + step-level check).
  • src/agentkit/tools/ask_human.py for tool that interacts with engine state.

Test scenarios (covers R24):

  • Characterization (no policy):
    • ReActEngine(phase_policy=None) — all tools allowed in all steps; no advance_phase tool registered; behavior matches pre-change.
  • Happy paths:
    • Planning phase: LLM calls search → executes; LLM calls advance_phase → phase becomes Building.
    • Building phase: LLM calls write_file → executes; LLM calls advance_phase → phase becomes Verification.
    • Verification phase: LLM calls bash with pytest → executes; LLM calls advance_phase → phase becomes Delivery.
    • Delivery phase: LLM calls any tool → executes (wildcard).
  • Edge cases:
    • advance_phase called at Delivery → returns error "Already at final phase".
    • Auto-advance after 3 steps in Planning → phase transitions automatically on 4th step.
    • bash command in Planning contains rm file → blocked by bash_command_filter.
  • Error paths:
    • LLM calls write_file in Planning → tool NOT dispatched; structured error returned to LLM; loop continues.
    • LLM calls non-existent tool → existing error path (not phase-related).
  • Integration scenarios:
    • Phase transition emits a phase_changed event (use existing _broadcast_event pattern from experts/orchestrator.py).
    • max_steps reached mid-phase → ReActResult.status = "max_steps_reached" (existing path, no change).

Verification:

  • python3 -m pytest tests/unit/test_react_phase_enforcement.py -q passes.
  • Existing tests/unit/test_react_engine.py still passes (characterization — no policy = no change).
  • ruff check src/agentkit/core/react.py src/agentkit/tools/advance_phase.py clean.

U4. Wire PLAN_EXEC at chat.py WebSocket path (G6 chat integration)

Goal: Replace the chat.py:1084 "not yet supported, falling back to REACT" warning with a real PLAN_EXEC handler that constructs PhasePolicy from ServerConfig.plan_exec and dispatches to ReActEngine.execute with the policy set.

Requirements: R24, R25 (end-to-end wiring).

Dependencies: U2, U3.

Files:

  • src/agentkit/server/routes/chat.py (modify — add _execute_plan_exec_ws handler; branch on ExecutionMode.PLAN_EXEC).
  • tests/unit/test_chat_plan_exec_ws.py (new).

Approach:

  • New helper _execute_plan_exec_ws(websocket, agent, routing, messages, ...):
    1. Read server_config.plan_exec (may be {} if not configured → use default_policy()).
    2. Build PhasePolicy from config (apply overrides).
    3. Construct ReActEngine(..., phase_policy=policy).
    4. Register AdvancePhaseTool bound to this engine.
    5. Call engine.execute_stream(...) — reuses existing streaming path.
    6. Emit phase_changed events through the WebSocket (frontend can render phase indicator).
  • chat.py:1084 changes from if execution_mode not in (REACT, SKILL_REACT): warn + fall back to:
    if routing.execution_mode == ExecutionMode.PLAN_EXEC:
        await _execute_plan_exec_ws(websocket, agent, routing, ...)
        return
    if routing.execution_mode not in (ExecutionMode.REACT, ExecutionMode.SKILL_REACT):
        # existing warning for REWOO/REFLEXION/TEAM_COLLAB
        ...
    
  • REST send_message path: explicitly raise HTTPException(501, "PLAN_EXEC via REST not yet supported; use WebSocket") — Wave 3 does NOT wire REST (KTD4).

Execution note: characterization-first — test that existing REWOO/REFLEXION/TEAM_COLLAB modes still fall back to REACT with the warning (no regression). Then add PLAN_EXEC wiring.

Patterns to follow:

  • src/agentkit/server/routes/chat.py existing WebSocket handler structure (lines 1082-1100).
  • src/agentkit/server/_fallback_chain.py (Wave 2 U3) for "construct engine per-request with config" pattern.

Test scenarios (covers end-to-end):

  • Characterization:
    • ExecutionMode.REWOO via WebSocket → still falls back to REACT with warning (existing behavior unchanged).
    • ExecutionMode.REFLEXION → same.
    • ExecutionMode.TEAM_COLLAB → same.
  • Happy paths:
    • ExecutionMode.PLAN_EXEC via WebSocket → _execute_plan_exec_ws invoked; ReActEngine constructed with phase_policy; AdvancePhaseTool registered.
    • Planning phase: LLM emits search tool call → executed; tool result streamed.
    • LLM emits advance_phasephase_changed event sent to WebSocket client; subsequent write_file call now allowed.
  • Edge cases:
    • plan_exec config absent → default_policy() used; behavior matches KTD5 whitelist.
    • plan_exec.enabled=False → falls back to REACT (opt-out).
    • Phase violation: LLM calls write_file in Planning → structured error returned; loop continues; phase_violation event emitted.
  • Error paths:
    • REST send_message with PLAN_EXEC → 501 error.
    • Phase policy construction fails (bad config) → 500 error with message.
  • Integration scenarios:
    • Existing fallback chain (Wave 2 U3) NOT applied to PLAN_EXEC — phase policy and fallback chain are mutually exclusive (KTD5 from Wave 2 plan: chain only wraps REACT/SKILL_REACT at REST). Document this in chat.py comment.

Verification:

  • python3 -m pytest tests/unit/test_chat_plan_exec_ws.py -q passes.
  • ruff check src/agentkit/server/routes/chat.py clean.
  • Manual test: agentkit chat with @skill:plan_exec_demo skill config → WebSocket stream includes phase_changed events.

Risks & Dependencies

Risks

  1. ReAct core modification risk (high): U3 modifies ReActEngine._execute_loop. Mitigation: characterization-first tests (U3 Execution note); phase_policy=None default preserves all existing behavior; full test_react_engine.py regression.
  2. Symbol extraction accuracy (medium): Regex extractor may miss edge cases (decorated functions, nested generics, multi-line signatures). Mitigation: fall back to "no symbols found → read full file" gracefully; never raise on extraction failure.
  3. PLAN_EXEC phase deadlock (medium): LLM may never call advance_phase, leaving the agent stuck in Planning. Mitigation: auto_advance_after_steps config (default 5); timeout via existing max_steps.
  4. Tool name drift (low): Phase whitelist references tool names (write_file, search, etc.) that may be renamed in future. Mitigation: whitelist is config-driven; rename only requires config update.

Dependencies

  • Wave 2 PR #5 (feat/agent-wave2-medium-coupling) should be merged first — Wave 3 builds on the ServerConfig.from_dict extension pattern and the _fallback_chain.py integration shape established there. If PR #5 is still open, Wave 3 branches from feat/agent-wave2-medium-coupling rather than main.
  • No external library dependencies (KTD1).

System-Wide Impact

  • Agents using PLAN_EXEC mode: gain phase enforcement. Existing REACT/SKILL_REACT/DIRECT_CHAT agents: zero change (phase_policy defaults to None).
  • Tool registry: gains two new tools (read_file, advance_phase). Frontend tool list display may need updating to show the new icons — out of scope for Wave 3 (frontend follows up).
  • agentkit.yaml: gains plan_exec: section (commented by default). Existing configs unaffected.
  • WebSocket clients: gain phase_changed event type. Existing clients ignore unknown event types (verified in Wave 2 — phase_rollback_* events follow the same pattern).

Sources & Research

  • Origin brainstorm: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md (Wave 3 section, KTD6/KTD7).
  • Wave 1 plan: docs/plans/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md (PR #4 merged).
  • Wave 2 plan: docs/plans/2026-06-29-003-feat-agent-wave2-medium-coupling-plan.md (PR #5 open).
  • Trae Work architecture research (cited in brainstorm): SOLO four-stage state machine pattern.
  • Qoder architecture research (cited in brainstorm): Spec→Coding→Verify closed loop.
  • Codebase: src/agentkit/core/react.py:148 reserves read_file/write_file tool names in _DEFAULT_CORE_TOOLS — Wave 3 U1 delivers the missing read_file implementation.
  • Codebase: src/agentkit/server/routes/chat.py:1084 documents that PLAN_EXEC is "not yet supported" — Wave 3 U4 closes this gap.

Deferred to Implementation

  • Exact regex patterns for non-Python symbol extraction (U1) — design above gives the shape; implementer finalizes patterns based on real-world test fixtures.
  • bash_command_filter regex precision (U2) — defaults block rm/mv/>/>>; implementer may add more based on test scenarios.
  • phase_changed event payload shape (U3/U4) — minimal viable shape: {"phase": "building", "previous": "planning"}; frontend rendering concerns are out of scope.
  • Whether AdvancePhaseTool accepts a target_phase argument for skipping phases (e.g., Planning → Verification) — default no (sequential only); add if test scenarios reveal a need.