28 KiB

Raw Blame History

title	date	type	status	origin	execution
feat: Agent Wave 3 strategic coupling (G5/G6)	2026-06-29	feat	draft	docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md	code

Wave 3 Strategic Coupling — G5 Function-level Sharding + G6 SOLO Phase Constraints

Summary

Wave 3 of the advanced-agent gap optimization closes two strategic gaps deferred from Waves 1-2:

G5 — Function-level code sharding (R22, R23): file reading gains an optional symbol parameter for symbol/function-granularity slicing, backward compatible with full-file reads.
G6 — SOLO four-stage state machine (R24, R25): ReAct loop enforces per-phase tool whitelists (Planning → Building → Verification → Delivery). Extends existing ExecutionMode.PLAN_EXEC rather than introducing a new mode.

Wave 1 (G1/G2/G3/G8 — PR #4 merged) and Wave 2 (G4/G7/G9 — PR #5 open) shipped independently. Wave 3 is the strategic-risk wave: it introduces a new tool (G5) and touches ReAct core (G6). Per the brainstorm's KTD6/KTD7 locked decisions, G5 integration approach is decided here (not deferred further), and G6 extends PLAN_EXEC rather than adding a new mode.

Problem Frame

The brainstorm (docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md) identified nine gaps across three dimensions. Waves 1-2 closed seven (G1-G4, G7-G9). The remaining two gaps are strategic:

G5 (Long-context cost): Large files (e.g. 5000-line modules) blow context budget when the agent only needs one function. Today the agent shells out to cat file.py or greps; both pull the whole file into context. A symbol-aware slice would cut context cost 10-50x for typical edits.
G6 (Unsafe tool sequencing): ReAct loop lets the LLM call write_file during early exploration before committing to a plan. This wastes tokens on premature edits, causes half-baked refactors, and breaks the "plan-then-build" discipline that production agents (Qoder, Trae Work) enforce via phase state machines.

KTD6 (brainstorm) defers the G5 integration decision to this plan. KTD7 locks G6 to extend PLAN_EXEC rather than introduce a new mode.

Requirements

Carried forward from the brainstorm, Wave 3 section:

R22: File reading supports symbol/function granularity sharding.
R23: Sharding capability exposed as a tool parameter (symbol="function_name"), backward compatible with full-file reads.
R24: ReAct loop enforces phase constraints — Planning phase only allows think/search; Building phase only allows write_file (and similar write tools).
R25: Phase state is configurable; extends ExecutionMode.PLAN_EXEC, does NOT introduce a new mode.

Cross-cutting (brainstorm):

R26: All optimizations configurable via agentkit.yaml (follow ServerConfig.from_dict pattern established in Waves 1-2).
R27: Each optimization ships a minimal self-check test (ponytail rule).

Acceptance examples relevant to Wave 3:

AE5 already covered by Wave 2 (G9) — not in Wave 3 scope.
No explicit AE for G5/G6 in the brainstorm — the plan below specifies test scenarios as the acceptance contract.

Key Technical Decisions

KTD1: G5 uses Python `ast` + language-aware regex, NOT tree-sitter

Decision: Implement symbol extraction with the Python stdlib ast module for Python files, and a small regex-based extractor for TypeScript/JavaScript/Go/Rust/Java. No new dependency.

Rationale:

tree-sitter requires native compilation + per-language grammar files (~30MB installed) — violates the ponytail rule "no new dependency if it can be avoided" and AGENTS.md "禁止使用 any 类型" → prefers minimal stack.
ast is stdlib, always available, parses Python accurately.
Regex extractor covers 80% case for TS/JS/Go/Rust/Java (function/class/struct declarations); falls back to "no symbols found → read full file" gracefully.
If a future Wave 4 needs more accurate multi-language parsing, tree-sitter can replace the regex layer behind the same SymbolExtractor interface.

Upgrade path: replace RegexSymbolExtractor with TreeSitterSymbolExtractor implementing the same SymbolExtractor protocol; no caller changes.

KTD2: G5 adds a new `ReadFileTool`, does not extend `ShellTool`

Decision: Add a dedicated ReadFileTool in src/agentkit/tools/file_read.py with path + optional symbol + optional start_line/end_line parameters.

Rationale:

ShellTool is for shell execution; grafting symbol extraction onto it muddies the contract.
A dedicated tool gives the LLM a clear schema ({"path": "...", "symbol": "function_name"}) and a focused system-prompt description.
Aligns with the existing _DEFAULT_CORE_TOOLS list in core/react.py:148 which already references read_file — the name is reserved but the implementation is missing.

KTD3: G6 phase state machine lives in ReActEngine, not in the skill config

Decision: Phase state (Planning/Building/Verification/Delivery) is tracked as a mutable field on ReActEngine instance. Transitions are driven by LLM-detected phase-completion signals (e.g., the LLM emits Phase: Building in its thinking) OR by an explicit advance_phase tool.

Rationale:

Skill config declares the policy (which tools per phase, auto-advance vs manual); the engine enforces it per-step. This matches R24 ("ReAct 循环加阶段约束").
Alternative considered: phase state in the agent instance (not engine). Rejected because ReActEngine already owns max_steps/verification_enabled etc.; phase state belongs with the loop that enforces it.

KTD4: PLAN_EXEC mode is wired at chat.py WebSocket path (REST already has fallback chain from Wave 2)

Decision: chat.py:1084 (currently warns "not yet supported, falling back to REACT") will route ExecutionMode.PLAN_EXEC to a new _execute_plan_exec_ws handler that constructs PhasePolicy from ServerConfig.plan_exec and passes it to ReActEngine.execute.

Rationale:

REST send_message already uses the Wave 2 three-tier fallback chain; PLAN_EXEC at REST would also need that wrapper. Out of scope for Wave 3 — only WebSocket path is wired. REST PLAN_EXEC remains "not yet supported" and explicitly raises if invoked.
Single integration point keeps Wave 3 scope bounded; REST wiring is a one-line follow-up once WebSocket path is proven.

KTD5: Default phase whitelist matches brainstorm R24

Decision: Default whitelist:

Planning: search, tool_search, read_file, bash (read-only commands like git status, ls)
Building: write_file, bash (write commands), read_file, search
Verification: bash (test commands), read_file, search
Delivery: all tools (final synthesis)

Rationale:

R24 explicitly names think/search for Planning and write_file for Building.
bash is split: read-only in Planning, full in Building. Enforced by adding a bash_command_filter callback (regex-based, blocks rm/mv/>/>> in Planning/Verification).
Delivery allows all tools to support last-mile formatting/cleanup.

KTD6: Phase transitions are LLM-driven via `advance_phase` tool (opt-in auto-advance)

Decision: Add an AdvancePhaseTool that the LLM can call to transition Planning→Building→Verification→Delivery. Auto-advance (after N steps in current phase) is opt-in via plan_exec.auto_advance_after_steps.

Rationale:

LLM-driven transitions match Qoder/Trae Work pattern: LLM declares "planning done" explicitly.
Auto-advance is a safety net for LLMs that forget to call advance_phase; default off (ponytail: less code is better).

Scope Boundaries

In Scope

New ReadFileTool with symbol parameter (G5, R22/R23).
SymbolExtractor protocol + AstSymbolExtractor (Python) + RegexSymbolExtractor (TS/JS/Go/Rust/Java).
PhasePolicy dataclass + PhaseState enum + per-step tool whitelist enforcement in ReActEngine.execute (G6, R24/R25).
AdvancePhaseTool for LLM-driven phase transitions.
WebSocket chat path routes PLAN_EXEC to new _execute_plan_exec_ws handler (KTD4).
plan_exec config section in agentkit.yaml + ServerConfig.from_dict extension (R26).
Tests for each new module (R27).

Out of Scope (Deferred to Follow-Up Work)

REST send_message PLAN_EXEC wiring — once WebSocket path is proven, REST wiring is a follow-up commit.
tree-sitter integration for more accurate multi-language parsing (KTD1 upgrade path).
Phase-aware prompt engineering (per-phase system prompt templates) — current plan keeps a single system prompt; phase-specific guidance is a prompt-engineering concern, not a code change.
Phase persistence across session resume (U7 checkpoint already saves plan state; phase state restoration is a separate concern).
Phase rollback on Building → Planning regression (Wave 2 G9 rollback handles file-level rollback; phase regression is a UX/prompt concern).
Tool-filter UI in the frontend (Wave 3 ships backend-only; frontend surfaces phase via existing event channel if needed in a follow-up).

Outside This Product's Identity

Replacing the existing ReAct loop with LangGraph (inherited from brainstorm).
Disc-based file system à la DeerFlow (inherited).
Docker sandbox (inherited; only command-level safety via bash_command_filter).

High-Level Technical Design

flowchart LR
    subgraph G5[Function Sharding]
        RF[ReadFileTool] --> SE[SymbolExtractor protocol]
        SE --> AST[AstSymbolExtractor<br/>Python stdlib ast]
        SE --> RX[RegexSymbolExtractor<br/>TS/JS/Go/Rust/Java]
    end

    subgraph G6[Phase State Machine]
        PP[PhasePolicy config] --> PS[PhaseState enum<br/>Planning/Building/Verify/Delivery]
        PS --> Filt[Tool filter per step]
        Filt --> RE[ReActEngine.execute]
        AP[AdvancePhaseTool] -->|transitions| PS
    end

    RF -->|file content for symbol| RE
    RE -->|enforces| Filt

The two subsystems compose at the ReAct engine boundary: ReadFileTool is one of the tools the LLM can call during any phase (filtered by PhasePolicy); PhaseState is enforced at the tool-call step before dispatch.

Implementation Units

U1. SymbolExtractor + ReadFileTool (G5)

Goal: Add ReadFileTool with optional symbol parameter; implement SymbolExtractor protocol with AstSymbolExtractor (Python) and RegexSymbolExtractor (TS/JS/Go/Rust/Java).

Requirements: R22, R23, R27.

Dependencies: none.

Files:

src/agentkit/tools/file_read.py (new)
src/agentkit/tools/symbol_extractor.py (new)
src/agentkit/tools/__init__.py (modify — register ReadFileTool)
tests/unit/test_symbol_extractor.py (new)
tests/unit/test_read_file_tool.py (new)

Approach:

SymbolExtractor is a Protocol with one method: extract_symbols(content: str, language: str) -> list[SymbolSpan]. SymbolSpan carries name, kind (function/class/method/struct), start_line, end_line.
AstSymbolExtractor walks ast.parse(content); for each FunctionDef/AsyncFunctionDef/ClassDef collects name + line range. Uses ast.get_source_segment style (line-based, not node-based, to keep the API simple).
RegexSymbolExtractor ships patterns for TS/JS (function X, const X = (...) =>, class X), Go (func X), Rust (fn X, struct X, impl X), Java (public ... X(...)). Falls back to "no symbols" if no pattern matches.
ReadFileTool.execute(path, symbol=None, start_line=None, end_line=None):
- symbol=None → read full file (backward compat with the existing _FakeTool benchmark shape).
- symbol="foo" → detect language from extension; call extract_symbols; return the line range of the first matching symbol; if no match, return an error result with available symbol names listed (so the LLM can retry).
- start_line/end_line overrides symbol; allows manual slicing.
Tool registered as read_file (matches the reserved name in core/react.py:148).

Execution note: characterization-first — write a test that asserts the tool returns the full file content when symbol=None (matches pre-existing benchmark _FakeTool shape) before adding symbol-extraction behavior.

Patterns to follow:

src/agentkit/tools/document_tool.py for tool structure (dataclass, Tool base class, input_schema).
src/agentkit/tools/schema_tools.py:SchemaExtractTool for "extract-from-source" pattern.

Test scenarios (covers R22, R23):

Happy paths:
- Python file, symbol="MyClass" → returns class body only (lines from class MyClass: through end of class).
- Python file, symbol="my_func" → returns function body only.
- TypeScript file, symbol="renderComponent" → returns arrow/function body.
- Go file, symbol="HandleRequest" → returns func body.
Edge cases:
- symbol=None → returns full file content (characterization).
- symbol="nonexistent" → returns error result listing available symbols ("Available symbols: foo, bar, baz").
- Unsupported file extension (.md, .txt) → returns full file with note: symbol extraction not supported for .md.
- Empty file → returns empty content.
- File with nested classes → outer class symbol returns including inner class.
Error paths:
- Path does not exist → raises FileNotFoundError (or returns error result matching other tools' convention).
- Path is a directory → returns error result.
- Permission denied → returns error result.
Integration scenarios:
- Symbol extraction + line slicing: symbol="foo", end_line=50 truncates at line 50 even if symbol extends further.
- Round-trip: extract symbol, write back via ShellTool sed (not in scope for tool — just verify extracted range is well-formed).

Verification:

python3 -m pytest tests/unit/test_symbol_extractor.py tests/unit/test_read_file_tool.py -q passes.
ruff check src/agentkit/tools/file_read.py src/agentkit/tools/symbol_extractor.py clean.
ReadFileTool appears in ToolRegistry.list_tools() after registration.

U2. PhasePolicy + PhaseState + ServerConfig (G6 core)

Goal: Add PhasePolicy dataclass, PhaseState enum, default whitelist config. Extend ServerConfig.from_dict with plan_exec section. Wire config to agentkit.yaml.

Requirements: R25, R26.

Dependencies: none.

Files:

src/agentkit/core/phase.py (new) — PhaseState enum, PhasePolicy dataclass, default_policy() factory.
src/agentkit/server/config.py (modify — add plan_exec field + from_dict parsing).
agentkit.yaml (modify — document plan_exec: section).
tests/unit/test_phase_policy.py (new).

Approach:

PhaseState = enum("planning building verification delivery").
PhasePolicy carries:
- whitelist: dict[PhaseState, set[str]] — tool names allowed per phase.
- bash_command_filter: dict[PhaseState, re.Pattern | None] — regex that bash args must NOT match (e.g., r"\b(rm|mv|>|>>)\b" in Planning).
- auto_advance_after_steps: int | None — None = manual (LLM calls advance_phase); int = auto-advance after N steps.
- start_phase: PhaseState = PhaseState.PLANNING.
default_policy() returns the KTD5 whitelist above.
ServerConfig.from_dict reads plan_exec section: enabled, whitelist_override (dict), auto_advance_after_steps.
agentkit.yaml gains a commented-out plan_exec: block (commented to preserve default behavior — opt-in).

Patterns to follow:

src/agentkit/core/fallback.py for dataclass + classmethod factory pattern.
src/agentkit/server/config.py from_dict extension template (established in Wave 1 for prompt_cache/streaming/verification; Wave 2 added rollback/fallback_chain; Wave 3 adds plan_exec).

Test scenarios (covers R25, R26):

Happy paths:
- default_policy() returns policy with all four phases; Planning whitelist contains search, read_file; Building contains write_file.
- PhasePolicy.is_tool_allowed("search", PhaseState.PLANNING) returns True.
- PhasePolicy.is_tool_allowed("write_file", PhaseState.PLANNING) returns False.
- PhasePolicy.is_tool_allowed("write_file", PhaseState.BUILDING) returns True.
Edge cases:
- Empty whitelist for a phase → all tools rejected (raises ValueError at construction time — fail-fast).
- Delivery phase whitelist contains "*" (wildcard) → all tools allowed.
- Custom whitelist override merges with default (override wins on conflict).
Error paths:
- Invalid phase name in config → ValueError with message naming the bad value.
- bash_command_filter regex compile failure → ValueError.
Config integration:
- ServerConfig.from_dict({"plan_exec": {"enabled": True, "auto_advance_after_steps": 5}}) populates fields correctly.
- ServerConfig.from_dict({}) → plan_exec = {} (default).

Verification:

python3 -m pytest tests/unit/test_phase_policy.py -q passes.
ruff check src/agentkit/core/phase.py clean.

U3. AdvancePhaseTool + ReActEngine phase enforcement (G6 wiring)

Goal: Add AdvancePhaseTool. Wire PhasePolicy into ReActEngine.execute so each tool-call step checks is_tool_allowed(tool_name, current_phase) before dispatch; blocked calls return a structured error to the LLM ("Tool 'write_file' not allowed in Planning phase — call advance_phase first").

Requirements: R24.

Dependencies: U2.

Files:

src/agentkit/tools/advance_phase.py (new) — AdvancePhaseTool calls react_engine.advance_phase().
src/agentkit/core/react.py (modify — add phase_policy param to __init__ + execute; add _current_phase field; add advance_phase() method; enforce in _execute_loop).
tests/unit/test_react_phase_enforcement.py (new).

Approach:

ReActEngine.__init__ accepts phase_policy: PhasePolicy | None = None. None = no enforcement (backward compat — all existing callers unaffected).
_current_phase: PhaseState | None initialized from phase_policy.start_phase if policy set, else None.
advance_phase() advances _current_phase to next enum value; raises ValueError if already at DELIVERY.

In _execute_loop, before dispatching a tool call:

if self._phase_policy is not None and self._current_phase is not None:
    if not self._phase_policy.is_tool_allowed(tool_name, self._current_phase):
        # Inject structured error into conversation, do NOT dispatch tool.
        # This counts as a "step" for max_steps purposes.
        observation = {
            "error": "phase_violation",
            "message": f"Tool '{tool_name}' not allowed in {self._current_phase.value} phase",
            "current_phase": self._current_phase.value,
            "hint": "Call advance_phase to move to Building phase"
        }
        continue  # next loop iteration

Auto-advance: if phase_policy.auto_advance_after_steps is set and _steps_in_phase >= auto_advance_after_steps, call advance_phase() automatically.
AdvancePhaseTool.execute() calls the bound engine's advance_phase() and returns the new phase name. Registered only when phase_policy is not None.

Execution note: characterization-first — test that ReActEngine with phase_policy=None behaves identically to pre-change (no enforcement, no advance_phase tool, no _current_phase mutation). Then add enforcement tests.

Patterns to follow:

src/agentkit/core/react.py verification_enabled pattern (feature flag + step-level check).
src/agentkit/tools/ask_human.py for tool that interacts with engine state.

Test scenarios (covers R24):

Characterization (no policy):
- ReActEngine(phase_policy=None) — all tools allowed in all steps; no advance_phase tool registered; behavior matches pre-change.
Happy paths:
- Planning phase: LLM calls search → executes; LLM calls advance_phase → phase becomes Building.
- Building phase: LLM calls write_file → executes; LLM calls advance_phase → phase becomes Verification.
- Verification phase: LLM calls bash with pytest → executes; LLM calls advance_phase → phase becomes Delivery.
- Delivery phase: LLM calls any tool → executes (wildcard).
Edge cases:
- advance_phase called at Delivery → returns error "Already at final phase".
- Auto-advance after 3 steps in Planning → phase transitions automatically on 4th step.
- bash command in Planning contains rm file → blocked by bash_command_filter.
Error paths:
- LLM calls write_file in Planning → tool NOT dispatched; structured error returned to LLM; loop continues.
- LLM calls non-existent tool → existing error path (not phase-related).
Integration scenarios:
- Phase transition emits a phase_changed event (use existing _broadcast_event pattern from experts/orchestrator.py).
- max_steps reached mid-phase → ReActResult.status = "max_steps_reached" (existing path, no change).

Verification:

python3 -m pytest tests/unit/test_react_phase_enforcement.py -q passes.
Existing tests/unit/test_react_engine.py still passes (characterization — no policy = no change).
ruff check src/agentkit/core/react.py src/agentkit/tools/advance_phase.py clean.

U4. Wire PLAN_EXEC at chat.py WebSocket path (G6 chat integration)

Goal: Replace the chat.py:1084 "not yet supported, falling back to REACT" warning with a real PLAN_EXEC handler that constructs PhasePolicy from ServerConfig.plan_exec and dispatches to ReActEngine.execute with the policy set.

Requirements: R24, R25 (end-to-end wiring).

Dependencies: U2, U3.

Files:

src/agentkit/server/routes/chat.py (modify — add _execute_plan_exec_ws handler; branch on ExecutionMode.PLAN_EXEC).
tests/unit/test_chat_plan_exec_ws.py (new).

Approach:

New helper _execute_plan_exec_ws(websocket, agent, routing, messages, ...):
1. Read server_config.plan_exec (may be {} if not configured → use default_policy()).
2. Build PhasePolicy from config (apply overrides).
3. Construct ReActEngine(..., phase_policy=policy).
4. Register AdvancePhaseTool bound to this engine.
5. Call engine.execute_stream(...) — reuses existing streaming path.
6. Emit phase_changed events through the WebSocket (frontend can render phase indicator).

chat.py:1084 changes from if execution_mode not in (REACT, SKILL_REACT): warn + fall back to:

if routing.execution_mode == ExecutionMode.PLAN_EXEC:
    await _execute_plan_exec_ws(websocket, agent, routing, ...)
    return
if routing.execution_mode not in (ExecutionMode.REACT, ExecutionMode.SKILL_REACT):
    # existing warning for REWOO/REFLEXION/TEAM_COLLAB
    ...

REST send_message path: explicitly raise HTTPException(501, "PLAN_EXEC via REST not yet supported; use WebSocket") — Wave 3 does NOT wire REST (KTD4).

Execution note: characterization-first — test that existing REWOO/REFLEXION/TEAM_COLLAB modes still fall back to REACT with the warning (no regression). Then add PLAN_EXEC wiring.

Patterns to follow:

src/agentkit/server/routes/chat.py existing WebSocket handler structure (lines 1082-1100).
src/agentkit/server/_fallback_chain.py (Wave 2 U3) for "construct engine per-request with config" pattern.

Test scenarios (covers end-to-end):

Characterization:
- ExecutionMode.REWOO via WebSocket → still falls back to REACT with warning (existing behavior unchanged).
- ExecutionMode.REFLEXION → same.
- ExecutionMode.TEAM_COLLAB → same.
Happy paths:
- ExecutionMode.PLAN_EXEC via WebSocket → _execute_plan_exec_ws invoked; ReActEngine constructed with phase_policy; AdvancePhaseTool registered.
- Planning phase: LLM emits search tool call → executed; tool result streamed.
- LLM emits advance_phase → phase_changed event sent to WebSocket client; subsequent write_file call now allowed.
Edge cases:
- plan_exec config absent → default_policy() used; behavior matches KTD5 whitelist.
- plan_exec.enabled=False → falls back to REACT (opt-out).
- Phase violation: LLM calls write_file in Planning → structured error returned; loop continues; phase_violation event emitted.
Error paths:
- REST send_message with PLAN_EXEC → 501 error.
- Phase policy construction fails (bad config) → 500 error with message.
Integration scenarios:
- Existing fallback chain (Wave 2 U3) NOT applied to PLAN_EXEC — phase policy and fallback chain are mutually exclusive (KTD5 from Wave 2 plan: chain only wraps REACT/SKILL_REACT at REST). Document this in chat.py comment.

Verification:

python3 -m pytest tests/unit/test_chat_plan_exec_ws.py -q passes.
ruff check src/agentkit/server/routes/chat.py clean.
Manual test: agentkit chat with @skill:plan_exec_demo skill config → WebSocket stream includes phase_changed events.

Risks & Dependencies

Risks

ReAct core modification risk (high): U3 modifies ReActEngine._execute_loop. Mitigation: characterization-first tests (U3 Execution note); phase_policy=None default preserves all existing behavior; full test_react_engine.py regression.
Symbol extraction accuracy (medium): Regex extractor may miss edge cases (decorated functions, nested generics, multi-line signatures). Mitigation: fall back to "no symbols found → read full file" gracefully; never raise on extraction failure.
PLAN_EXEC phase deadlock (medium): LLM may never call advance_phase, leaving the agent stuck in Planning. Mitigation: auto_advance_after_steps config (default 5); timeout via existing max_steps.
Tool name drift (low): Phase whitelist references tool names (write_file, search, etc.) that may be renamed in future. Mitigation: whitelist is config-driven; rename only requires config update.

Dependencies

Wave 2 PR #5 (feat/agent-wave2-medium-coupling) should be merged first — Wave 3 builds on the ServerConfig.from_dict extension pattern and the _fallback_chain.py integration shape established there. If PR #5 is still open, Wave 3 branches from feat/agent-wave2-medium-coupling rather than main.
No external library dependencies (KTD1).

System-Wide Impact

Agents using PLAN_EXEC mode: gain phase enforcement. Existing REACT/SKILL_REACT/DIRECT_CHAT agents: zero change (phase_policy defaults to None).
Tool registry: gains two new tools (read_file, advance_phase). Frontend tool list display may need updating to show the new icons — out of scope for Wave 3 (frontend follows up).
agentkit.yaml: gains plan_exec: section (commented by default). Existing configs unaffected.
WebSocket clients: gain phase_changed event type. Existing clients ignore unknown event types (verified in Wave 2 — phase_rollback_* events follow the same pattern).

Sources & Research

Origin brainstorm: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md (Wave 3 section, KTD6/KTD7).
Wave 1 plan: docs/plans/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md (PR #4 merged).
Wave 2 plan: docs/plans/2026-06-29-003-feat-agent-wave2-medium-coupling-plan.md (PR #5 open).
Trae Work architecture research (cited in brainstorm): SOLO four-stage state machine pattern.
Qoder architecture research (cited in brainstorm): Spec→Coding→Verify closed loop.
Codebase: src/agentkit/core/react.py:148 reserves read_file/write_file tool names in _DEFAULT_CORE_TOOLS — Wave 3 U1 delivers the missing read_file implementation.
Codebase: src/agentkit/server/routes/chat.py:1084 documents that PLAN_EXEC is "not yet supported" — Wave 3 U4 closes this gap.

Deferred to Implementation

Exact regex patterns for non-Python symbol extraction (U1) — design above gives the shape; implementer finalizes patterns based on real-world test fixtures.
bash_command_filter regex precision (U2) — defaults block rm/mv/>/>>; implementer may add more based on test scenarios.
phase_changed event payload shape (U3/U4) — minimal viable shape: {"phase": "building", "previous": "planning"}; frontend rendering concerns are out of scope.
Whether AdvancePhaseTool accepts a target_phase argument for skipping phases (e.g., Planning → Verification) — default no (sequential only); add if test scenarios reveal a need.

28 KiB Raw Blame History

Wave 3 Strategic Coupling — G5 Function-level Sharding + G6 SOLO Phase Constraints

Summary

Problem Frame

Requirements

Key Technical Decisions

KTD1: G5 uses Python ast + language-aware regex, NOT tree-sitter

KTD2: G5 adds a new ReadFileTool, does not extend ShellTool

KTD3: G6 phase state machine lives in ReActEngine, not in the skill config

KTD4: PLAN_EXEC mode is wired at chat.py WebSocket path (REST already has fallback chain from Wave 2)

KTD5: Default phase whitelist matches brainstorm R24

KTD6: Phase transitions are LLM-driven via advance_phase tool (opt-in auto-advance)

Scope Boundaries

In Scope

Out of Scope (Deferred to Follow-Up Work)

Outside This Product's Identity

High-Level Technical Design

Implementation Units

U1. SymbolExtractor + ReadFileTool (G5)

U2. PhasePolicy + PhaseState + ServerConfig (G6 core)

U3. AdvancePhaseTool + ReActEngine phase enforcement (G6 wiring)

U4. Wire PLAN_EXEC at chat.py WebSocket path (G6 chat integration)

Risks & Dependencies

Risks

Dependencies

System-Wide Impact

Sources & Research

Deferred to Implementation

28 KiB

Raw Blame History

KTD1: G5 uses Python `ast` + language-aware regex, NOT tree-sitter

KTD2: G5 adds a new `ReadFileTool`, does not extend `ShellTool`

KTD6: Phase transitions are LLM-driven via `advance_phase` tool (opt-in auto-advance)