fischer-agentkit/docs/plans/2026-06-29-003-feat-agent-w...

482 lines
31 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
date: 2026-06-29
type: feat
origin: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md
---
# Wave 2 — Auxiliary LLM Routing, Three-Tier Fallback, Atomic Subtask Rollback
## Summary
Wave 2 of the advanced-agent gap optimization brainstorm. Three medium-coupling gaps: G4 routes summary calls through a cheaper auxiliary LLM (falling back to main on failure), G7 introduces a three-tier fallback chain (main → Recovery via existing `ReflexionEngine` → Emergency with structured errors), G9 binds atomic subtask rollback to `PlanPhase` (opt-in via `rollback_command`, coordinated with the existing U7 `PipelineCheckpoint`).
---
## Problem Frame
Wave 1 shipped self-contained quick wins (G1/G2/G3/G8). Wave 2 addresses the medium-coupling gaps that touch multiple layers and resolve two deferred-to-planning decisions surfaced in the brainstorm: G7 Emergency layer rule template shape, and G9 `rollback_command` default behavior. The constraints these gaps must respect come from `docs/solutions/logic-errors/long-horizon-reliability-code-review-fixes.md` — new fields must preserve existing contracts, dynamic plan mutations must persist immediately, and `PipelineCheckpoint` is in-memory dict + Redis fallback (not a DB row lock).
---
## Requirements
Carried from `docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md` § Wave 2 (R13-R21):
- R13. `ContextCompressor` summary task routes to auxiliary model (cheap, e.g. Gemini Flash / Doubao lite), not main model.
- R14. `auxiliary_model` is configurable, separated from main `model`.
- R15. Summary quality does not degrade: auxiliary failure falls back to main model (not to simple_summary).
- R16. Main agent failure triggers Recovery layer (reuse `ReflexionEngine` Evaluate→Reflect→Retry).
- R17. Recovery failure triggers Emergency layer (rule-based fallback, structured error + suggestion).
- R18. Fallback chain is configurable (per-layer max retries, enable/disable Recovery/Emergency).
- R19. `PlanPhase` (`experts/plan.py:147`) gains optional `validation_command` and `rollback_command` fields.
- R20. Phase failure auto-executes `rollback_command` when configured (default `git checkout` pattern); coordinated with U7 checkpoint.
- R21. `checkpoint.save` runs only after rollback validation passes (avoid persisting failed state).
Cross-cutting: R26 (configuration via `agentkit.yaml`), R27 (each gap ships a minimal self-check test, per ponytail rule).
---
## Key Technical Decisions
KTD1. **Auxiliary routing is compressor-scoped, not gateway-level.** `LLMGateway` already has a per-model fallback chain (`gateway.py:170-227`), but it semantics are "main fails → try backup model", not "route by task type". G4 adds an `auxiliary_model` parameter to `ContextCompressor.__init__`; `_summarize` tries auxiliary first, falls back to main on failure. This avoids touching the gateway's existing fallback semantics and keeps the change to one file plus config wiring.
KTD2. **Emergency layer adds a parallel `error_struct` field, not replacing `error_message`.** `TaskResult.error_message: str | None` is serialized to API responses (`protocol.py:132-145`) — changing its type breaks frontend contracts. New `error_struct: dict | None` field carries `{error_code, message, suggestions, retryable}`. `error_message` remains the human-readable string (now derived from `error_struct.message` when present). Backward compatibility preserved.
KTD3. **Emergency rule classification by exception type, not by status field.** `TaskTimeoutError``timeout`, `LoopDetectedError``loop_detected`, `LLMProviderError``llm_failure`, `TaskCancelledError``cancelled` (not Emergency-eligible, propagates), generic `Exception``internal_error`. Each rule maps to a fixed message + suggestion list (no LLM-driven suggestions — pure rule-based, per brainstorm).
KTD4. **Recovery layer reuses `ReflexionEngine` with main model, not auxiliary.** `ReflexionEngine.execute()` already supports `evaluate_model`/`reflect_model` parameters (`reflexion.py:94-111`), defaulting to the main model. Recovery triggers on main agent failure; using the main model for evaluate/reflect maximizes diagnostic reliability (the same model that failed is best positioned to reflect on its own failure). Auxiliary model is reserved for G4's cost-sensitive compression path.
KTD5. **Three-tier chain wires at `chat.py:613` (WebSocket main path), not at `ReActEngine.execute()`.** ReAct has 5 call sites (`chat.py:613`, `rewoo.py:1204`, `cli/chat.py:338`, `reflexion.py:216`, `config_driven.py:695`). Wrapping at ReAct would force all 5 sites through Recovery/Emergency, including ReflexionEngine itself (creating a recursive loop). Wrapping at `chat.py:613` covers the primary user-facing path; CLI and ReWOO are out of scope (deferred to follow-up).
KTD6. **Rollback is opt-in via `rollback_command` field; no implicit default.** Brainstorm KTD5 mentioned "default `git checkout`" but auto-executing `git checkout` on every phase failure changes existing behavior (Finding 1: "新字段默认值须保持既有契约"). Resolution: `rollback_command: str | None = None`. When unset, no rollback executes (preserves existing contract). When set, the configured command runs after validation failure, before checkpoint save. `git checkout <files>` is the canonical pattern documented in YAML examples.
KTD7. **Rollback executes via `RollbackExecutor`, not via `ShellTool`.** `ShellTool._is_dangerous` would trigger `confirm_callback` for `git checkout` (not in `_SAFE_COMMAND_PREFIXES`, not in `_DANGEROUS_BINARY_FLAGS`). Orchestrator-internal execution bypasses ShellTool — uses `asyncio.create_subprocess_shell` with `cwd`/`timeout`/`proc.kill()` pattern from `verification_loop.py:67-103`. Audit logging added (not a whitelist concern; terminal_whitelist only governs `/api/v1/terminal/server` route per `terminal_server.py:227`).
KTD8. **G9 checkpoint ordering: validation → rollback → checkpoint save.** Currently `orchestrator.py:265` saves checkpoint unconditionally after phase finalizes (success or failure). R21 requires checkpoint save only after rollback validation passes. New ordering: phase fails → mark FAILED → mark dependents FAILED → execute `validation_command` (if set) → if validation fails, execute `rollback_command` (if set) → save checkpoint (only if rollback validation passed or no rollback configured).
---
## High-Level Technical Design
### Three-tier fallback chain (G7)
```mermaid
stateDiagram-v2
[*] --> Main: chat request
Main --> MainSuccess: success
Main --> Recovery: failure (exception or empty_fallback)
Recovery --> RecoverySuccess: ReflexionEngine retry succeeds
Recovery --> Emergency: max_reflections exhausted
Emergency --> [*]: emit error_struct, terminal
MainSuccess --> [*]: normal response
RecoverySuccess --> [*]: recovered response
```
### G9 phase failure + rollback sequence
```mermaid
sequenceDiagram
participant O as TeamOrchestrator
participant P as PlanPhase
participant RE as RollbackExecutor
participant CP as PipelineCheckpoint
O->>P: execute phase
P-->>O: failure (exception)
O->>O: mark FAILED + dependents FAILED
alt validation_command set
O->>RE: run validation_command
RE-->>O: ValidationResult
alt validation fails AND rollback_command set
O->>RE: run rollback_command
RE-->>O: RollbackResult
alt rollback validation passes
O->>CP: save checkpoint
else rollback validation fails
O->>O: log rollback_failed, skip checkpoint
end
else no rollback_command
O->>CP: save checkpoint (no rollback)
end
else no validation_command
O->>CP: save checkpoint (no validation)
end
```
### Configuration shape (YAML)
```yaml
# G4: Auxiliary LLM for compression
llm:
auxiliary_model: fast # alias resolved via existing model_aliases mechanism
# G7: Three-tier fallback chain
fallback_chain:
enabled: true
recovery:
enabled: true
max_retries: 1 # ReflexionEngine max_reflections override
emergency:
enabled: true
# G9: Rollback configuration (per-phase opt-in via PlanPhase.rollback_command)
rollback:
default_timeout: 30.0 # RollbackExecutor subprocess timeout
```
---
## Implementation Units
### U1. G4 — Auxiliary LLM Routing in ContextCompressor
**Goal:** Route `_summarize` LLM calls through `auxiliary_model` when configured; fall back to main model on auxiliary failure (not to `_simple_summary`).
**Requirements:** R13, R14, R15, R26, R27
**Dependencies:** none (self-contained)
**Files:**
- Modify: `src/agentkit/core/compressor.py` (add `auxiliary_model` param + routing in `_summarize`)
- Modify: `src/agentkit/llm/config.py` (add `auxiliary_model: str | None` field to `LLMConfig`)
- Modify: `src/agentkit/server/config.py` (`_build_llm_config` reads `auxiliary_model` from llm section)
- Modify: `agentkit.yaml` (document `llm.auxiliary_model: fast` in llm section)
- Create: `tests/unit/test_compressor_auxiliary.py`
**Approach:** `ContextCompressor.__init__` gains `auxiliary_model: str | None = None` param (after existing `model` param). `_summarize` (compressor.py:123-158) restructuring:
1. If `auxiliary_model` is set and differs from `model`, try `auxiliary_model` first via `self._llm_gateway.chat(model=self._auxiliary_model, ...)`.
2. **Truthiness check (Finding 4 anti-pattern):** treat empty `response.content` (None or whitespace-only) as failure, not success. On failure, log and fall through to main model.
3. If auxiliary succeeds with non-empty content, return it.
4. If auxiliary fails (exception OR empty content), retry with main `self._model`.
5. Existing `except Exception → _simple_summary` block remains as the final degradation tier.
`LLMConfig.auxiliary_model` reads from `data.get("auxiliary_model")` in `_build_llm_config`. Agentkit.yaml already declares `fast: bailian-coding/qwen-turbo` alias — this is the canonical auxiliary target.
**Execution note:** characterization-first. Capture current `_summarize` behavior (single model, exception → simple_summary) with `auxiliary_model=None` tests, then add `auxiliary_model="fast"` tests for new routing behavior.
**Patterns to follow:**
- `LLMGateway._get_models_to_try` (`gateway.py:170-227`) — existing fallback chain pattern
- Wave 1's `ServerConfig.from_dict` extension template (prompt_cache/streaming/verification sections, `config.py:239-273`)
**Test scenarios:**
- Happy path: `auxiliary_model="fast"` set, auxiliary returns non-empty content → result is auxiliary content, main model not called
- Empty content fallback: auxiliary returns `content=""` (or `None`) → main model called, main content returned (covers Finding 4 anti-pattern)
- Auxiliary exception: auxiliary raises `LLMProviderError` → main model called, main content returned
- Both fail: auxiliary raises, main raises → `_simple_summary` returned (existing degradation preserved)
- Characterization: `auxiliary_model=None` → behavior matches current code (single model call)
- Config wiring: `LLMConfig.from_dict` reads `auxiliary_model` field from dict; `ServerConfig._build_llm_config` passes it through
- Audit: auxiliary call uses `agent_name="compressor"`, `task_type="summarization"` (preserved for usage tracking)
**Verification:** All test scenarios pass; existing `tests/unit/test_compressor*.py` (if any) still pass with `auxiliary_model=None` default; ruff clean.
---
### U2. G7 — Emergency Layer Rule Template + TaskResult Extension
**Goal:** Add rule-based Emergency classifier with structured error output. No wiring yet — just the infrastructure (classifier + data structure + `fallback.py` extension).
**Requirements:** R17, R18, R26, R27
**Dependencies:** none (foundation for U3)
**Files:**
- Modify: `src/agentkit/core/fallback.py` (add `EmergencyRules` class + `EmergencyError` dataclass, preserve existing 3 constants)
- Modify: `src/agentkit/core/protocol.py` (add `error_struct: dict | None = None` field to `TaskResult`, update `to_dict`)
- Create: `tests/unit/test_emergency_rules.py`
**Approach:**
`EmergencyError` dataclass (in `fallback.py`):
```
@dataclass
class EmergencyError:
error_code: str # "timeout"|"loop_detected"|"llm_failure"|"internal_error"
message: str # human-readable Chinese message (mirrors EMPTY_LLM_RESPONSE style)
suggestions: list[str] # actionable user-facing suggestions
retryable: bool # whether user retry might succeed
original_error: str # str(exc) for traceability
def to_dict(self) -> dict: ...
def to_error_message(self) -> str: ... # formatted "message\n建议1) ... 2) ..."
```
`EmergencyRules` class (rule-based classifier, no LLM):
```
class EmergencyRules:
@staticmethod
def classify(exc: Exception, config: dict | None = None) -> EmergencyError:
# Match by exception type (TaskTimeoutError, LoopDetectedError, LLMProviderError, etc.)
# config allows per-rule customization (suggestion overrides, retryable overrides)
```
Rule mapping (initial set, expandable via config):
- `TaskTimeoutError``error_code="timeout"`, `retryable=True`, suggestions: ["稍后重试", "简化任务范围"]
- `LoopDetectedError``error_code="loop_detected"`, `retryable=True`, suggestions: ["拆分任务", "检查工具参数"]
- `LLMProviderError``error_code="llm_failure"`, `retryable=True`, suggestions: ["稍后重试", "切换模型"]
- `TaskCancelledError` → not classified (propagates as-is, Emergency not triggered)
- Generic `Exception``error_code="internal_error"`, `retryable=False`, suggestions: ["联系管理员"]
`TaskResult` extension:
```
@dataclass
class TaskResult:
# ... existing fields ...
error_message: str | None # unchanged
error_struct: dict | None = None # NEW: serialized EmergencyError.to_dict()
```
`to_dict` includes `error_struct` when set; `error_message` continues to hold the human-readable string.
**Patterns to follow:**
- Existing `EMPTY_LLM_RESPONSE` / `MAX_STEPS_REACHED` style in `fallback.py` (Chinese, "建议:..." format)
- `VerificationResult.errors: list[str]` field pattern from `verification_loop.py:18-24`
- `TaskResult.to_dict` pattern at `protocol.py:132-145`
**Test scenarios:**
- Happy path: `classify(TaskTimeoutError(...))` returns `EmergencyError(error_code="timeout", retryable=True)`
- Each exception type maps to correct `error_code`
- `TaskCancelledError` is NOT classified (caller must check before invoking `classify`)
- `to_dict()` produces all 5 fields
- `to_error_message()` formats suggestions as "建议1) ... 2) ..."
- `EmergencyRules.classify(Exception("unknown"))``error_code="internal_error"`, `retryable=False`
- Config override: custom suggestion list for `timeout` rule via config dict
- `TaskResult` with `error_struct` set: `to_dict()` includes both `error_message` and `error_struct`
- `TaskResult` with `error_struct=None` (default): `to_dict()` matches current behavior (backward compat)
**Verification:** All scenarios pass; `fallback.py` existing 3 constants unchanged; `TaskResult.to_dict` for tasks without `error_struct` matches pre-change output byte-for-byte.
---
### U3. G7 — Three-Tier Fallback Chain Wiring
**Goal:** Wire main → Recovery (ReflexionEngine) → Emergency (EmergencyRules) at `chat.py:613`. Composes U2's infrastructure with existing `ReflexionEngine`.
**Requirements:** R16, R18, R26
**Dependencies:** U2 (EmergencyRules + TaskResult.error_struct)
**Files:**
- Modify: `src/agentkit/server/routes/chat.py` (wrap main agent call at L613 with three-tier chain)
- Modify: `src/agentkit/server/config.py` (add `fallback_chain` section to `from_dict`)
- Modify: `agentkit.yaml` (document `fallback_chain:` section)
- Create: `tests/unit/test_fallback_chain.py`
**Approach:**
New helper module or inline function in `chat.py`:
```
async def execute_with_fallback_chain(
agent: ConfigDrivenAgent,
task: Task,
config: dict, # fallback_chain config section
llm_gateway: LLMGateway,
) -> TaskResult:
# Tier 1: Main
try:
result = await agent.execute(task)
if result.status == "success" or result.status == "completed":
return result
# Treat non-success as soft failure → trigger Recovery
raise AgentExecutionError(result.error_message or "main agent did not succeed")
except (TaskTimeoutError, LoopDetectedError, LLMProviderError, AgentExecutionError) as exc:
if not config.get("recovery", {}).get("enabled", True):
return _to_emergency(exc, task)
# Tier 2: Recovery (ReflexionEngine)
try:
reflexion_engine = ReflexionEngine(
llm_gateway=llm_gateway,
max_reflections=config.get("recovery", {}).get("max_retries", 1),
# ... other params from agent's existing reflexion config
)
recovery_result = await reflexion_engine.execute(
messages=task.messages, # rebuild from task
tools=agent.get_tools(),
model=task.model,
agent_name=agent.name,
task_id=task.task_id,
)
if recovery_result.status == "success":
return _recovery_to_task_result(recovery_result, task)
except Exception as recovery_exc:
logger.warning(f"Recovery layer failed: {recovery_exc}")
# Tier 3: Emergency
return _to_emergency(exc, task)
```
`_to_emergency(exc, task)` constructs `EmergencyError` via `EmergencyRules.classify(exc, config)`, then returns `TaskResult(status=FAILED, error_message=emergency.to_error_message(), error_struct=emergency.to_dict(), ...)`.
Wiring at `chat.py:613`: replace direct `agent.execute(task)` call with `execute_with_fallback_chain(...)`. Recovery config from `server_config.fallback_chain` (new `ServerConfig.fallback_chain: dict` field, mirroring Wave 1 pattern).
**Recovery layer scope (KTD5):** Only `chat.py:613` is wrapped. CLI (`cli/chat.py:338`), ReWOO (`rewoo.py:1204`), ReflexionEngine's internal ReAct call, and `config_driven.py:695` are NOT wrapped (would create recursive loop or unwanted coupling). These remain on the direct-execute path. Documented in `## Scope Boundaries`.
**Patterns to follow:**
- `config_driven.py:836` — existing `ReflexionEngine` instantiation pattern (constructor params, execute call)
- Wave 1's `verification` config section in `ServerConfig.from_dict` (`config.py:240-273`)
**Test scenarios:**
- Happy path: main agent succeeds → no Recovery, no Emergency triggered; `error_struct=None`
- Main fails (timeout) → Recovery triggered → Recovery succeeds → `error_struct=None`, output from ReflexionEngine
- Main fails (timeout) → Recovery triggered → Recovery fails (max_reflections exhausted) → Emergency triggered → `error_struct` populated with `error_code="timeout"`, `retryable=True`
- Main fails (LoopDetectedError) → Emergency `error_code="loop_detected"`
- Main fails (LLMProviderError) → Emergency `error_code="llm_failure"`
- Main fails (TaskCancelledError) → propagates as-is (NOT routed to Emergency)
- Main fails (generic Exception) → Emergency `error_code="internal_error"`, `retryable=False`
- Config: `fallback_chain.recovery.enabled=false` → skip Recovery, go directly to Emergency
- Config: `fallback_chain.emergency.enabled=false` → re-raise original exception (no Emergency)
- Integration: full chain on real `ConfigDrivenAgent` with mocked LLM (mock main raises, mock ReflexionEngine succeeds)
**Verification:** All scenarios pass; existing chat WebSocket tests still pass; ruff clean.
---
### U4. G9 — PlanPhase Rollback Fields + RollbackExecutor + TeamOrchestrator Integration
**Goal:** Add `validation_command`/`rollback_command` optional fields to `PlanPhase`; execute rollback on phase failure (opt-in); coordinate with U7 checkpoint ordering per R21.
**Requirements:** R19, R20, R21, R26, R27
**Dependencies:** none (uses existing U7 `PipelineCheckpoint` and `VerificationLoop` patterns)
**Files:**
- Modify: `src/agentkit/experts/plan.py` (`PlanPhase` dataclass + `to_dict`/`from_dict` symmetry)
- Modify: `src/agentkit/experts/orchestrator.py` (insert rollback execution between phase failure and checkpoint save)
- Create: `src/agentkit/orchestrator/rollback.py` (`RollbackExecutor` class)
- Modify: `src/agentkit/server/config.py` (add `rollback` section to `from_dict`)
- Modify: `agentkit.yaml` (document `rollback:` section)
- Create: `tests/unit/test_phase_rollback.py`
**Approach:**
**PlanPhase extension (`plan.py:147-233`):** Add two optional fields with `None` default (preserves existing contract per Finding 1):
```
validation_command: str | None = None
rollback_command: str | None = None
```
Update `to_dict()` to include both fields (only when not None, to keep dict shape minimal). Update `from_dict()` to read both fields.
**`RollbackExecutor` class (new file `orchestrator/rollback.py`):** Mirrors `VerificationLoop` pattern (`verification_loop.py:67-103`):
```
class RollbackExecutor:
def __init__(self, working_dir: str | None = None, timeout: float = 30.0): ...
async def execute(self, command: str) -> RollbackResult:
# asyncio.create_subprocess_shell with cwd/timeout/proc.kill()
# Returns RollbackResult(passed, exit_code, stdout, stderr, command)
async def validate(self, command: str) -> RollbackResult:
# Same as execute but returns passed=False on non-zero exit
```
`RollbackResult` dataclass: `{passed: bool, exit_code: int, stdout: str, stderr: str, command: str}`.
**TeamOrchestrator integration (`orchestrator.py:246-270`):** New ordering per KTD8:
```
# Existing: phase failure detected, mark FAILED + dependents FAILED
# (lines 247-261 unchanged)
# NEW: rollback phase (only if validation_command and rollback_command configured)
should_save_checkpoint = True
if ph.validation_command and ph.rollback_command:
validator = RollbackExecutor(working_dir=self._workspace_root, timeout=...)
validation_result = await validator.validate(ph.validation_command)
if not validation_result.passed:
rollback_result = await validator.execute(ph.rollback_command)
if not rollback_result.passed:
# Rollback failed → don't save checkpoint (R21)
should_save_checkpoint = False
logger.error(f"Rollback failed for phase {ph.id}: {rollback_result.stderr}")
# Emit phase_rollback_failed event
await self._broadcast_event("phase_rollback_failed", {...})
# Existing: checkpoint save (conditional now)
if should_save_checkpoint and self._checkpoint is not None:
try:
await self._checkpoint.save(plan.id, ph, plan.status.value)
except Exception as e:
logger.warning(...)
```
**Events emitted (new):** `phase_rollback_started`, `phase_rollback_completed`, `phase_rollback_failed`. Existing `phase_failed` event unchanged (emitted before rollback).
**Config:** `rollback.default_timeout: float = 30.0` in `ServerConfig.rollback` dict (Wave 1 pattern).
**Security boundary (KTD7):** Rollback subprocess does NOT go through `ShellTool` (avoids `confirm_callback` for `git checkout`). Audit log emitted via `_broadcast_event`. `terminal_whitelist` is not consulted (only governs `/api/v1/terminal/server` route, per `terminal_server.py:227`).
**Execution note:** characterization-first. Test current behavior (`rollback_command=None` → no rollback, checkpoint saved) before adding rollback behavior.
**Patterns to follow:**
- `VerificationLoop` subprocess execution pattern (`verification_loop.py:67-103`) — `asyncio.create_subprocess_shell` + `cwd` + `timeout` + `proc.kill()`
- `PlanPhase.to_dict`/`from_dict` symmetric serialization pattern at `plan.py:186-233`
- `TeamOrchestrator._execute_phase` event broadcasting pattern (`orchestrator.py:253-260`)
- Wave 1's `ServerConfig.from_dict` extension template
**Test scenarios:**
- Characterization: `PlanPhase()` with no `validation_command`/`rollback_command` → `to_dict()` output matches pre-change shape (no new keys); `from_dict({})` produces phase with both fields as None
- Serialization: phase with `rollback_command="git checkout foo.py"``to_dict()` includes key; `from_dict(to_dict())` round-trips
- RollbackExecutor happy path: `execute("git status")` returns `passed=True`, `exit_code=0`
- RollbackExecutor timeout: `execute("sleep 10", timeout=0.1)` returns `passed=False`, exit_code=-1 (or similar)
- RollbackExecutor failure: `execute("false")` returns `passed=False`, `exit_code=1`
- Integration — opt-in default: phase fails, `rollback_command=None` → no RollbackExecutor call, checkpoint saved (existing behavior)
- Integration — rollback configured, validation passes (returns 0): rollback NOT executed, checkpoint saved
- Integration — rollback configured, validation fails, rollback succeeds (returns 0): rollback executed, checkpoint saved
- Integration — rollback configured, validation fails, rollback fails (returns 1): rollback executed, checkpoint NOT saved (R21), `phase_rollback_failed` event emitted
- Integration — real git repo fixture: phase writes file via `git checkout`, rollback `git checkout foo.py` restores file; assert file content matches pre-phase state
**Verification:** All scenarios pass; existing `tests/unit/test_pipeline_state.py` and `tests/unit/test_team_orchestrator*.py` (if any) still pass; ruff clean.
---
## Scope Boundaries
### Deferred to Follow-Up Work
- **Recovery wiring at non-chat call sites** (`cli/chat.py:338`, `rewoo.py:1204`, `config_driven.py:695`). KTD5 limits Wave 2 to the primary chat WebSocket path. Other entry points can adopt the same wrapper in a follow-up.
- **Recovery layer streaming events.** Wave 2's chain returns final `TaskResult`; SSE events for "recovery_started"/"recovery_completed" would require deeper WebSocket protocol changes.
- **Patch-level rollback** (`git apply -R <patch>`). KTD6 + KTD7 scope Wave 2 to `git checkout <files>` pattern. Patch-level requires extending `CheckpointData` schema to track patches — Wave 3 candidate.
- **DB-backed atomic claim for checkpoint.** Finding 3 noted `PipelineCheckpoint` is in-memory dict + Redis fallback, not a PostgreSQL `FOR UPDATE SKIP LOCKED` pattern. Multi-process atomicity is out of scope; single-process `asyncio.Lock` (already present at `orchestrator.py:53`) is sufficient.
### Outside this product's identity (carried from brainstorm)
- Wave 3 (G5/G6) — tree-sitter integration, SOLO four-stage state machine. Strategic direction locked in brainstorm KTD6/KTD7; implementation design deferred to Wave 3 plan.
- Node-level checkpoint (ReAct single-step). Stage-level (U7) satisfies core need.
- DeerFlow-style disk filesystem. Redis is the persistence layer.
- Full LangGraph migration. Self-built architecture stays.
---
## Risks & Dependencies
- **Risk: Auxiliary model availability.** If `auxiliary_model` alias is not configured in `agentkit.yaml`, `LLMGateway` raises `ModelNotFoundError`. Mitigation: `ContextCompressor` catches this in the auxiliary try-block and falls through to main model (R15 fallback path). Documented in YAML comments.
- **Risk: Recovery layer recursive loop.** `ReflexionEngine.execute()` internally calls `ReActEngine.execute()` (5th call site). If Recovery itself fails, it must NOT trigger another Recovery — KTD5 wires only at `chat.py:613`, so ReflexionEngine's internal ReAct call bypasses the chain. Recursive loop is structurally impossible.
- **Risk: `git checkout` destructive scope.** Misconfigured `rollback_command` (e.g., `git checkout .`) could wipe unrelated changes. Mitigation: rollback is opt-in per-phase (KTD6); audit log via `phase_rollback_started`/`phase_rollback_failed` events; documented YAML examples use file-scoped commands (`git checkout <specific_files>`).
- **Risk: `TaskResult.error_struct` field addition breaks pickle/serialization.** `TaskResult` is a dataclass with `to_dict`; new optional field with `None` default is backward-compatible for `to_dict` consumers. Pickle deserialization of pre-change data still works (new field defaults to None).
- **Dependency: Wave 1 merged.** `ServerConfig.from_dict` extension template (prompt_cache/streaming/verification sections) is established and tested. Wave 2's 3 new sections (auxiliary_model, fallback_chain, rollback) reuse this exact pattern.
- **Dependency: U7 PipelineCheckpoint already shipped.** `orchestrator/checkpoint.py:56` exists with `save(plan_id, phase, plan_status)` API; U4 only adjusts the calling order, not the API.
---
## Sources / Research
- Brainstorm: `docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md` (Wave 2 section, R13-R21, KTD4/KTD5)
- Wave 1 plan: `docs/plans/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md` (ServerConfig.from_dict extension template established)
- Finding 1 (contract preservation rule): `docs/solutions/logic-errors/long-horizon-reliability-code-review-fixes.md` — "新字段默认值须保持既有契约" drives KTD6's opt-in default for `rollback_command`
- Finding 2 (retry-storm defense): `docs/solutions/security-issues/portal-platform-security-reliability-fixes.md` — Emergency layer's terminal `error_code` pattern (don't propagate unhandled exceptions)
- Finding 3 (atomic claim pattern): `docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.md``SKIP LOCKED` not reusable; `PipelineCheckpoint` is in-memory dict + Redis
- Finding 4 (empty-response anti-pattern): `docs/solutions/ui-bugs/tauri-reload-loses-session.md` — auxiliary LLM must check truthy return value, not just absence of exception; drives U1's empty-content fallback
- Code locations verified during planning:
- `src/agentkit/core/compressor.py:35-44,123-158``_summarize` injection point for G4
- `src/agentkit/llm/gateway.py:170-227` — existing per-model fallback chain (semantics differ from G4's task-type routing)
- `src/agentkit/llm/config.py:198-257``LLMConfig` dataclass and `from_dict`
- `src/agentkit/core/reflexion.py:68-111``ReflexionEngine.__init__` and `execute()` signatures (already supports `evaluate_model`/`reflect_model`)
- `src/agentkit/core/fallback.py` (full file, 19 lines) — 3 existing constants; `EmergencyRules` adds alongside
- `src/agentkit/core/protocol.py:118-145``TaskResult` dataclass and `to_dict`
- `src/agentkit/experts/plan.py:147-233``PlanPhase` dataclass, `to_dict`/`from_dict`
- `src/agentkit/experts/orchestrator.py:246-270` — phase failure capture + checkpoint save site (U4 integration point)
- `src/agentkit/orchestrator/checkpoint.py:56``PipelineCheckpoint` (U7, no API change needed)
- `src/agentkit/core/verification_loop.py:67-103` — subprocess execution pattern for `RollbackExecutor`
- `src/agentkit/tools/shell.py:168-174,526-534``_DANGEROUS_BINARY_FLAGS` (git checkout not listed), `_is_dangerous` logic (KTD7 rationale)