fischer-agentkit/docs/plans/2026-06-29-003-feat-agent-w...

31 KiB
Raw Blame History

date type origin
2026-06-29 feat docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md

Wave 2 — Auxiliary LLM Routing, Three-Tier Fallback, Atomic Subtask Rollback

Summary

Wave 2 of the advanced-agent gap optimization brainstorm. Three medium-coupling gaps: G4 routes summary calls through a cheaper auxiliary LLM (falling back to main on failure), G7 introduces a three-tier fallback chain (main → Recovery via existing ReflexionEngine → Emergency with structured errors), G9 binds atomic subtask rollback to PlanPhase (opt-in via rollback_command, coordinated with the existing U7 PipelineCheckpoint).


Problem Frame

Wave 1 shipped self-contained quick wins (G1/G2/G3/G8). Wave 2 addresses the medium-coupling gaps that touch multiple layers and resolve two deferred-to-planning decisions surfaced in the brainstorm: G7 Emergency layer rule template shape, and G9 rollback_command default behavior. The constraints these gaps must respect come from docs/solutions/logic-errors/long-horizon-reliability-code-review-fixes.md — new fields must preserve existing contracts, dynamic plan mutations must persist immediately, and PipelineCheckpoint is in-memory dict + Redis fallback (not a DB row lock).


Requirements

Carried from docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md § Wave 2 (R13-R21):

  • R13. ContextCompressor summary task routes to auxiliary model (cheap, e.g. Gemini Flash / Doubao lite), not main model.
  • R14. auxiliary_model is configurable, separated from main model.
  • R15. Summary quality does not degrade: auxiliary failure falls back to main model (not to simple_summary).
  • R16. Main agent failure triggers Recovery layer (reuse ReflexionEngine Evaluate→Reflect→Retry).
  • R17. Recovery failure triggers Emergency layer (rule-based fallback, structured error + suggestion).
  • R18. Fallback chain is configurable (per-layer max retries, enable/disable Recovery/Emergency).
  • R19. PlanPhase (experts/plan.py:147) gains optional validation_command and rollback_command fields.
  • R20. Phase failure auto-executes rollback_command when configured (default git checkout pattern); coordinated with U7 checkpoint.
  • R21. checkpoint.save runs only after rollback validation passes (avoid persisting failed state).

Cross-cutting: R26 (configuration via agentkit.yaml), R27 (each gap ships a minimal self-check test, per ponytail rule).


Key Technical Decisions

KTD1. Auxiliary routing is compressor-scoped, not gateway-level. LLMGateway already has a per-model fallback chain (gateway.py:170-227), but it semantics are "main fails → try backup model", not "route by task type". G4 adds an auxiliary_model parameter to ContextCompressor.__init__; _summarize tries auxiliary first, falls back to main on failure. This avoids touching the gateway's existing fallback semantics and keeps the change to one file plus config wiring.

KTD2. Emergency layer adds a parallel error_struct field, not replacing error_message. TaskResult.error_message: str | None is serialized to API responses (protocol.py:132-145) — changing its type breaks frontend contracts. New error_struct: dict | None field carries {error_code, message, suggestions, retryable}. error_message remains the human-readable string (now derived from error_struct.message when present). Backward compatibility preserved.

KTD3. Emergency rule classification by exception type, not by status field. TaskTimeoutErrortimeout, LoopDetectedErrorloop_detected, LLMProviderErrorllm_failure, TaskCancelledErrorcancelled (not Emergency-eligible, propagates), generic Exceptioninternal_error. Each rule maps to a fixed message + suggestion list (no LLM-driven suggestions — pure rule-based, per brainstorm).

KTD4. Recovery layer reuses ReflexionEngine with main model, not auxiliary. ReflexionEngine.execute() already supports evaluate_model/reflect_model parameters (reflexion.py:94-111), defaulting to the main model. Recovery triggers on main agent failure; using the main model for evaluate/reflect maximizes diagnostic reliability (the same model that failed is best positioned to reflect on its own failure). Auxiliary model is reserved for G4's cost-sensitive compression path.

KTD5. Three-tier chain wires at chat.py:613 (WebSocket main path), not at ReActEngine.execute(). ReAct has 5 call sites (chat.py:613, rewoo.py:1204, cli/chat.py:338, reflexion.py:216, config_driven.py:695). Wrapping at ReAct would force all 5 sites through Recovery/Emergency, including ReflexionEngine itself (creating a recursive loop). Wrapping at chat.py:613 covers the primary user-facing path; CLI and ReWOO are out of scope (deferred to follow-up).

KTD6. Rollback is opt-in via rollback_command field; no implicit default. Brainstorm KTD5 mentioned "default git checkout" but auto-executing git checkout on every phase failure changes existing behavior (Finding 1: "新字段默认值须保持既有契约"). Resolution: rollback_command: str | None = None. When unset, no rollback executes (preserves existing contract). When set, the configured command runs after validation failure, before checkpoint save. git checkout <files> is the canonical pattern documented in YAML examples.

KTD7. Rollback executes via RollbackExecutor, not via ShellTool. ShellTool._is_dangerous would trigger confirm_callback for git checkout (not in _SAFE_COMMAND_PREFIXES, not in _DANGEROUS_BINARY_FLAGS). Orchestrator-internal execution bypasses ShellTool — uses asyncio.create_subprocess_shell with cwd/timeout/proc.kill() pattern from verification_loop.py:67-103. Audit logging added (not a whitelist concern; terminal_whitelist only governs /api/v1/terminal/server route per terminal_server.py:227).

KTD8. G9 checkpoint ordering: validation → rollback → checkpoint save. Currently orchestrator.py:265 saves checkpoint unconditionally after phase finalizes (success or failure). R21 requires checkpoint save only after rollback validation passes. New ordering: phase fails → mark FAILED → mark dependents FAILED → execute validation_command (if set) → if validation fails, execute rollback_command (if set) → save checkpoint (only if rollback validation passed or no rollback configured).


High-Level Technical Design

Three-tier fallback chain (G7)

stateDiagram-v2
    [*] --> Main: chat request
    Main --> MainSuccess: success
    Main --> Recovery: failure (exception or empty_fallback)
    Recovery --> RecoverySuccess: ReflexionEngine retry succeeds
    Recovery --> Emergency: max_reflections exhausted
    Emergency --> [*]: emit error_struct, terminal
    MainSuccess --> [*]: normal response
    RecoverySuccess --> [*]: recovered response

G9 phase failure + rollback sequence

sequenceDiagram
    participant O as TeamOrchestrator
    participant P as PlanPhase
    participant RE as RollbackExecutor
    participant CP as PipelineCheckpoint

    O->>P: execute phase
    P-->>O: failure (exception)
    O->>O: mark FAILED + dependents FAILED
    alt validation_command set
        O->>RE: run validation_command
        RE-->>O: ValidationResult
        alt validation fails AND rollback_command set
            O->>RE: run rollback_command
            RE-->>O: RollbackResult
            alt rollback validation passes
                O->>CP: save checkpoint
            else rollback validation fails
                O->>O: log rollback_failed, skip checkpoint
            end
        else no rollback_command
            O->>CP: save checkpoint (no rollback)
        end
    else no validation_command
        O->>CP: save checkpoint (no validation)
    end

Configuration shape (YAML)

# G4: Auxiliary LLM for compression
llm:
  auxiliary_model: fast    # alias resolved via existing model_aliases mechanism

# G7: Three-tier fallback chain
fallback_chain:
  enabled: true
  recovery:
    enabled: true
    max_retries: 1          # ReflexionEngine max_reflections override
  emergency:
    enabled: true

# G9: Rollback configuration (per-phase opt-in via PlanPhase.rollback_command)
rollback:
  default_timeout: 30.0    # RollbackExecutor subprocess timeout

Implementation Units

U1. G4 — Auxiliary LLM Routing in ContextCompressor

Goal: Route _summarize LLM calls through auxiliary_model when configured; fall back to main model on auxiliary failure (not to _simple_summary).

Requirements: R13, R14, R15, R26, R27

Dependencies: none (self-contained)

Files:

  • Modify: src/agentkit/core/compressor.py (add auxiliary_model param + routing in _summarize)
  • Modify: src/agentkit/llm/config.py (add auxiliary_model: str | None field to LLMConfig)
  • Modify: src/agentkit/server/config.py (_build_llm_config reads auxiliary_model from llm section)
  • Modify: agentkit.yaml (document llm.auxiliary_model: fast in llm section)
  • Create: tests/unit/test_compressor_auxiliary.py

Approach: ContextCompressor.__init__ gains auxiliary_model: str | None = None param (after existing model param). _summarize (compressor.py:123-158) restructuring:

  1. If auxiliary_model is set and differs from model, try auxiliary_model first via self._llm_gateway.chat(model=self._auxiliary_model, ...).
  2. Truthiness check (Finding 4 anti-pattern): treat empty response.content (None or whitespace-only) as failure, not success. On failure, log and fall through to main model.
  3. If auxiliary succeeds with non-empty content, return it.
  4. If auxiliary fails (exception OR empty content), retry with main self._model.
  5. Existing except Exception → _simple_summary block remains as the final degradation tier.

LLMConfig.auxiliary_model reads from data.get("auxiliary_model") in _build_llm_config. Agentkit.yaml already declares fast: bailian-coding/qwen-turbo alias — this is the canonical auxiliary target.

Execution note: characterization-first. Capture current _summarize behavior (single model, exception → simple_summary) with auxiliary_model=None tests, then add auxiliary_model="fast" tests for new routing behavior.

Patterns to follow:

  • LLMGateway._get_models_to_try (gateway.py:170-227) — existing fallback chain pattern
  • Wave 1's ServerConfig.from_dict extension template (prompt_cache/streaming/verification sections, config.py:239-273)

Test scenarios:

  • Happy path: auxiliary_model="fast" set, auxiliary returns non-empty content → result is auxiliary content, main model not called
  • Empty content fallback: auxiliary returns content="" (or None) → main model called, main content returned (covers Finding 4 anti-pattern)
  • Auxiliary exception: auxiliary raises LLMProviderError → main model called, main content returned
  • Both fail: auxiliary raises, main raises → _simple_summary returned (existing degradation preserved)
  • Characterization: auxiliary_model=None → behavior matches current code (single model call)
  • Config wiring: LLMConfig.from_dict reads auxiliary_model field from dict; ServerConfig._build_llm_config passes it through
  • Audit: auxiliary call uses agent_name="compressor", task_type="summarization" (preserved for usage tracking)

Verification: All test scenarios pass; existing tests/unit/test_compressor*.py (if any) still pass with auxiliary_model=None default; ruff clean.


U2. G7 — Emergency Layer Rule Template + TaskResult Extension

Goal: Add rule-based Emergency classifier with structured error output. No wiring yet — just the infrastructure (classifier + data structure + fallback.py extension).

Requirements: R17, R18, R26, R27

Dependencies: none (foundation for U3)

Files:

  • Modify: src/agentkit/core/fallback.py (add EmergencyRules class + EmergencyError dataclass, preserve existing 3 constants)
  • Modify: src/agentkit/core/protocol.py (add error_struct: dict | None = None field to TaskResult, update to_dict)
  • Create: tests/unit/test_emergency_rules.py

Approach:

EmergencyError dataclass (in fallback.py):

@dataclass
class EmergencyError:
    error_code: str          # "timeout"|"loop_detected"|"llm_failure"|"internal_error"
    message: str             # human-readable Chinese message (mirrors EMPTY_LLM_RESPONSE style)
    suggestions: list[str]   # actionable user-facing suggestions
    retryable: bool          # whether user retry might succeed
    original_error: str      # str(exc) for traceability

    def to_dict(self) -> dict: ...
    def to_error_message(self) -> str: ...   # formatted "message\n建议1) ... 2) ..."

EmergencyRules class (rule-based classifier, no LLM):

class EmergencyRules:
    @staticmethod
    def classify(exc: Exception, config: dict | None = None) -> EmergencyError:
        # Match by exception type (TaskTimeoutError, LoopDetectedError, LLMProviderError, etc.)
        # config allows per-rule customization (suggestion overrides, retryable overrides)

Rule mapping (initial set, expandable via config):

  • TaskTimeoutErrorerror_code="timeout", retryable=True, suggestions: ["稍后重试", "简化任务范围"]
  • LoopDetectedErrorerror_code="loop_detected", retryable=True, suggestions: ["拆分任务", "检查工具参数"]
  • LLMProviderErrorerror_code="llm_failure", retryable=True, suggestions: ["稍后重试", "切换模型"]
  • TaskCancelledError → not classified (propagates as-is, Emergency not triggered)
  • Generic Exceptionerror_code="internal_error", retryable=False, suggestions: ["联系管理员"]

TaskResult extension:

@dataclass
class TaskResult:
    # ... existing fields ...
    error_message: str | None    # unchanged
    error_struct: dict | None = None   # NEW: serialized EmergencyError.to_dict()

to_dict includes error_struct when set; error_message continues to hold the human-readable string.

Patterns to follow:

  • Existing EMPTY_LLM_RESPONSE / MAX_STEPS_REACHED style in fallback.py (Chinese, "建议:..." format)
  • VerificationResult.errors: list[str] field pattern from verification_loop.py:18-24
  • TaskResult.to_dict pattern at protocol.py:132-145

Test scenarios:

  • Happy path: classify(TaskTimeoutError(...)) returns EmergencyError(error_code="timeout", retryable=True)
  • Each exception type maps to correct error_code
  • TaskCancelledError is NOT classified (caller must check before invoking classify)
  • to_dict() produces all 5 fields
  • to_error_message() formats suggestions as "建议1) ... 2) ..."
  • EmergencyRules.classify(Exception("unknown"))error_code="internal_error", retryable=False
  • Config override: custom suggestion list for timeout rule via config dict
  • TaskResult with error_struct set: to_dict() includes both error_message and error_struct
  • TaskResult with error_struct=None (default): to_dict() matches current behavior (backward compat)

Verification: All scenarios pass; fallback.py existing 3 constants unchanged; TaskResult.to_dict for tasks without error_struct matches pre-change output byte-for-byte.


U3. G7 — Three-Tier Fallback Chain Wiring

Goal: Wire main → Recovery (ReflexionEngine) → Emergency (EmergencyRules) at chat.py:613. Composes U2's infrastructure with existing ReflexionEngine.

Requirements: R16, R18, R26

Dependencies: U2 (EmergencyRules + TaskResult.error_struct)

Files:

  • Modify: src/agentkit/server/routes/chat.py (wrap main agent call at L613 with three-tier chain)
  • Modify: src/agentkit/server/config.py (add fallback_chain section to from_dict)
  • Modify: agentkit.yaml (document fallback_chain: section)
  • Create: tests/unit/test_fallback_chain.py

Approach:

New helper module or inline function in chat.py:

async def execute_with_fallback_chain(
    agent: ConfigDrivenAgent,
    task: Task,
    config: dict,    # fallback_chain config section
    llm_gateway: LLMGateway,
) -> TaskResult:
    # Tier 1: Main
    try:
        result = await agent.execute(task)
        if result.status == "success" or result.status == "completed":
            return result
        # Treat non-success as soft failure → trigger Recovery
        raise AgentExecutionError(result.error_message or "main agent did not succeed")
    except (TaskTimeoutError, LoopDetectedError, LLMProviderError, AgentExecutionError) as exc:
        if not config.get("recovery", {}).get("enabled", True):
            return _to_emergency(exc, task)

        # Tier 2: Recovery (ReflexionEngine)
        try:
            reflexion_engine = ReflexionEngine(
                llm_gateway=llm_gateway,
                max_reflections=config.get("recovery", {}).get("max_retries", 1),
                # ... other params from agent's existing reflexion config
            )
            recovery_result = await reflexion_engine.execute(
                messages=task.messages,    # rebuild from task
                tools=agent.get_tools(),
                model=task.model,
                agent_name=agent.name,
                task_id=task.task_id,
            )
            if recovery_result.status == "success":
                return _recovery_to_task_result(recovery_result, task)
        except Exception as recovery_exc:
            logger.warning(f"Recovery layer failed: {recovery_exc}")

        # Tier 3: Emergency
        return _to_emergency(exc, task)

_to_emergency(exc, task) constructs EmergencyError via EmergencyRules.classify(exc, config), then returns TaskResult(status=FAILED, error_message=emergency.to_error_message(), error_struct=emergency.to_dict(), ...).

Wiring at chat.py:613: replace direct agent.execute(task) call with execute_with_fallback_chain(...). Recovery config from server_config.fallback_chain (new ServerConfig.fallback_chain: dict field, mirroring Wave 1 pattern).

Recovery layer scope (KTD5): Only chat.py:613 is wrapped. CLI (cli/chat.py:338), ReWOO (rewoo.py:1204), ReflexionEngine's internal ReAct call, and config_driven.py:695 are NOT wrapped (would create recursive loop or unwanted coupling). These remain on the direct-execute path. Documented in ## Scope Boundaries.

Patterns to follow:

  • config_driven.py:836 — existing ReflexionEngine instantiation pattern (constructor params, execute call)
  • Wave 1's verification config section in ServerConfig.from_dict (config.py:240-273)

Test scenarios:

  • Happy path: main agent succeeds → no Recovery, no Emergency triggered; error_struct=None
  • Main fails (timeout) → Recovery triggered → Recovery succeeds → error_struct=None, output from ReflexionEngine
  • Main fails (timeout) → Recovery triggered → Recovery fails (max_reflections exhausted) → Emergency triggered → error_struct populated with error_code="timeout", retryable=True
  • Main fails (LoopDetectedError) → Emergency error_code="loop_detected"
  • Main fails (LLMProviderError) → Emergency error_code="llm_failure"
  • Main fails (TaskCancelledError) → propagates as-is (NOT routed to Emergency)
  • Main fails (generic Exception) → Emergency error_code="internal_error", retryable=False
  • Config: fallback_chain.recovery.enabled=false → skip Recovery, go directly to Emergency
  • Config: fallback_chain.emergency.enabled=false → re-raise original exception (no Emergency)
  • Integration: full chain on real ConfigDrivenAgent with mocked LLM (mock main raises, mock ReflexionEngine succeeds)

Verification: All scenarios pass; existing chat WebSocket tests still pass; ruff clean.


U4. G9 — PlanPhase Rollback Fields + RollbackExecutor + TeamOrchestrator Integration

Goal: Add validation_command/rollback_command optional fields to PlanPhase; execute rollback on phase failure (opt-in); coordinate with U7 checkpoint ordering per R21.

Requirements: R19, R20, R21, R26, R27

Dependencies: none (uses existing U7 PipelineCheckpoint and VerificationLoop patterns)

Files:

  • Modify: src/agentkit/experts/plan.py (PlanPhase dataclass + to_dict/from_dict symmetry)
  • Modify: src/agentkit/experts/orchestrator.py (insert rollback execution between phase failure and checkpoint save)
  • Create: src/agentkit/orchestrator/rollback.py (RollbackExecutor class)
  • Modify: src/agentkit/server/config.py (add rollback section to from_dict)
  • Modify: agentkit.yaml (document rollback: section)
  • Create: tests/unit/test_phase_rollback.py

Approach:

PlanPhase extension (plan.py:147-233): Add two optional fields with None default (preserves existing contract per Finding 1):

validation_command: str | None = None
rollback_command: str | None = None

Update to_dict() to include both fields (only when not None, to keep dict shape minimal). Update from_dict() to read both fields.

RollbackExecutor class (new file orchestrator/rollback.py): Mirrors VerificationLoop pattern (verification_loop.py:67-103):

class RollbackExecutor:
    def __init__(self, working_dir: str | None = None, timeout: float = 30.0): ...

    async def execute(self, command: str) -> RollbackResult:
        # asyncio.create_subprocess_shell with cwd/timeout/proc.kill()
        # Returns RollbackResult(passed, exit_code, stdout, stderr, command)

    async def validate(self, command: str) -> RollbackResult:
        # Same as execute but returns passed=False on non-zero exit

RollbackResult dataclass: {passed: bool, exit_code: int, stdout: str, stderr: str, command: str}.

TeamOrchestrator integration (orchestrator.py:246-270): New ordering per KTD8:

# Existing: phase failure detected, mark FAILED + dependents FAILED
# (lines 247-261 unchanged)

# NEW: rollback phase (only if validation_command and rollback_command configured)
should_save_checkpoint = True
if ph.validation_command and ph.rollback_command:
    validator = RollbackExecutor(working_dir=self._workspace_root, timeout=...)
    validation_result = await validator.validate(ph.validation_command)
    if not validation_result.passed:
        rollback_result = await validator.execute(ph.rollback_command)
        if not rollback_result.passed:
            # Rollback failed → don't save checkpoint (R21)
            should_save_checkpoint = False
            logger.error(f"Rollback failed for phase {ph.id}: {rollback_result.stderr}")
            # Emit phase_rollback_failed event
            await self._broadcast_event("phase_rollback_failed", {...})

# Existing: checkpoint save (conditional now)
if should_save_checkpoint and self._checkpoint is not None:
    try:
        await self._checkpoint.save(plan.id, ph, plan.status.value)
    except Exception as e:
        logger.warning(...)

Events emitted (new): phase_rollback_started, phase_rollback_completed, phase_rollback_failed. Existing phase_failed event unchanged (emitted before rollback).

Config: rollback.default_timeout: float = 30.0 in ServerConfig.rollback dict (Wave 1 pattern).

Security boundary (KTD7): Rollback subprocess does NOT go through ShellTool (avoids confirm_callback for git checkout). Audit log emitted via _broadcast_event. terminal_whitelist is not consulted (only governs /api/v1/terminal/server route, per terminal_server.py:227).

Execution note: characterization-first. Test current behavior (rollback_command=None → no rollback, checkpoint saved) before adding rollback behavior.

Patterns to follow:

  • VerificationLoop subprocess execution pattern (verification_loop.py:67-103) — asyncio.create_subprocess_shell + cwd + timeout + proc.kill()
  • PlanPhase.to_dict/from_dict symmetric serialization pattern at plan.py:186-233
  • TeamOrchestrator._execute_phase event broadcasting pattern (orchestrator.py:253-260)
  • Wave 1's ServerConfig.from_dict extension template

Test scenarios:

  • Characterization: PlanPhase() with no validation_command/rollback_commandto_dict() output matches pre-change shape (no new keys); from_dict({}) produces phase with both fields as None
  • Serialization: phase with rollback_command="git checkout foo.py"to_dict() includes key; from_dict(to_dict()) round-trips
  • RollbackExecutor happy path: execute("git status") returns passed=True, exit_code=0
  • RollbackExecutor timeout: execute("sleep 10", timeout=0.1) returns passed=False, exit_code=-1 (or similar)
  • RollbackExecutor failure: execute("false") returns passed=False, exit_code=1
  • Integration — opt-in default: phase fails, rollback_command=None → no RollbackExecutor call, checkpoint saved (existing behavior)
  • Integration — rollback configured, validation passes (returns 0): rollback NOT executed, checkpoint saved
  • Integration — rollback configured, validation fails, rollback succeeds (returns 0): rollback executed, checkpoint saved
  • Integration — rollback configured, validation fails, rollback fails (returns 1): rollback executed, checkpoint NOT saved (R21), phase_rollback_failed event emitted
  • Integration — real git repo fixture: phase writes file via git checkout, rollback git checkout foo.py restores file; assert file content matches pre-phase state

Verification: All scenarios pass; existing tests/unit/test_pipeline_state.py and tests/unit/test_team_orchestrator*.py (if any) still pass; ruff clean.


Scope Boundaries

Deferred to Follow-Up Work

  • Recovery wiring at non-chat call sites (cli/chat.py:338, rewoo.py:1204, config_driven.py:695). KTD5 limits Wave 2 to the primary chat WebSocket path. Other entry points can adopt the same wrapper in a follow-up.
  • Recovery layer streaming events. Wave 2's chain returns final TaskResult; SSE events for "recovery_started"/"recovery_completed" would require deeper WebSocket protocol changes.
  • Patch-level rollback (git apply -R <patch>). KTD6 + KTD7 scope Wave 2 to git checkout <files> pattern. Patch-level requires extending CheckpointData schema to track patches — Wave 3 candidate.
  • DB-backed atomic claim for checkpoint. Finding 3 noted PipelineCheckpoint is in-memory dict + Redis fallback, not a PostgreSQL FOR UPDATE SKIP LOCKED pattern. Multi-process atomicity is out of scope; single-process asyncio.Lock (already present at orchestrator.py:53) is sufficient.

Outside this product's identity (carried from brainstorm)

  • Wave 3 (G5/G6) — tree-sitter integration, SOLO four-stage state machine. Strategic direction locked in brainstorm KTD6/KTD7; implementation design deferred to Wave 3 plan.
  • Node-level checkpoint (ReAct single-step). Stage-level (U7) satisfies core need.
  • DeerFlow-style disk filesystem. Redis is the persistence layer.
  • Full LangGraph migration. Self-built architecture stays.

Risks & Dependencies

  • Risk: Auxiliary model availability. If auxiliary_model alias is not configured in agentkit.yaml, LLMGateway raises ModelNotFoundError. Mitigation: ContextCompressor catches this in the auxiliary try-block and falls through to main model (R15 fallback path). Documented in YAML comments.
  • Risk: Recovery layer recursive loop. ReflexionEngine.execute() internally calls ReActEngine.execute() (5th call site). If Recovery itself fails, it must NOT trigger another Recovery — KTD5 wires only at chat.py:613, so ReflexionEngine's internal ReAct call bypasses the chain. Recursive loop is structurally impossible.
  • Risk: git checkout destructive scope. Misconfigured rollback_command (e.g., git checkout .) could wipe unrelated changes. Mitigation: rollback is opt-in per-phase (KTD6); audit log via phase_rollback_started/phase_rollback_failed events; documented YAML examples use file-scoped commands (git checkout <specific_files>).
  • Risk: TaskResult.error_struct field addition breaks pickle/serialization. TaskResult is a dataclass with to_dict; new optional field with None default is backward-compatible for to_dict consumers. Pickle deserialization of pre-change data still works (new field defaults to None).
  • Dependency: Wave 1 merged. ServerConfig.from_dict extension template (prompt_cache/streaming/verification sections) is established and tested. Wave 2's 3 new sections (auxiliary_model, fallback_chain, rollback) reuse this exact pattern.
  • Dependency: U7 PipelineCheckpoint already shipped. orchestrator/checkpoint.py:56 exists with save(plan_id, phase, plan_status) API; U4 only adjusts the calling order, not the API.

Sources / Research

  • Brainstorm: docs/brainstorms/2026-06-29-advanced-agent-gap-optimization-requirements.md (Wave 2 section, R13-R21, KTD4/KTD5)
  • Wave 1 plan: docs/plans/2026-06-29-002-feat-agent-wave1-quick-wins-plan.md (ServerConfig.from_dict extension template established)
  • Finding 1 (contract preservation rule): docs/solutions/logic-errors/long-horizon-reliability-code-review-fixes.md — "新字段默认值须保持既有契约" drives KTD6's opt-in default for rollback_command
  • Finding 2 (retry-storm defense): docs/solutions/security-issues/portal-platform-security-reliability-fixes.md — Emergency layer's terminal error_code pattern (don't propagate unhandled exceptions)
  • Finding 3 (atomic claim pattern): docs/solutions/architecture-patterns/bitable-companion-service-security-reliability-patterns.mdSKIP LOCKED not reusable; PipelineCheckpoint is in-memory dict + Redis
  • Finding 4 (empty-response anti-pattern): docs/solutions/ui-bugs/tauri-reload-loses-session.md — auxiliary LLM must check truthy return value, not just absence of exception; drives U1's empty-content fallback
  • Code locations verified during planning:
    • src/agentkit/core/compressor.py:35-44,123-158_summarize injection point for G4
    • src/agentkit/llm/gateway.py:170-227 — existing per-model fallback chain (semantics differ from G4's task-type routing)
    • src/agentkit/llm/config.py:198-257LLMConfig dataclass and from_dict
    • src/agentkit/core/reflexion.py:68-111ReflexionEngine.__init__ and execute() signatures (already supports evaluate_model/reflect_model)
    • src/agentkit/core/fallback.py (full file, 19 lines) — 3 existing constants; EmergencyRules adds alongside
    • src/agentkit/core/protocol.py:118-145TaskResult dataclass and to_dict
    • src/agentkit/experts/plan.py:147-233PlanPhase dataclass, to_dict/from_dict
    • src/agentkit/experts/orchestrator.py:246-270 — phase failure capture + checkpoint save site (U4 integration point)
    • src/agentkit/orchestrator/checkpoint.py:56PipelineCheckpoint (U7, no API change needed)
    • src/agentkit/core/verification_loop.py:67-103 — subprocess execution pattern for RollbackExecutor
    • src/agentkit/tools/shell.py:168-174,526-534_DANGEROUS_BINARY_FLAGS (git checkout not listed), _is_dangerous logic (KTD7 rationale)