---
title: "feat: AgentKit Phase 3 — 持久化·记忆·进化·技能·可观测性升级"
status: completed
created: 2026-06-06
plan_type: feat
depth: deep
origin: Hermes Agent 对比分析 + 5 大问题评估
branch: feat/agentkit-phase3-upgrade
---

# AgentKit Phase 3 升级计划

## Summary

基于 Hermes Agent 对标分析和 AgentKit 现状评估，本计划解决 5 个核心问题：无法持久运行、记忆系统未接入、进化架构断层、技能能力不足、缺乏可观测性。覆盖 P0+P1+P2 共 10 项升级，分 3 个交付阶段实施，保持主干代码不变，在 `feat/agentkit-phase3-upgrade` 分支开发。

## Problem Frame

AgentKit 当前是一个"有框架但未接入"的状态：

- **持久化断层**：docker-compose 配置了 Redis + PostgreSQL，但 TaskStore 纯内存，进程重启丢失所有状态
- **记忆断层**：三层记忆架构设计完整，但 Agent 循环中零记忆调用，ReActEngine 不读写记忆
- **进化断层**：EvolutionConfig 定义了配置但 EvolutionMixin 不读取，Reflector 基于硬编码规则，A/B 测试数据伪造
- **技能断层**：Skill 是纯数据容器，无自动创建/编排/策展能力，不支持 SKILL.md 开放标准
- **可观测性断层**：无结构化日志、无 metrics、无执行轨迹导出

Hermes Agent 的核心创新是"执行轨迹 → LLM 反思 → 技能沉淀 → 复用加速"的闭环飞轮。AgentKit 需要建立类似但适配企业场景的进化能力。

## Requirements

| ID | 需求 | 优先级 | 来源 |
|----|------|--------|------|
| R1 | TaskStore 持久化到 Redis/PG，进程重启不丢状态 | P0 | 持久运行评估 |
| R2 | 记忆系统接入 Agent 循环，执行前检索上下文，执行后写入轨迹 | P0 | 记忆架构评估 |
| R3 | LLM 驱动反思器替换硬编码 Reflector | P0 | 进化架构评估 |
| R4 | EpisodicMemory 实现 pgvector 向量检索 | P1 | 记忆架构评估 |
| R5 | 执行轨迹记录器，为反思和可观测性提供数据 | P1 | 进化+可观测性 |
| R6 | 技能编排/Pipeline 能力 | P1 | 技能完备性评估 |
| R7 | EvolutionStore 持久化 | P1 | 进化架构评估 |
| R8 | SKILL.md 格式 + 渐进式分层 | P2 | 技能完备性评估 |
| R9 | 上下文压缩与 Prompt 缓存 | P2 | Token 成本优化 |
| R10 | 可观测性（结构化日志 + metrics + 健康检查增强） | P2 | 生产运维 |

## Scope Boundaries

### In Scope

- 10 项升级（R1-R10），分 3 个交付阶段
- 保持现有 API 向后兼容
- 分支开发模式，不修改主干

### Out of Scope

- 多平台消息网关（Telegram/Discord/Slack 等）——定位差异，AgentKit 是 AI 引擎而非个人 Agent
- 子代理并行执行——需要更复杂的调度架构，留待 Phase 4
- 技能自动创建 + Curator——依赖 LLM 反思器和执行轨迹，留待 Phase 4
- agentskills.io 技能市场——需要社区基础设施，留待 Phase 4
- SemanticMemory 的 RAG/知识图谱后端实现——依赖外部服务，当前保持适配器模式

### Deferred to Follow-Up Work

- RateLimiter 迁移到 Redis 分布式限流
- 多 worker 模式下的状态共享
- 优雅关闭（SIGTERM 信号处理）
- 用户建模（user_id + 偏好跟踪）

---

## Key Technical Decisions

### KTD1: TaskStore 持久化策略 — Redis 优先

**决策**：TaskStore 默认使用 Redis 后端，InMemoryTaskStore 仅用于开发/测试。

**理由**：
- docker-compose 已配置 Redis，基础设施就绪
- TaskStore 已有 `RedisTaskStore` 实现（`server/task_store.py`），只需设为默认
- Redis 天然支持 TTL，与任务过期清理需求一致
- 避免引入新的存储依赖

**替代方案**：PostgreSQL 后端——更持久但延迟更高，适合归档而非活跃任务状态。

### KTD2: 记忆集成方式 — MemoryRetriever 注入 ReActEngine

**决策**：在 ReActEngine.execute() 中注入 `MemoryRetriever | None` 参数，执行前检索相关上下文注入 system_prompt，执行后写入轨迹到 EpisodicMemory。

**理由**：
- ReActEngine 是所有执行模式的底层引擎，在此层集成覆盖面最广
- MemoryRetriever 已实现三层并行检索 + 权重融合，无需重写
- 注入方式而非继承方式，保持 ReActEngine 的独立性

**替代方案**：在 ConfigDrivenAgent 层集成——更简单但只覆盖 ConfigDrivenAgent，不覆盖直接使用 ReActEngine 的场景。

### KTD3: 反思器策略 — LLM-in-the-loop + 规则降级

**决策**：新增 `LLMReflector`，通过 LLM 分析执行轨迹生成反思。保留 `RuleBasedReflector`（当前实现）作为降级方案，LLM 不可用时自动切换。

**理由**：
- GEPA 的核心洞见是"自然语言反思比数值奖励更有效"，这需要 LLM 级别的反思
- 企业场景需要降级策略，LLM 不可用时不能完全失去反思能力
- 不直接使用 DSPy/GEPA 框架——AgentKit 已有 LLMGateway，无需引入新依赖

**替代方案**：集成 DSPy + GEPA——更强大但引入重依赖，且 AgentKit 的定位不需要 GEPA 的完整进化流水线。

### KTD4: 执行轨迹存储 — SQLite 本地 + 可选 PG

**决策**：执行轨迹默认存储在本地 SQLite（`~/.agentkit/traces/`），可选配置 PostgreSQL 后端用于大规模部署。

**理由**：
- 与 Hermes Agent 一致（SQLite FTS5），轻量级
- 单机部署无需 PG，降低使用门槛
- PG 后端用于多实例部署场景

### KTD5: 技能编排 — 复用现有 PipelineEngine

**决策**：技能编排复用 `orchestrator/pipeline_engine.py` 的 PipelineEngine，新增 `SkillPipeline` 适配层将 Skill 包装为 Pipeline Step。

**理由**：
- PipelineEngine 已实现顺序/并行/条件执行，功能完整
- 避免重复造轮子，只需一个适配层
- Pipeline YAML 格式已定义，用户可声明式编排技能

### KTD6: SKILL.md 格式 — YAML 元数据 + Markdown 正文

**决策**：SKILL.md 采用 YAML frontmatter + Markdown 正文的混合格式，兼容 agentskills.io 标准。

**理由**：
- YAML frontmatter 机器可读（解析元数据），Markdown 正文人机可读（描述技能步骤）
- 与现有 YAML 配置格式兼容，迁移成本低
- agentskills.io 标准使用纯 Markdown，YAML frontmatter 是其超集

---

## High-Level Technical Design

### 进化飞轮架构

```mermaid
graph LR
    A[任务执行] --> B[执行轨迹记录]
    B --> C[LLM 反思分析]
    C --> D{质量达标?}
    D -->|否| E[Prompt 优化]
    D -->|是| F[技能沉淀]
    E --> G[A/B 测试]
    G --> H{统计显著?}
    H -->|是| I[应用/回滚]
    H -->|否| J[继续收集样本]
    F --> K[技能库]
    K -->|复用| A
    I --> K
```

### 记忆集成数据流

```mermaid
sequenceDiagram
    participant Client
    participant Agent as ConfigDrivenAgent
    participant Engine as ReActEngine
    participant Retriever as MemoryRetriever
    participant Episodic as EpisodicMemory

    Client->>Agent: handle_task(task)
    Agent->>Retriever: get_context(task.input_data)
    Retriever->>Episodic: search(similar tasks)
    Episodic-->>Retriever: relevant memories
    Retriever-->>Agent: context string
    Agent->>Engine: execute(messages + context)
    Engine-->>Agent: result + trace
    Agent->>Episodic: store(trace summary)
    Agent-->>Client: TaskResult
```

### 三阶段交付依赖

```mermaid
graph TD
    subgraph Phase A - 基础设施
        U1[U1: TaskStore 持久化]
        U2[U2: 执行轨迹记录器]
        U3[U3: EvolutionStore 持久化]
    end
    subgraph Phase B - 核心能力
        U4[U4: 记忆接入 Agent 循环]
        U5[U5: Episodic 向量检索]
        U6[U6: LLM 反思器]
        U7[U7: 技能编排]
    end
    subgraph Phase C - 增强
        U8[U8: SKILL.md 格式]
        U9[U9: 上下文压缩与缓存]
        U10[U10: 可观测性]
    end
    U1 --> U4
    U2 --> U4
    U2 --> U6
    U3 --> U6
    U4 --> U5
    U6 --> U8
```

---

## Implementation Units

### U1. TaskStore 持久化到 Redis

**Goal**: 将 TaskStore 默认后端从内存切换到 Redis，确保进程重启后任务状态不丢失。

**Requirements**: R1

**Dependencies**: 无

**Files**:
- Modify: `src/agentkit/server/task_store.py` — 将 `create_task_store()` 默认使用 Redis 后端
- Modify: `src/agentkit/server/app.py` — `create_app()` 中根据配置选择 TaskStore 后端
- Modify: `src/agentkit/server/config.py` — 新增 `task_store_backend` 配置项
- Modify: `src/agentkit/cli/main.py` — serve 命令传递 task_store 配置
- Test: `tests/unit/test_task_store_redis.py`

**Approach**:
1. `RedisTaskStore` 已存在于 `task_store.py`，验证其功能完整性
2. `create_task_store()` 工厂函数增加 `backend` 参数，默认 `redis`
3. `ServerConfig` 新增 `task_store` 配置块（backend/redis_url/ttl/max_records）
4. `create_app()` 从 `ServerConfig` 读取配置，创建对应 TaskStore
5. InMemoryTaskStore 保留用于测试，通过 `backend: memory` 显式启用

**Patterns to follow**: `src/agentkit/server/task_store.py` 中 `RedisTaskStore` 的现有实现

**Test scenarios**:
- Happy path: 创建任务 → 重启模拟（关闭 Redis 连接再重连）→ 查询任务仍存在
- Edge case: Redis 不可用时降级到 InMemoryTaskStore 并打 warning 日志
- Edge case: TTL 过期后任务自动清理
- Error path: Redis 连接失败时的错误处理和降级
- Integration: serve 命令启动后提交任务，查询任务状态

**Verification**: `PYTHONPATH=src pytest tests/unit/test_task_store_redis.py -v` 全部通过

---

### U2. 执行轨迹记录器

**Goal**: 在 ReActEngine 执行过程中记录完整的执行轨迹（每步动作、输入输出、耗时、Token 用量），为反思和可观测性提供数据。

**Requirements**: R5

**Dependencies**: 无

**Files**:
- Create: `src/agentkit/core/trace.py` — TraceStep + ExecutionTrace 数据类 + TraceRecorder
- Modify: `src/agentkit/core/react.py` — execute() 中注入 TraceRecorder，记录每步
- Modify: `src/agentkit/core/protocol.py` — TaskResult 新增 `trace` 字段
- Test: `tests/unit/test_trace_recorder.py`

**Approach**:
1. 定义 `TraceStep`（step/action/tool_name/input/output/duration_ms/tokens_used/error）和 `ExecutionTrace`（task_id/agent_name/skill_name/steps/total_duration/total_tokens/outcome/quality_score）
2. `TraceRecorder` 类：`start_trace()`、`record_step()`、`end_trace()`、`get_trace()`
3. `ReActEngine.execute()` 新增 `trace_recorder: TraceRecorder | None = None` 参数
4. 每次工具调用和 LLM 调用后调用 `record_step()`
5. `TaskResult` 新增可选 `trace: ExecutionTrace | None` 字段
6. 轨迹默认存储在内存中（单次请求生命周期），后续 U3 持久化

**Patterns to follow**: `src/agentkit/core/react.py` 中 `ReActStep` 和 `ReActResult` 的现有数据结构

**Test scenarios**:
- Happy path: 执行 3 步 ReAct 循环，验证轨迹包含 3 个 TraceStep
- Happy path: 工具调用记录 tool_name/input/output/duration
- Edge case: 无工具调用的纯 LLM 响应，轨迹只有 1 步
- Error path: 工具调用失败，TraceStep.error 非空
- Integration: ConfigDrivenAgent 通过 ReActEngine 执行任务，TaskResult 包含 trace

**Verification**: `PYTHONPATH=src pytest tests/unit/test_trace_recorder.py -v` 全部通过

---

### U3. EvolutionStore 持久化

**Goal**: 将进化事件从内存迁移到 SQLite 持久化存储，支持进化历史查询和回滚。

**Requirements**: R7

**Dependencies**: 无

**Files**:
- Modify: `src/agentkit/evolution/evolution_store.py` — 新增 SQLite 后端，替换内存存储
- Create: `src/agentkit/evolution/models.py` — SQLAlchemy ORM 模型（EvolutionEvent/SkillVersion/ABTestResult）
- Test: `tests/unit/test_evolution_store_persistent.py`

**Approach**:
1. 定义 SQLAlchemy ORM 模型：`EvolutionEvent`（id/agent_name/event_type/trace_id/reflection_id/proposal_id/status/created_at）、`SkillVersion`（id/skill_name/version/content/parent_version/created_at）、`ABTestResult`（id/test_id/variant/score/sample_count/created_at）
2. `EvolutionStore` 新增 `backend` 参数，默认 `sqlite`（路径 `~/.agentkit/evolution.db`）
3. `record()`/`query()`/`rollback()` 方法操作 SQLite
4. 保留内存后端用于测试
5. 首次运行自动创建表结构

**Patterns to follow**: `src/agentkit/evolution/evolution_store.py` 的现有接口

**Test scenarios**:
- Happy path: 记录进化事件 → 关闭连接 → 重新打开 → 查询到事件
- Happy path: 记录技能版本 → 查询版本历史
- Edge case: 空数据库首次查询返回空列表
- Error path: SQLite 文件不可写时的错误处理
- Integration: EvolutionMixin.evolve_after_task() 写入 EvolutionStore

**Verification**: `PYTHONPATH=src pytest tests/unit/test_evolution_store_persistent.py -v` 全部通过

---

### U4. 记忆接入 Agent 循环

**Goal**: 将 MemoryRetriever 注入 ReActEngine，执行前检索相关上下文注入 system_prompt，执行后写入轨迹摘要到 EpisodicMemory。

**Requirements**: R2

**Dependencies**: U1, U2

**Files**:
- Modify: `src/agentkit/core/react.py` — execute() 新增 `memory_retriever` 参数，执行前检索上下文
- Modify: `src/agentkit/core/config_driven.py` — 根据 config.memory 自动实例化三层记忆，注入 ReActEngine
- Modify: `src/agentkit/core/base.py` — BaseAgent 新增 `use_memory_retriever()` 方法
- Modify: `src/agentkit/server/app.py` — create_app() 中初始化 Memory 组件
- Test: `tests/unit/test_memory_integration.py`

**Approach**:
1. `ReActEngine.__init__` 新增 `memory_retriever: MemoryRetriever | None = None`
2. `execute()` 开始前：调用 `memory_retriever.get_context_string(task_input)` 获取相关记忆
3. 将记忆上下文追加到 system_prompt 的末尾（`## Relevant Past Experience` 段落）
4. `execute()` 结束后：将执行轨迹摘要写入 EpisodicMemory
5. `ConfigDrivenAgent.__init__` 根据 `config.memory` 配置自动创建 WorkingMemory/EpisodicMemory/MemoryRetriever
6. `create_app()` 中从 ServerConfig 读取 memory 配置，初始化 Memory 组件

**Patterns to follow**: `src/agentkit/memory/retriever.py` 的 `MemoryRetriever` 接口

**Test scenarios**:
- Happy path: 执行任务时检索到相关历史记忆，注入 system_prompt
- Happy path: 任务完成后轨迹摘要写入 EpisodicMemory
- Edge case: 无记忆时正常执行（memory_retriever=None）
- Edge case: 记忆检索失败时不影响任务执行
- Integration: 连续执行两个相似任务，第二个任务能检索到第一个的记忆

**Verification**: `PYTHONPATH=src pytest tests/unit/test_memory_integration.py -v` 全部通过

---

### U5. EpisodicMemory 向量检索实现

**Goal**: 实现 EpisodicMemory 的 pgvector cosine distance 排序，替代当前的时间衰减排序，支持语义相似度检索。

**Requirements**: R4

**Dependencies**: U4

**Files**:
- Modify: `src/agentkit/memory/episodic.py` — 实现 pgvector 向量检索
- Create: `src/agentkit/memory/embedder.py` — Embedder 接口 + OpenAIEmbedder 实现
- Test: `tests/unit/test_episodic_vector_search.py`

**Approach**:
1. 新增 `Embedder` 抽象基类：`embed(text: str) -> list[float]`
2. 新增 `OpenAIEmbedder`：调用 OpenAI Embeddings API（text-embedding-3-small）
3. `EpisodicMemory.store()` 中调用 embedder 生成 embedding，存入 pgvector Vector 列
4. `EpisodicMemory.search()` 中实现 cosine distance 排序，与时间衰减混合：`score = alpha * cosine_similarity + (1-alpha) * time_decay`
5. 默认 `alpha=0.7`（语义相似度权重更高），可通过配置调整
6. `retrieve(key)` 方法实现：先 embed query，再按 cosine distance 排序

**Patterns to follow**: `src/agentkit/memory/episodic.py` 的现有接口

**Test scenarios**:
- Happy path: 存入 3 条记忆，用语义相似查询检索到最相关的
- Happy path: 时间衰减 + 语义相似度混合排序
- Edge case: embedder 不可用时降级到纯时间衰减排序
- Edge case: 空查询返回空结果
- Error path: pgvector 扩展未安装时的错误提示

**Verification**: `PYTHONPATH=src pytest tests/unit/test_episodic_vector_search.py -v` 全部通过

---

### U6. LLM 反思器

**Goal**: 新增 LLMReflector，通过 LLM 分析执行轨迹生成结构化反思。保留 RuleBasedReflector 作为降级方案。

**Requirements**: R3

**Dependencies**: U2, U3

**Files**:
- Create: `src/agentkit/evolution/llm_reflector.py` — LLMReflector 类
- Modify: `src/agentkit/evolution/reflector.py` — 重命名为 RuleBasedReflector，保持接口兼容
- Modify: `src/agentkit/evolution/lifecycle.py` — EvolutionMixin 支持 reflector 类型选择
- Modify: `src/agentkit/skills/base.py` — EvolutionConfig 新增 `reflector_type` 字段
- Test: `tests/unit/test_llm_reflector.py`

**Approach**:
1. `LLMReflector` 接收 `ExecutionTrace`，构建反思 Prompt（包含轨迹详情 + 质量评分）
2. 调用 LLM Gateway 生成结构化反思（失败根因/成功模式/改进建议）
3. 输出与 `Reflection` 数据类兼容（outcome/quality_score/patterns/insights/suggestions）
4. `EvolutionMixin` 新增 `reflector_type` 配置：`llm`（默认）/ `rule` / `auto`（LLM 优先，失败降级到 rule）
5. LLM 反思使用辅助模型（非主模型），降低成本
6. `EvolutionConfig` 新增 `reflector_type` 和 `auxiliary_model` 字段，与 EvolutionMixin 对齐

**Patterns to follow**: `src/agentkit/evolution/reflector.py` 的 `Reflector` 接口和 `Reflection` 数据类

**Test scenarios**:
- Happy path: LLM 分析执行轨迹，生成包含 insights 和 suggestions 的 Reflection
- Happy path: auto 模式下 LLM 失败时降级到 RuleBasedReflector
- Edge case: 执行轨迹为空时返回默认 Reflection
- Edge case: LLM 返回非结构化文本时的解析容错
- Integration: EvolutionMixin 使用 LLMReflector 完成完整进化流程

**Verification**: `PYTHONPATH=src pytest tests/unit/test_llm_reflector.py -v` 全部通过

---

### U7. 技能编排

**Goal**: 复用 PipelineEngine 实现 Skill 编排，支持将多个 Skill 串联为 Pipeline 执行。

**Requirements**: R6

**Dependencies**: U4

**Files**:
- Create: `src/agentkit/skills/pipeline.py` — SkillPipeline 适配层
- Modify: `src/agentkit/skills/registry.py` — 新增 pipeline 注册和查询
- Modify: `src/agentkit/server/routes/skills.py` — 新增 pipeline API 端点
- Test: `tests/unit/test_skill_pipeline.py`

**Approach**:
1. `SkillPipeline` 类：封装 PipelineEngine，将 Skill 包装为 Pipeline Step
2. 每个 Skill 在 Pipeline 中作为一个 Step，输入为上一步的输出
3. 支持顺序执行、条件分支（根据 Skill 输出决定下一步）、并行执行
4. Pipeline 定义格式复用 `orchestrator/pipeline_schema.py` 的 PipelineConfig
5. SkillPipeline 可通过 YAML 定义或编程式构建
6. SkillRegistry 新增 `register_pipeline()` 和 `get_pipeline()` 方法

**Patterns to follow**: `src/agentkit/orchestrator/pipeline_engine.py` 的 PipelineEngine 接口

**Test scenarios**:
- Happy path: 3 个 Skill 顺序执行，输出正确传递
- Happy path: 条件分支 — 根据 Skill A 的输出决定执行 Skill B 还是 Skill C
- Edge case: Pipeline 中某个 Skill 失败时，后续 Skill 不执行
- Edge case: 空 Pipeline（0 个 Skill）直接返回空结果
- Integration: 通过 API 提交 Pipeline 任务，查询执行状态

**Verification**: `PYTHONPATH=src pytest tests/unit/test_skill_pipeline.py -v` 全部通过

---

### U8. SKILL.md 格式 + 渐进式分层

**Goal**: 支持 SKILL.md 格式的技能定义，实现渐进式分层加载（Level 0 概要 / Level 1 完整 / Level 2 参考）。

**Requirements**: R8

**Dependencies**: U6

**Files**:
- Create: `src/agentkit/skills/skill_md.py` — SKILL.md 解析器
- Modify: `src/agentkit/skills/loader.py` — 新增 `load_from_skill_md()` 方法
- Modify: `src/agentkit/skills/base.py` — SkillConfig 新增 `skill_md_path` 和 `disclosure_level` 字段
- Modify: `src/agentkit/cli/skill.py` — 新增 `skill create` 命令生成 SKILL.md 模板
- Test: `tests/unit/test_skill_md.py`

**Approach**:
1. SKILL.md 格式：YAML frontmatter（name/description/intent/quality_gate/execution_mode）+ Markdown 正文（trigger/steps/pitfalls/verification）
2. 解析器提取 frontmatter 生成 SkillConfig，正文按标题分段存储
3. 渐进式分层：
   - Level 0：frontmatter 中的 name + description（~50 tokens，常驻加载）
   - Level 1：完整正文（按需加载，当 IntentRouter 匹配到该技能时）
   - Level 2：references/ 和 templates/ 目录（深度加载，技能执行时）
4. SkillLoader 新增 `load_from_skill_md(path)` 方法
5. CLI `skill create` 生成 SKILL.md 模板文件

**Patterns to follow**: `src/agentkit/skills/loader.py` 的 `load_from_file()` 方法

**Test scenarios**:
- Happy path: 解析 SKILL.md 文件，生成正确的 SkillConfig
- Happy path: Level 0 只加载 name + description
- Happy path: Level 1 加载完整步骤
- Edge case: frontmatter 缺失时使用默认值
- Edge case: Markdown 正文缺少标准段落时的容错处理
- Integration: SkillLoader 从 SKILL.md 加载技能，注册到 SkillRegistry

**Verification**: `PYTHONPATH=src pytest tests/unit/test_skill_md.py -v` 全部通过

---

### U9. 上下文压缩与 Prompt 缓存

**Goal**: 实现上下文压缩（长会话自动压缩历史消息）和 Prompt 缓存（会话内 Prompt 不重复渲染）。

**Requirements**: R9

**Dependencies**: U4

**Files**:
- Create: `src/agentkit/core/compressor.py` — ContextCompressor 类
- Modify: `src/agentkit/prompts/template.py` — 新增 `render_cached()` 方法和缓存机制
- Modify: `src/agentkit/core/react.py` — execute() 中注入压缩逻辑
- Test: `tests/unit/test_context_compressor.py`

**Approach**:
1. `ContextCompressor`：当消息总 Token 数超过阈值（默认 4000）时，调用 LLM 将历史消息压缩为摘要
2. 压缩策略：保留最近 N 条消息 + 早期消息的 LLM 摘要
3. `PromptTemplate.render_cached()`：对相同变量输入返回缓存结果，变量变化时重新渲染
4. 缓存 key 基于 variables 的 hash，缓存存储在 PromptTemplate 实例上
5. ReActEngine.execute() 中在每次 LLM 调用前检查消息长度，超阈值则压缩

**Patterns to follow**: Hermes Agent 的上下文压缩机制（LLM 摘要 + 缓存快照）

**Test scenarios**:
- Happy path: 10 条历史消息压缩为摘要 + 最近 3 条
- Happy path: 压缩后 Token 数低于阈值
- Happy path: 相同变量输入命中 PromptTemplate 缓存
- Edge case: 压缩后仍超阈值时递归压缩
- Edge case: LLM 压缩调用失败时保留原始消息

**Verification**: `PYTHONPATH=src pytest tests/unit/test_context_compressor.py -v` 全部通过

---

### U10. 可观测性

**Goal**: 实现结构化日志、metrics 端点和增强健康检查。

**Requirements**: R10

**Dependencies**: U2

**Files**:
- Create: `src/agentkit/core/logging.py` — 结构化日志配置
- Create: `src/agentkit/server/routes/metrics.py` — /api/v1/metrics 端点
- Modify: `src/agentkit/server/routes/health.py` — 增强健康检查（Redis/PG/LLM/AgentPool 状态）
- Modify: `src/agentkit/server/app.py` — 注册 metrics 路由，初始化结构化日志
- Test: `tests/unit/test_observability.py`

**Approach**:
1. 结构化日志：使用 Python `structlog`，JSON 格式输出，包含 trace_id/agent_name/skill_name
2. Metrics 端点：`GET /api/v1/metrics` 返回任务计数/成功率/平均耗时/Token 用量/Agent 池状态
3. 增强健康检查：`GET /api/v1/health` 返回 Redis 连通性/PG 连通性/LLM Provider 可用性/AgentPool 大小
4. Metrics 数据从 TaskStore（Redis）和 EvolutionStore（SQLite）聚合
5. 健康检查中 LLM 可用性通过轻量级 ping（发送空请求验证 API Key 有效）

**Patterns to follow**: `src/agentkit/server/routes/health.py` 的现有健康检查接口

**Test scenarios**:
- Happy path: 结构化日志输出 JSON 格式，包含 trace_id
- Happy path: /api/v1/metrics 返回正确的任务计数和成功率
- Happy path: /api/v1/health 检查 Redis/PG/LLM 状态
- Edge case: Redis 不可用时健康检查返回 degraded 状态
- Edge case: 无任务数据时 metrics 返回零值

**Verification**: `PYTHONPATH=src pytest tests/unit/test_observability.py -v` 全部通过

---

## Phased Delivery

### Phase A: 基础设施（U1, U2, U3）

无外部依赖的底层能力，为后续所有单元提供基础。

- U1: TaskStore 持久化 → 进程重启不丢状态
- U2: 执行轨迹记录器 → 为反思和可观测性提供数据
- U3: EvolutionStore 持久化 → 进化可追溯

### Phase B: 核心能力（U4, U5, U6, U7）

依赖 Phase A 的核心升级，建立飞轮闭环。

- U4: 记忆接入 Agent 循环 → 跨会话上下文延续
- U5: Episodic 向量检索 → 语义记忆召回
- U6: LLM 反思器 → 真正的反思能力
- U7: 技能编排 → 多技能 Pipeline

### Phase C: 增强（U8, U9, U10）

提升用户体验和生产就绪度。

- U8: SKILL.md 格式 → 开放标准兼容
- U9: 上下文压缩与缓存 → Token 成本优化
- U10: 可观测性 → 生产运维

---

## Risks & Mitigations

| 风险 | 影响 | 缓解措施 |
|------|------|---------|
| LLM 反思器增加 API 调用成本 | 中 | 使用辅助模型（更便宜），auto 模式降级到规则 |
| pgvector 向量检索延迟 | 中 | 混合排序（语义+时间衰减），限制返回数量 |
| 记忆注入增加 Prompt Token | 中 | Token 预算管理，超预算时截断 |
| 技能编排增加复杂度 | 低 | 复用现有 PipelineEngine，渐进式引入 |
| SQLite EvolutionStore 并发写入 | 低 | 单写多读模式，写操作加锁 |
| 向后兼容性破坏 | 高 | 所有新参数默认 None，不改变现有行为 |

---

## System-Wide Impact

- **API 兼容性**：所有新增参数默认 None，现有 API 调用无需修改
- **配置变更**：`agentkit.yaml` 新增 `task_store`/`memory`/`evolution` 配置块，均为可选
- **部署变更**：Redis 从可选变为推荐（TaskStore 默认后端），已在 docker-compose 中配置
- **依赖变更**：新增 `structlog`（可观测性），`pgvector` 向量检索需要 pgvector 扩展
- **测试变更**：新增 10 个测试文件，约 50+ 测试用例

---

## Open Questions

1. **Embedder 选型**：OpenAI Embeddings vs 本地模型（如 sentence-transformers）？建议默认 OpenAI，可选本地
2. **LLM 反思的辅助模型**：使用主模型还是更便宜的模型？建议默认使用主模型，可通过 `auxiliary_model` 配置
3. **SKILL.md 与现有 YAML 的共存策略**：是否需要迁移工具？建议双格式共存，SkillLoader 自动识别

---

## Sources & Research

- Hermes Agent 官方文档: https://hermes-agent.nousresearch.com/docs/developer-guide/architecture
- GEPA 论文: ICLR 2026 Oral "Reflective Prompt Evolution Can Outperform Reinforcement Learning"
- Hermes Agent 记忆系统: https://hermes-agent.ai/blog/hermes-agent-memory-system
- Hermes Curator: https://hermes-agent.nousresearch.com/docs/user-guide/features/curator
- AgentKit 现有计划: `docs/plans/006-refactor-agentkit-v2-phase2-plan.md`