27 KiB

Raw Blame History

title	status	created	plan_type	depth	origin	branch
feat: AgentKit Phase 3 — 持久化·记忆·进化·技能·可观测性升级	active	2026-06-06	feat	deep	Hermes Agent 对比分析 + 5 大问题评估	feat/agentkit-phase3-upgrade

AgentKit Phase 3 升级计划

Summary

基于 Hermes Agent 对标分析和 AgentKit 现状评估，本计划解决 5 个核心问题：无法持久运行、记忆系统未接入、进化架构断层、技能能力不足、缺乏可观测性。覆盖 P0+P1+P2 共 10 项升级，分 3 个交付阶段实施，保持主干代码不变，在 feat/agentkit-phase3-upgrade 分支开发。

Problem Frame

AgentKit 当前是一个"有框架但未接入"的状态：

持久化断层：docker-compose 配置了 Redis + PostgreSQL，但 TaskStore 纯内存，进程重启丢失所有状态
记忆断层：三层记忆架构设计完整，但 Agent 循环中零记忆调用，ReActEngine 不读写记忆
进化断层：EvolutionConfig 定义了配置但 EvolutionMixin 不读取，Reflector 基于硬编码规则，A/B 测试数据伪造
技能断层：Skill 是纯数据容器，无自动创建/编排/策展能力，不支持 SKILL.md 开放标准
可观测性断层：无结构化日志、无 metrics、无执行轨迹导出

Hermes Agent 的核心创新是"执行轨迹 → LLM 反思 → 技能沉淀 → 复用加速"的闭环飞轮。AgentKit 需要建立类似但适配企业场景的进化能力。

Requirements

ID	需求	优先级	来源
R1	TaskStore 持久化到 Redis/PG，进程重启不丢状态	P0	持久运行评估
R2	记忆系统接入 Agent 循环，执行前检索上下文，执行后写入轨迹	P0	记忆架构评估
R3	LLM 驱动反思器替换硬编码 Reflector	P0	进化架构评估
R4	EpisodicMemory 实现 pgvector 向量检索	P1	记忆架构评估
R5	执行轨迹记录器，为反思和可观测性提供数据	P1	进化+可观测性
R6	技能编排/Pipeline 能力	P1	技能完备性评估
R7	EvolutionStore 持久化	P1	进化架构评估
R8	SKILL.md 格式 + 渐进式分层	P2	技能完备性评估
R9	上下文压缩与 Prompt 缓存	P2	Token 成本优化
R10	可观测性（结构化日志 + metrics + 健康检查增强）	P2	生产运维

Scope Boundaries

In Scope

10 项升级（R1-R10），分 3 个交付阶段
保持现有 API 向后兼容
分支开发模式，不修改主干

Out of Scope

多平台消息网关（Telegram/Discord/Slack 等）——定位差异，AgentKit 是 AI 引擎而非个人 Agent
子代理并行执行——需要更复杂的调度架构，留待 Phase 4
技能自动创建 + Curator——依赖 LLM 反思器和执行轨迹，留待 Phase 4
agentskills.io 技能市场——需要社区基础设施，留待 Phase 4
SemanticMemory 的 RAG/知识图谱后端实现——依赖外部服务，当前保持适配器模式

Deferred to Follow-Up Work

RateLimiter 迁移到 Redis 分布式限流
多 worker 模式下的状态共享
优雅关闭（SIGTERM 信号处理）
用户建模（user_id + 偏好跟踪）

Key Technical Decisions

KTD1: TaskStore 持久化策略 — Redis 优先

决策：TaskStore 默认使用 Redis 后端，InMemoryTaskStore 仅用于开发/测试。

理由：

docker-compose 已配置 Redis，基础设施就绪
TaskStore 已有 RedisTaskStore 实现（server/task_store.py），只需设为默认
Redis 天然支持 TTL，与任务过期清理需求一致
避免引入新的存储依赖

替代方案：PostgreSQL 后端——更持久但延迟更高，适合归档而非活跃任务状态。

KTD2: 记忆集成方式 — MemoryRetriever 注入 ReActEngine

决策：在 ReActEngine.execute() 中注入 MemoryRetriever | None 参数，执行前检索相关上下文注入 system_prompt，执行后写入轨迹到 EpisodicMemory。

理由：

ReActEngine 是所有执行模式的底层引擎，在此层集成覆盖面最广
MemoryRetriever 已实现三层并行检索 + 权重融合，无需重写
注入方式而非继承方式，保持 ReActEngine 的独立性

替代方案：在 ConfigDrivenAgent 层集成——更简单但只覆盖 ConfigDrivenAgent，不覆盖直接使用 ReActEngine 的场景。

KTD3: 反思器策略 — LLM-in-the-loop + 规则降级

决策：新增 LLMReflector，通过 LLM 分析执行轨迹生成反思。保留 RuleBasedReflector（当前实现）作为降级方案，LLM 不可用时自动切换。

理由：

GEPA 的核心洞见是"自然语言反思比数值奖励更有效"，这需要 LLM 级别的反思
企业场景需要降级策略，LLM 不可用时不能完全失去反思能力
不直接使用 DSPy/GEPA 框架——AgentKit 已有 LLMGateway，无需引入新依赖

替代方案：集成 DSPy + GEPA——更强大但引入重依赖，且 AgentKit 的定位不需要 GEPA 的完整进化流水线。

KTD4: 执行轨迹存储 — SQLite 本地 + 可选 PG

决策：执行轨迹默认存储在本地 SQLite（~/.agentkit/traces/），可选配置 PostgreSQL 后端用于大规模部署。

理由：

与 Hermes Agent 一致（SQLite FTS5），轻量级
单机部署无需 PG，降低使用门槛
PG 后端用于多实例部署场景

KTD5: 技能编排 — 复用现有 PipelineEngine

决策：技能编排复用 orchestrator/pipeline_engine.py 的 PipelineEngine，新增 SkillPipeline 适配层将 Skill 包装为 Pipeline Step。

理由：

PipelineEngine 已实现顺序/并行/条件执行，功能完整
避免重复造轮子，只需一个适配层
Pipeline YAML 格式已定义，用户可声明式编排技能

KTD6: SKILL.md 格式 — YAML 元数据 + Markdown 正文

决策：SKILL.md 采用 YAML frontmatter + Markdown 正文的混合格式，兼容 agentskills.io 标准。

理由：

YAML frontmatter 机器可读（解析元数据），Markdown 正文人机可读（描述技能步骤）
与现有 YAML 配置格式兼容，迁移成本低
agentskills.io 标准使用纯 Markdown，YAML frontmatter 是其超集

High-Level Technical Design

进化飞轮架构

graph LR
    A[任务执行] --> B[执行轨迹记录]
    B --> C[LLM 反思分析]
    C --> D{质量达标?}
    D -->|否| E[Prompt 优化]
    D -->|是| F[技能沉淀]
    E --> G[A/B 测试]
    G --> H{统计显著?}
    H -->|是| I[应用/回滚]
    H -->|否| J[继续收集样本]
    F --> K[技能库]
    K -->|复用| A
    I --> K

记忆集成数据流

sequenceDiagram
    participant Client
    participant Agent as ConfigDrivenAgent
    participant Engine as ReActEngine
    participant Retriever as MemoryRetriever
    participant Episodic as EpisodicMemory

    Client->>Agent: handle_task(task)
    Agent->>Retriever: get_context(task.input_data)
    Retriever->>Episodic: search(similar tasks)
    Episodic-->>Retriever: relevant memories
    Retriever-->>Agent: context string
    Agent->>Engine: execute(messages + context)
    Engine-->>Agent: result + trace
    Agent->>Episodic: store(trace summary)
    Agent-->>Client: TaskResult

三阶段交付依赖

graph TD
    subgraph Phase A - 基础设施
        U1[U1: TaskStore 持久化]
        U2[U2: 执行轨迹记录器]
        U3[U3: EvolutionStore 持久化]
    end
    subgraph Phase B - 核心能力
        U4[U4: 记忆接入 Agent 循环]
        U5[U5: Episodic 向量检索]
        U6[U6: LLM 反思器]
        U7[U7: 技能编排]
    end
    subgraph Phase C - 增强
        U8[U8: SKILL.md 格式]
        U9[U9: 上下文压缩与缓存]
        U10[U10: 可观测性]
    end
    U1 --> U4
    U2 --> U4
    U2 --> U6
    U3 --> U6
    U4 --> U5
    U6 --> U8

Implementation Units

U1. TaskStore 持久化到 Redis

Goal: 将 TaskStore 默认后端从内存切换到 Redis，确保进程重启后任务状态不丢失。

Requirements: R1

Dependencies: 无

Files:

Modify: src/agentkit/server/task_store.py — 将 create_task_store() 默认使用 Redis 后端
Modify: src/agentkit/server/app.py — create_app() 中根据配置选择 TaskStore 后端
Modify: src/agentkit/server/config.py — 新增 task_store_backend 配置项
Modify: src/agentkit/cli/main.py — serve 命令传递 task_store 配置
Test: tests/unit/test_task_store_redis.py

Approach:

RedisTaskStore 已存在于 task_store.py，验证其功能完整性
create_task_store() 工厂函数增加 backend 参数，默认 redis
ServerConfig 新增 task_store 配置块（backend/redis_url/ttl/max_records）
create_app() 从 ServerConfig 读取配置，创建对应 TaskStore
InMemoryTaskStore 保留用于测试，通过 backend: memory 显式启用

Patterns to follow: src/agentkit/server/task_store.py 中 RedisTaskStore 的现有实现

Test scenarios:

Happy path: 创建任务 → 重启模拟（关闭 Redis 连接再重连）→ 查询任务仍存在
Edge case: Redis 不可用时降级到 InMemoryTaskStore 并打 warning 日志
Edge case: TTL 过期后任务自动清理
Error path: Redis 连接失败时的错误处理和降级
Integration: serve 命令启动后提交任务，查询任务状态

Verification: PYTHONPATH=src pytest tests/unit/test_task_store_redis.py -v 全部通过

U2. 执行轨迹记录器

Goal: 在 ReActEngine 执行过程中记录完整的执行轨迹（每步动作、输入输出、耗时、Token 用量），为反思和可观测性提供数据。

Requirements: R5

Dependencies: 无

Files:

Create: src/agentkit/core/trace.py — TraceStep + ExecutionTrace 数据类 + TraceRecorder
Modify: src/agentkit/core/react.py — execute() 中注入 TraceRecorder，记录每步
Modify: src/agentkit/core/protocol.py — TaskResult 新增 trace 字段
Test: tests/unit/test_trace_recorder.py

Approach:

定义 TraceStep（step/action/tool_name/input/output/duration_ms/tokens_used/error）和 ExecutionTrace（task_id/agent_name/skill_name/steps/total_duration/total_tokens/outcome/quality_score）
TraceRecorder 类：start_trace()、record_step()、end_trace()、get_trace()
ReActEngine.execute() 新增 trace_recorder: TraceRecorder | None = None 参数
每次工具调用和 LLM 调用后调用 record_step()
TaskResult 新增可选 trace: ExecutionTrace | None 字段
轨迹默认存储在内存中（单次请求生命周期），后续 U3 持久化

Patterns to follow: src/agentkit/core/react.py 中 ReActStep 和 ReActResult 的现有数据结构

Test scenarios:

Happy path: 执行 3 步 ReAct 循环，验证轨迹包含 3 个 TraceStep
Happy path: 工具调用记录 tool_name/input/output/duration
Edge case: 无工具调用的纯 LLM 响应，轨迹只有 1 步
Error path: 工具调用失败，TraceStep.error 非空
Integration: ConfigDrivenAgent 通过 ReActEngine 执行任务，TaskResult 包含 trace

Verification: PYTHONPATH=src pytest tests/unit/test_trace_recorder.py -v 全部通过

U3. EvolutionStore 持久化

Goal: 将进化事件从内存迁移到 SQLite 持久化存储，支持进化历史查询和回滚。

Requirements: R7

Dependencies: 无

Files:

Modify: src/agentkit/evolution/evolution_store.py — 新增 SQLite 后端，替换内存存储
Create: src/agentkit/evolution/models.py — SQLAlchemy ORM 模型（EvolutionEvent/SkillVersion/ABTestResult）
Test: tests/unit/test_evolution_store_persistent.py

Approach:

定义 SQLAlchemy ORM 模型：EvolutionEvent（id/agent_name/event_type/trace_id/reflection_id/proposal_id/status/created_at）、SkillVersion（id/skill_name/version/content/parent_version/created_at）、ABTestResult（id/test_id/variant/score/sample_count/created_at）
EvolutionStore 新增 backend 参数，默认 sqlite（路径 ~/.agentkit/evolution.db）
record()/query()/rollback() 方法操作 SQLite
保留内存后端用于测试
首次运行自动创建表结构

Patterns to follow: src/agentkit/evolution/evolution_store.py 的现有接口

Test scenarios:

Happy path: 记录进化事件 → 关闭连接 → 重新打开 → 查询到事件
Happy path: 记录技能版本 → 查询版本历史
Edge case: 空数据库首次查询返回空列表
Error path: SQLite 文件不可写时的错误处理
Integration: EvolutionMixin.evolve_after_task() 写入 EvolutionStore

Verification: PYTHONPATH=src pytest tests/unit/test_evolution_store_persistent.py -v 全部通过

U4. 记忆接入 Agent 循环

Goal: 将 MemoryRetriever 注入 ReActEngine，执行前检索相关上下文注入 system_prompt，执行后写入轨迹摘要到 EpisodicMemory。

Requirements: R2

Dependencies: U1, U2

Files:

Modify: src/agentkit/core/react.py — execute() 新增 memory_retriever 参数，执行前检索上下文
Modify: src/agentkit/core/config_driven.py — 根据 config.memory 自动实例化三层记忆，注入 ReActEngine
Modify: src/agentkit/core/base.py — BaseAgent 新增 use_memory_retriever() 方法
Modify: src/agentkit/server/app.py — create_app() 中初始化 Memory 组件
Test: tests/unit/test_memory_integration.py

Approach:

ReActEngine.__init__ 新增 memory_retriever: MemoryRetriever | None = None
execute() 开始前：调用 memory_retriever.get_context_string(task_input) 获取相关记忆
将记忆上下文追加到 system_prompt 的末尾（## Relevant Past Experience 段落）
execute() 结束后：将执行轨迹摘要写入 EpisodicMemory
ConfigDrivenAgent.__init__ 根据 config.memory 配置自动创建 WorkingMemory/EpisodicMemory/MemoryRetriever
create_app() 中从 ServerConfig 读取 memory 配置，初始化 Memory 组件

Patterns to follow: src/agentkit/memory/retriever.py 的 MemoryRetriever 接口

Test scenarios:

Happy path: 执行任务时检索到相关历史记忆，注入 system_prompt
Happy path: 任务完成后轨迹摘要写入 EpisodicMemory
Edge case: 无记忆时正常执行（memory_retriever=None）
Edge case: 记忆检索失败时不影响任务执行
Integration: 连续执行两个相似任务，第二个任务能检索到第一个的记忆

Verification: PYTHONPATH=src pytest tests/unit/test_memory_integration.py -v 全部通过

U5. EpisodicMemory 向量检索实现

Goal: 实现 EpisodicMemory 的 pgvector cosine distance 排序，替代当前的时间衰减排序，支持语义相似度检索。

Requirements: R4

Dependencies: U4

Files:

Modify: src/agentkit/memory/episodic.py — 实现 pgvector 向量检索
Create: src/agentkit/memory/embedder.py — Embedder 接口 + OpenAIEmbedder 实现
Test: tests/unit/test_episodic_vector_search.py

Approach:

新增 Embedder 抽象基类：embed(text: str) -> list[float]
新增 OpenAIEmbedder：调用 OpenAI Embeddings API（text-embedding-3-small）
EpisodicMemory.store() 中调用 embedder 生成 embedding，存入 pgvector Vector 列
EpisodicMemory.search() 中实现 cosine distance 排序，与时间衰减混合：score = alpha * cosine_similarity + (1-alpha) * time_decay
默认 alpha=0.7（语义相似度权重更高），可通过配置调整
retrieve(key) 方法实现：先 embed query，再按 cosine distance 排序

Patterns to follow: src/agentkit/memory/episodic.py 的现有接口

Test scenarios:

Happy path: 存入 3 条记忆，用语义相似查询检索到最相关的
Happy path: 时间衰减 + 语义相似度混合排序
Edge case: embedder 不可用时降级到纯时间衰减排序
Edge case: 空查询返回空结果
Error path: pgvector 扩展未安装时的错误提示

Verification: PYTHONPATH=src pytest tests/unit/test_episodic_vector_search.py -v 全部通过

U6. LLM 反思器

Goal: 新增 LLMReflector，通过 LLM 分析执行轨迹生成结构化反思。保留 RuleBasedReflector 作为降级方案。

Requirements: R3

Dependencies: U2, U3

Files:

Create: src/agentkit/evolution/llm_reflector.py — LLMReflector 类
Modify: src/agentkit/evolution/reflector.py — 重命名为 RuleBasedReflector，保持接口兼容
Modify: src/agentkit/evolution/lifecycle.py — EvolutionMixin 支持 reflector 类型选择
Modify: src/agentkit/skills/base.py — EvolutionConfig 新增 reflector_type 字段
Test: tests/unit/test_llm_reflector.py

Approach:

LLMReflector 接收 ExecutionTrace，构建反思 Prompt（包含轨迹详情 + 质量评分）
调用 LLM Gateway 生成结构化反思（失败根因/成功模式/改进建议）
输出与 Reflection 数据类兼容（outcome/quality_score/patterns/insights/suggestions）
EvolutionMixin 新增 reflector_type 配置：llm（默认）/ rule / auto（LLM 优先，失败降级到 rule）
LLM 反思使用辅助模型（非主模型），降低成本
EvolutionConfig 新增 reflector_type 和 auxiliary_model 字段，与 EvolutionMixin 对齐

Patterns to follow: src/agentkit/evolution/reflector.py 的 Reflector 接口和 Reflection 数据类

Test scenarios:

Happy path: LLM 分析执行轨迹，生成包含 insights 和 suggestions 的 Reflection
Happy path: auto 模式下 LLM 失败时降级到 RuleBasedReflector
Edge case: 执行轨迹为空时返回默认 Reflection
Edge case: LLM 返回非结构化文本时的解析容错
Integration: EvolutionMixin 使用 LLMReflector 完成完整进化流程

Verification: PYTHONPATH=src pytest tests/unit/test_llm_reflector.py -v 全部通过

U7. 技能编排

Goal: 复用 PipelineEngine 实现 Skill 编排，支持将多个 Skill 串联为 Pipeline 执行。

Requirements: R6

Dependencies: U4

Files:

Create: src/agentkit/skills/pipeline.py — SkillPipeline 适配层
Modify: src/agentkit/skills/registry.py — 新增 pipeline 注册和查询
Modify: src/agentkit/server/routes/skills.py — 新增 pipeline API 端点
Test: tests/unit/test_skill_pipeline.py

Approach:

SkillPipeline 类：封装 PipelineEngine，将 Skill 包装为 Pipeline Step
每个 Skill 在 Pipeline 中作为一个 Step，输入为上一步的输出
支持顺序执行、条件分支（根据 Skill 输出决定下一步）、并行执行
Pipeline 定义格式复用 orchestrator/pipeline_schema.py 的 PipelineConfig
SkillPipeline 可通过 YAML 定义或编程式构建
SkillRegistry 新增 register_pipeline() 和 get_pipeline() 方法

Patterns to follow: src/agentkit/orchestrator/pipeline_engine.py 的 PipelineEngine 接口

Test scenarios:

Happy path: 3 个 Skill 顺序执行，输出正确传递
Happy path: 条件分支 — 根据 Skill A 的输出决定执行 Skill B 还是 Skill C
Edge case: Pipeline 中某个 Skill 失败时，后续 Skill 不执行
Edge case: 空 Pipeline（0 个 Skill）直接返回空结果
Integration: 通过 API 提交 Pipeline 任务，查询执行状态

Verification: PYTHONPATH=src pytest tests/unit/test_skill_pipeline.py -v 全部通过

U8. SKILL.md 格式 + 渐进式分层

Goal: 支持 SKILL.md 格式的技能定义，实现渐进式分层加载（Level 0 概要 / Level 1 完整 / Level 2 参考）。

Requirements: R8

Dependencies: U6

Files:

Create: src/agentkit/skills/skill_md.py — SKILL.md 解析器
Modify: src/agentkit/skills/loader.py — 新增 load_from_skill_md() 方法
Modify: src/agentkit/skills/base.py — SkillConfig 新增 skill_md_path 和 disclosure_level 字段
Modify: src/agentkit/cli/skill.py — 新增 skill create 命令生成 SKILL.md 模板
Test: tests/unit/test_skill_md.py

Approach:

SKILL.md 格式：YAML frontmatter（name/description/intent/quality_gate/execution_mode）+ Markdown 正文（trigger/steps/pitfalls/verification）
解析器提取 frontmatter 生成 SkillConfig，正文按标题分段存储
渐进式分层：
- Level 0：frontmatter 中的 name + description（~50 tokens，常驻加载）
- Level 1：完整正文（按需加载，当 IntentRouter 匹配到该技能时）
- Level 2：references/ 和 templates/ 目录（深度加载，技能执行时）
SkillLoader 新增 load_from_skill_md(path) 方法
CLI skill create 生成 SKILL.md 模板文件

Patterns to follow: src/agentkit/skills/loader.py 的 load_from_file() 方法

Test scenarios:

Happy path: 解析 SKILL.md 文件，生成正确的 SkillConfig
Happy path: Level 0 只加载 name + description
Happy path: Level 1 加载完整步骤
Edge case: frontmatter 缺失时使用默认值
Edge case: Markdown 正文缺少标准段落时的容错处理
Integration: SkillLoader 从 SKILL.md 加载技能，注册到 SkillRegistry

Verification: PYTHONPATH=src pytest tests/unit/test_skill_md.py -v 全部通过

U9. 上下文压缩与 Prompt 缓存

Goal: 实现上下文压缩（长会话自动压缩历史消息）和 Prompt 缓存（会话内 Prompt 不重复渲染）。

Requirements: R9

Dependencies: U4

Files:

Create: src/agentkit/core/compressor.py — ContextCompressor 类
Modify: src/agentkit/prompts/template.py — 新增 render_cached() 方法和缓存机制
Modify: src/agentkit/core/react.py — execute() 中注入压缩逻辑
Test: tests/unit/test_context_compressor.py

Approach:

ContextCompressor：当消息总 Token 数超过阈值（默认 4000）时，调用 LLM 将历史消息压缩为摘要
压缩策略：保留最近 N 条消息 + 早期消息的 LLM 摘要
PromptTemplate.render_cached()：对相同变量输入返回缓存结果，变量变化时重新渲染
缓存 key 基于 variables 的 hash，缓存存储在 PromptTemplate 实例上
ReActEngine.execute() 中在每次 LLM 调用前检查消息长度，超阈值则压缩

Patterns to follow: Hermes Agent 的上下文压缩机制（LLM 摘要 + 缓存快照）

Test scenarios:

Happy path: 10 条历史消息压缩为摘要 + 最近 3 条
Happy path: 压缩后 Token 数低于阈值
Happy path: 相同变量输入命中 PromptTemplate 缓存
Edge case: 压缩后仍超阈值时递归压缩
Edge case: LLM 压缩调用失败时保留原始消息

Verification: PYTHONPATH=src pytest tests/unit/test_context_compressor.py -v 全部通过

U10. 可观测性

Goal: 实现结构化日志、metrics 端点和增强健康检查。

Requirements: R10

Dependencies: U2

Files:

Create: src/agentkit/core/logging.py — 结构化日志配置
Create: src/agentkit/server/routes/metrics.py — /api/v1/metrics 端点
Modify: src/agentkit/server/routes/health.py — 增强健康检查（Redis/PG/LLM/AgentPool 状态）
Modify: src/agentkit/server/app.py — 注册 metrics 路由，初始化结构化日志
Test: tests/unit/test_observability.py

Approach:

结构化日志：使用 Python structlog，JSON 格式输出，包含 trace_id/agent_name/skill_name
Metrics 端点：GET /api/v1/metrics 返回任务计数/成功率/平均耗时/Token 用量/Agent 池状态
增强健康检查：GET /api/v1/health 返回 Redis 连通性/PG 连通性/LLM Provider 可用性/AgentPool 大小
Metrics 数据从 TaskStore（Redis）和 EvolutionStore（SQLite）聚合
健康检查中 LLM 可用性通过轻量级 ping（发送空请求验证 API Key 有效）

Patterns to follow: src/agentkit/server/routes/health.py 的现有健康检查接口

Test scenarios:

Happy path: 结构化日志输出 JSON 格式，包含 trace_id
Happy path: /api/v1/metrics 返回正确的任务计数和成功率
Happy path: /api/v1/health 检查 Redis/PG/LLM 状态
Edge case: Redis 不可用时健康检查返回 degraded 状态
Edge case: 无任务数据时 metrics 返回零值

Verification: PYTHONPATH=src pytest tests/unit/test_observability.py -v 全部通过

Phased Delivery

Phase A: 基础设施（U1, U2, U3）

无外部依赖的底层能力，为后续所有单元提供基础。

U1: TaskStore 持久化 → 进程重启不丢状态
U2: 执行轨迹记录器 → 为反思和可观测性提供数据
U3: EvolutionStore 持久化 → 进化可追溯

Phase B: 核心能力（U4, U5, U6, U7）

依赖 Phase A 的核心升级，建立飞轮闭环。

U4: 记忆接入 Agent 循环 → 跨会话上下文延续
U5: Episodic 向量检索 → 语义记忆召回
U6: LLM 反思器 → 真正的反思能力
U7: 技能编排 → 多技能 Pipeline

Phase C: 增强（U8, U9, U10）

提升用户体验和生产就绪度。

U8: SKILL.md 格式 → 开放标准兼容
U9: 上下文压缩与缓存 → Token 成本优化
U10: 可观测性 → 生产运维

Risks & Mitigations

风险	影响	缓解措施
LLM 反思器增加 API 调用成本	中	使用辅助模型（更便宜），auto 模式降级到规则
pgvector 向量检索延迟	中	混合排序（语义+时间衰减），限制返回数量
记忆注入增加 Prompt Token	中	Token 预算管理，超预算时截断
技能编排增加复杂度	低	复用现有 PipelineEngine，渐进式引入
SQLite EvolutionStore 并发写入	低	单写多读模式，写操作加锁
向后兼容性破坏	高	所有新参数默认 None，不改变现有行为

System-Wide Impact

API 兼容性：所有新增参数默认 None，现有 API 调用无需修改
配置变更：agentkit.yaml 新增 task_store/memory/evolution 配置块，均为可选
部署变更：Redis 从可选变为推荐（TaskStore 默认后端），已在 docker-compose 中配置
依赖变更：新增 structlog（可观测性），pgvector 向量检索需要 pgvector 扩展
测试变更：新增 10 个测试文件，约 50+ 测试用例

Open Questions

Embedder 选型：OpenAI Embeddings vs 本地模型（如 sentence-transformers）？建议默认 OpenAI，可选本地
LLM 反思的辅助模型：使用主模型还是更便宜的模型？建议默认使用主模型，可通过 auxiliary_model 配置
SKILL.md 与现有 YAML 的共存策略：是否需要迁移工具？建议双格式共存，SkillLoader 自动识别

Sources & Research

Hermes Agent 官方文档: https://hermes-agent.nousresearch.com/docs/developer-guide/architecture
GEPA 论文: ICLR 2026 Oral "Reflective Prompt Evolution Can Outperform Reinforcement Learning"
Hermes Agent 记忆系统: https://hermes-agent.ai/blog/hermes-agent-memory-system
Hermes Curator: https://hermes-agent.nousresearch.com/docs/user-guide/features/curator
AgentKit 现有计划: docs/plans/006-refactor-agentkit-v2-phase2-plan.md

27 KiB Raw Blame History Unescape Escape

AgentKit Phase 3 升级计划

Summary

Problem Frame

Requirements

Scope Boundaries

In Scope

Out of Scope

Deferred to Follow-Up Work

Key Technical Decisions

KTD1: TaskStore 持久化策略 — Redis 优先

KTD2: 记忆集成方式 — MemoryRetriever 注入 ReActEngine

KTD3: 反思器策略 — LLM-in-the-loop + 规则降级

KTD4: 执行轨迹存储 — SQLite 本地 + 可选 PG

KTD5: 技能编排 — 复用现有 PipelineEngine

KTD6: SKILL.md 格式 — YAML 元数据 + Markdown 正文

High-Level Technical Design

进化飞轮架构

记忆集成数据流

三阶段交付依赖

Implementation Units

U1. TaskStore 持久化到 Redis

U2. 执行轨迹记录器

U3. EvolutionStore 持久化

U4. 记忆接入 Agent 循环

U5. EpisodicMemory 向量检索实现

U6. LLM 反思器

U7. 技能编排

U8. SKILL.md 格式 + 渐进式分层

U9. 上下文压缩与 Prompt 缓存

U10. 可观测性

Phased Delivery

Phase A: 基础设施（U1, U2, U3）

Phase B: 核心能力（U4, U5, U6, U7）

Phase C: 增强（U8, U9, U10）

Risks & Mitigations

System-Wide Impact

Open Questions

Sources & Research

27 KiB

Raw Blame History