---
title: "feat: fischer-agentkit TDD verification and completion plan"
type: feat
status: active
date: 2026-06-05
origin: geo/docs/plans/2026-06-04-010-refactor-unified-agent-framework-plan.md
execution_posture: tdd
---

## Summary

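Verify the six implemented major modules of fischer-agentkit through TDD, in three steps. First, fill in the missing unit-test coverage (6 zero-coverage modules plus 4 weakly tested modules). Second, fix the problems the tests expose (pgvector vector search, the `datetime.utcnow()` deprecation, missing test infrastructure). Third, add 4 integration tests that verify the end-to-end flows. Tests run against real Redis and PostgreSQL services so that the results are reliable.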

## Problem Frame

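The six major modules of fischer-agentkit (Core/Tools/Memory/Evolution/Orchestrator/MCP) are fully implemented and the 189 existing tests all pass, but the following structural problems remain:

1. **6 modules have no tests at all**: dispatcher, registry, mcp/server, evolution_store, agent_tool, prompts. The code exists, but its behavior is unverified.
2. **4 modules are weakly tested**: working_memory (no Redis mock), episodic_memory (only the decay formula is tested), mcp/client (tested only indirectly), handoff (only the no-Redis scenario).
3. **Integration tests are missing entirely**: the `tests/integration/` directory is empty, so end-to-end flows cannot be verified.
4. **Code quality issues**: 21 `datetime.utcnow()` deprecation warnings, and pgvector vector search in EpisodicMemory is still marked TODO.
5. **Test infrastructure is missing**: there is no conftest.py, and fixtures are defined redundantly in 4 files.

In short, although the code "runs", its core functionality (task dispatch, Agent registration, the MCP server, evolution persistence) has never been verified by automated tests.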

---

## Requirements

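This plan traces back to the following items in the original requirements document:

| Requirement ID | Description | Verification status |
|---------|---------|---------|
| R2 | Unified BaseAgent lifecycle | Partially verified (dispatcher/registry missing) |
| R6 | Three Tool types (Function/Agent/MCP) | AgentTool unverified |
| R7 | ToolRegistry registration, discovery, and version management | Basically verified |
| R8 | MCP Server exposes Agent capabilities | **Unverified** |
| R9 | MCP Client calls external tools | Verified only indirectly |
| R11 | Working Memory on Redis | **Unverified** |
| R12 | Episodic Memory vector search | **Unverified** (TODO) |
| R13 | Semantic Memory RAG+Graph | Basically verified |
| R14 | Hybrid retrieval strategy | Partially verified |
| R15 | Automatic recording of accumulated experience | Partially verified |
| R20 | Handoff task transfer | No-Redis scenario only |
| R22 | Event-driven replacement for polling | **Not implemented** (out of scope for this plan) |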

---

## Key Technical Decisions

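KTD1. **Test against real services**: Both unit tests and integration tests use real Redis and PostgreSQL (pgvector) services, started as dedicated test containers via docker-compose. Rationale: fakeredis does not support every Redis command (for example, the full behavior of Pub/Sub), and a mocked SQLAlchemy session cannot verify real SQL or pgvector queries. Testing against real services is more reliable, and the GEO project already has Docker images for pgvector/pg15 and Redis 7.

KTD2. **Test infrastructure first**: Create conftest.py and extract the shared fixtures before filling in tests module by module. Rationale: 4 files each define their own copies of helpers such as `_make_task()`; without consolidation, later tests will keep duplicating them.

KTD3. **TDD red-green loop**: For each module, first write tests that define the expected behavior (they may fail), then fix the code until the tests pass. For the pgvector TODO in EpisodicMemory, first write tests that define the expected vector-search behavior, then implement cosine-distance ordering.

KTD4. **Fix datetime.utcnow() in one pass**: Fix the 21 deprecation warnings before adding tests, so that new tests do not inherit the technical debt. Replace the calls with `datetime.now(timezone.utc)`, consistent with the project's more recent code (agent_tool.py, pipeline_engine.py, and others).

KTD5. **Class-based test style throughout**: New tests group cases in `class TestXxx` with `async def` methods (relying on `asyncio_mode = "auto"`) and no longer use the `@pytest.mark.asyncio` decorator. This matches the style of the project's newer test files.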

---

## High-Level Technical Design

### Test layering

```mermaid
flowchart TB
    subgraph Infrastructure["Test infrastructure"]
        DC["docker-compose.test.yml<br/>Redis 7 + pgvector/pg15"]
        Conf["conftest.py<br/>shared fixtures"]
        Env[".env.test<br/>test environment variables"]
    end

    subgraph UnitTests["Unit tests (tests/unit/)"]
        P0["P0: zero-coverage modules<br/>dispatcher, registry<br/>mcp/server, evolution_store<br/>agent_tool, prompts"]
        P1["P1: weakly tested modules<br/>working_memory, episodic_memory<br/>mcp/client, handoff"]
        Fix["Code fixes<br/>datetime.utcnow, pgvector TODO"]
    end

    subgraph IntegrationTests["Integration tests (tests/integration/)"]
        AL["test_agent_lifecycle.py<br/>full lifecycle"]
        TC["test_tool_composition.py<br/>tool composition end to end"]
        EL["test_evolution_loop.py<br/>evolution loop"]
        MR["test_mcp_roundtrip.py<br/>MCP round trip"]
    end

    Infrastructure --> UnitTests
    P0 --> Fix
    P1 --> Fix
    UnitTests --> IntegrationTests
```

### Test execution flow

```mermaid
stateDiagram-v2
    [*] --> SetupInfra: Start test containers
    SetupInfra --> WriteTests: Write tests (RED)
    WriteTests --> RunTests: Run tests
    RunTests --> FixCode: Tests fail → fix code (GREEN)
    FixCode --> RunTests: Re-run
    RunTests --> WriteTests: All pass → next module
    RunTests --> Integration: All unit tests pass
    Integration --> [*]: Integration tests pass
```

---

## Implementation Units

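### U1. Set up the test infrastructure

**Goal:** Create the docker-compose test configuration, the shared conftest.py fixtures, and the .env.test environment variables, giving the subsequent TDD work a reliable foundation.

**Requirements:** R2, R11, R12

**Dependencies:** None

**Files:**
- `fischer-agentkit/docker-compose.test.yml` (new)
- `fischer-agentkit/.env.test` (new)
- `fischer-agentkit/tests/conftest.py` (new)
- `fischer-agentkit/tests/unit/conftest.py` (new)
- `fischer-agentkit/tests/integration/conftest.py` (new)
- `fischer-agentkit/pyproject.toml` (modify: add the pytest-docker or testcontainers dependency)

**Approach:**

1. Create `docker-compose.test.yml` with Redis 7 and pgvector/pg15 services, on host ports that avoid clashing with the GEO project (Redis 6379 → 6381, PostgreSQL 5432 → 5434)
2. Create `.env.test` to declare the test environment variables
3. Create `tests/conftest.py` and extract the shared fixtures:
   - `make_task()`: builds a TaskMessage
   - `make_result()`: builds a TaskResult
   - `redis_client`: async fixture connected to the test Redis
   - `pg_session_factory`: async fixture connected to the test PostgreSQL
   - `clean_redis`: flushes Redis before each test
   - `clean_db`: clears the database before each test
4. Create `tests/unit/conftest.py` and `tests/integration/conftest.py`, each providing the fixtures specific to its layer
5. Add `pytest-docker>=0.4` or `testcontainers[postgres,redis]>=4.0` to the dev dependencies in pyproject.toml. Add pytest-httpx there as well, since it is not yet declared (see Risks & Dependencies)
6. Add `env_file = ".env.test"` to the pytest configuration, or manage environment variables through a fixture. pytest itself has no built-in `env_file` option, so the configuration route depends on a plugin (pytest-dotenv, for example, exposes this as `env_files`)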
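
As an illustration of the factory-style helpers in step 3, the sketch below shows one possible shape for `make_task()`. The `TaskMessage` dataclass here is a hypothetical stand-in with assumed field names (`task_id`, `task_type`, `payload`, `created_at`); the real class in agentkit may differ. In conftest.py the function would be exposed through a pytest fixture.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict


@dataclass
class TaskMessage:
    """Hypothetical stand-in for the real TaskMessage; field names are assumed."""

    task_id: str
    task_type: str
    payload: Dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def make_task(**overrides: Any) -> TaskMessage:
    """Build a TaskMessage with defaults; a test overrides only what it asserts on."""
    defaults: Dict[str, Any] = {
        "task_id": f"task-{uuid.uuid4().hex[:8]}",  # unique id per call
        "task_type": "echo",
        "payload": {"text": "hello"},
    }
    defaults.update(overrides)
    return TaskMessage(**defaults)


default_task = make_task()
custom_task = make_task(task_type="summarize", payload={"doc": "abc"})
```

Keeping every default in one place means a later change to TaskMessage's required fields touches a single helper rather than the 4 files that currently duplicate `_make_task()`.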

**Patterns to follow:** The Redis and PostgreSQL configuration in the GEO project's `geo/docker-compose.yml`

**Test scenarios:**
- After docker-compose.test.yml starts, Redis accepts connections and answers PING
- After docker-compose.test.yml starts, PostgreSQL accepts connections and the pgvector extension can be queried
- The redis_client fixture in conftest.py performs set/get operations correctly
- The pg_session_factory fixture in conftest.py can create tables and run queries
- The TaskMessage produced by the make_task() fixture is accepted by BaseAgent.execute()
- The clean_redis fixture isolates data between tests

**Verification:** `docker compose -f docker-compose.test.yml up -d && pytest tests/ -v` passes in full

---

### U2. Fix the datetime.utcnow() deprecation

**Goal:** Replace all 21 occurrences of `datetime.utcnow()` in the project with `datetime.now(timezone.utc)` to eliminate the DeprecationWarning.

**Requirements:** Code quality (non-functional requirement)

**Dependencies:** None (can run in parallel with U1)

**Files:**
- `fischer-agentkit/src/agentkit/core/protocol.py` (7 occurrences)
- `fischer-agentkit/src/agentkit/memory/base.py` (1 occurrence)
- `fischer-agentkit/src/agentkit/memory/working.py` (3 occurrences)
- `fischer-agentkit/src/agentkit/memory/episodic.py` (2 occurrences)
- `fischer-agentkit/src/agentkit/evolution/reflector.py` (1 occurrence)
- `fischer-agentkit/src/agentkit/evolution/lifecycle.py` (2 occurrences)
- `fischer-agentkit/tests/unit/test_memory_system.py` (4 occurrences)
- `fischer-agentkit/tests/unit/test_protocol.py` (1 occurrence)

**Approach:**

1. Add `from datetime import timezone` to the import section of each file (if not already imported)
2. Replace `datetime.utcnow()` with `datetime.now(timezone.utc)`
3. Replace `field(default_factory=lambda: datetime.utcnow())` with `field(default_factory=lambda: datetime.now(timezone.utc))`
4. Run the 189 existing tests to confirm there are no regressions

**Execution note:** Run the tests first to confirm the current baseline passes; after the change, re-run them to confirm there are no regressions and no DeprecationWarning.
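
The replacement in steps 2 and 3 can be sketched as follows. `Record` and `utc_now` are hypothetical names used only for illustration; they are not taken from the agentkit code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


def utc_now() -> datetime:
    """Timezone-aware replacement for the deprecated datetime.utcnow()."""
    return datetime.now(timezone.utc)


@dataclass
class Record:
    """Hypothetical dataclass, used only to show the default_factory change."""

    name: str
    # Before: field(default_factory=lambda: datetime.utcnow())  (naive datetime)
    created_at: datetime = field(default_factory=utc_now)


record = Record("demo")

# An aware timestamp serializes with an explicit +00:00 offset and round-trips.
serialized = record.created_at.isoformat()
restored = datetime.fromisoformat(serialized)
```

One behavioral difference to watch for: the new values are timezone-aware, so ordering comparisons or subtraction against any naive datetime left elsewhere in the code raise `TypeError`. That is a practical reason to convert all 21 occurrences in the same change.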

**Patterns to follow:** Files in the project that already use `datetime.now(timezone.utc)` correctly: agent_tool.py, pipeline_engine.py, registry.py, dispatcher.py, base.py

**Test scenarios:**
- After the change, `pytest tests/ -W error::DeprecationWarning` reports no deprecation warnings
- After the change, all 189 existing tests pass
- TaskMessage.from_dict() correctly deserializes JSON containing a UTC timestamp

**Verification:** `pytest tests/ -W error::DeprecationWarning -v` passes in full with zero warnings

---

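### U3. Unit tests for zero-coverage modules (Core layer)

**Goal:** Add unit tests for `core/dispatcher.py` and `core/registry.py` to verify the core logic of task dispatch and of Agent registration and discovery.

**Requirements:** R2

**Dependencies:** U1

**Files:**
- `fischer-agentkit/tests/unit/test_dispatcher.py` (new)
- `fischer-agentkit/tests/unit/test_registry.py` (new)

**Approach:**

1. **test_dispatcher.py**:
   - Test TaskDispatcher task dispatch in local mode (no Redis)
   - Test FIFO ordering of the task queue
   - Test task retry logic
   - Test task cancellation
   - Test the callback mechanism
   - Test concurrent dispatch (multiple tasks enqueued at once)
2. **test_registry.py**:
   - Test that AgentRegistry registers a new AgentType dynamically
   - Test handling of a duplicate AgentType registration
   - Test the round-robin strategy of get_available_agent
   - Test Agent heartbeats and expiry cleanup
   - Test querying Agents by capability

**Execution note:** TDD: write tests that define the expected behavior first, run them to confirm the outcome, then adjust as needed.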
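
To make the shape of these tests concrete, here is a minimal sketch of the FIFO and retry scenarios in the class-based style from KTD5. `FakeDispatcher` is a hypothetical in-memory stand-in written for this example; the real TaskDispatcher API (method names, retry configuration, result type) is an assumption and will likely differ.

```python
import asyncio


class FakeDispatcher:
    """Hypothetical stand-in for TaskDispatcher in local (no-Redis) mode."""

    def __init__(self, handler, max_retries=2):
        self._queue = asyncio.Queue()
        self._handler = handler
        self._max_retries = max_retries

    async def submit(self, task):
        await self._queue.put(task)

    async def drain(self):
        """Process queued tasks in FIFO order and collect their results."""
        results = []
        while not self._queue.empty():
            task = await self._queue.get()
            results.append(await self._run_with_retry(task))
        return results

    async def _run_with_retry(self, task):
        # One initial attempt plus up to max_retries further attempts.
        for attempt in range(self._max_retries + 1):
            try:
                return await self._handler(task)
            except RuntimeError:
                if attempt == self._max_retries:
                    return {"task": task, "status": "failed"}


class TestFakeDispatcher:
    async def test_fifo_order(self):
        async def echo(task):
            return {"task": task, "status": "ok"}

        dispatcher = FakeDispatcher(echo)
        for name in ("a", "b", "c"):
            await dispatcher.submit(name)
        results = await dispatcher.drain()
        assert [r["task"] for r in results] == ["a", "b", "c"]

    async def test_retry_then_success(self):
        calls = []

        async def flaky(task):
            calls.append(task)
            if len(calls) < 3:
                raise RuntimeError("transient failure")
            return {"task": task, "status": "ok"}

        dispatcher = FakeDispatcher(flaky, max_retries=2)
        await dispatcher.submit("x")
        results = await dispatcher.drain()
        assert results[0]["status"] == "ok"
        assert len(calls) == 3
```

Under `asyncio_mode = "auto"`, pytest-asyncio collects these `async def` methods without any decorator. In the real test file the stand-in would be replaced by the actual TaskDispatcher.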

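**Patterns to follow:** The class-based test style of the existing test_base_agent.py

**Test scenarios:**

test_dispatcher.py:
- Local mode dispatches a task to the specified Agent and returns a TaskResult
- The task queue is processed in FIFO order
- A failing task is retried the specified number of times
- Cancelling a pending task returns a cancelled status
- The callback is invoked after the task completes
- Multiple tasks dispatched concurrently return the correct results

test_registry.py:
- Dynamically registering a new AgentType raises no error
- Registering a duplicate AgentType overwrites the old configuration
- get_available_agent returns different Agents under the round-robin strategy
- An Agent is removed from the available list after its heartbeat times out
- Querying by supported_tasks returns the matching Agents
- Querying an empty registry returns an empty list

**Verification:** `pytest tests/unit/test_dispatcher.py tests/unit/test_registry.py -v` passes in full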

---

### U4. Unit tests for zero-coverage modules (Tools + Prompts layer)

**Goal:** Add unit tests for `tools/agent_tool.py` and the `prompts/` module to verify the logic that wraps an Agent as a Tool and the template rendering logic.

**Requirements:** R6

**Dependencies:** U1

**Files:**
- `fischer-agentkit/tests/unit/test_agent_tool.py` (new)
- `fischer-agentkit/tests/unit/test_prompt_template.py` (new)
- `fischer-agentkit/tests/unit/test_prompt_section.py` (new)

**Approach:**

1. **test_agent_tool.py**:
   - Test AgentTool input mapping (input_mapping)
   - Test AgentTool output mapping (output_mapping)
   - Test that AgentTool dispatches tasks through the Dispatcher
   - Test AgentTool timeout handling
   - Test automatic schema generation for AgentTool
2. **test_prompt_template.py**:
   - Test PromptTemplate variable substitution with `${key}`
   - Test handling of missing variables
   - Test the rendered template output
3. **test_prompt_section.py**:
   - Test conditional rendering of PromptSection
   - Test combined rendering of multiple Sections

**Execution note:** TDD: AgentTool's polling wait (1-second interval) requires mocking asyncio.sleep in tests to keep them fast.
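
The `asyncio.sleep` mock mentioned in the execution note could look like the sketch below. `wait_for_result` is a hypothetical polling loop written for this example, not AgentTool's actual implementation; only the patching technique (`unittest.mock.patch` with `AsyncMock`) carries over.

```python
import asyncio
from unittest.mock import AsyncMock, patch


async def wait_for_result(get_result, interval=1.0, timeout=5.0):
    """Hypothetical polling loop: call get_result() until it returns a value."""
    waited = 0.0
    while waited < timeout:
        result = get_result()
        if result is not None:
            return result
        await asyncio.sleep(interval)
        waited += interval
    raise TimeoutError(f"no result after {timeout} seconds")


class TestPolling:
    async def test_returns_without_real_sleep(self):
        answers = iter([None, None, "done"])
        # Replace asyncio.sleep so each 1-second wait returns immediately.
        with patch("asyncio.sleep", new=AsyncMock()) as fake_sleep:
            result = await wait_for_result(lambda: next(answers))
        assert result == "done"
        assert fake_sleep.await_count == 2  # slept once per empty poll

    async def test_times_out(self):
        with patch("asyncio.sleep", new=AsyncMock()):
            try:
                await wait_for_result(lambda: None, interval=1.0, timeout=3.0)
            except TimeoutError:
                return
        raise AssertionError("expected TimeoutError")
```

Asserting on `await_count` also checks the polling cadence, which a short real timeout would not verify.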

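**Patterns to follow:** The mock patterns in the existing test_tool_composition.py

**Test scenarios:**

test_agent_tool.py:
- AgentTool maps input parameters to a TaskMessage correctly
- AgentTool maps a TaskResult to the output dict correctly
- AgentTool dispatches a task through the Dispatcher and waits for the result
- AgentTool raises TimeoutError after the timeout elapses
- AgentTool's input_schema is inferred from input_mapping
- AgentTool's output_schema is inferred from output_mapping

test_prompt_template.py:
- A `${name}` variable is replaced with its actual value
- A missing variable raises KeyError or leaves the original placeholder in place
- A template with multiple variables has all of them replaced
- Rendering an empty template returns an empty string

test_prompt_section.py:
- A Section whose condition is True is included in the rendered output
- A Section whose condition is False is excluded from the rendered output
- Multiple Sections are rendered in order
- A Section with no condition is always included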
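
The two acceptable outcomes for a missing variable (raise KeyError, or keep the placeholder) match the two modes of Python's standard `string.Template`, which uses the same `${name}` syntax. The sketch below uses it only to pin down the expected behavior; whether PromptTemplate is built on `string.Template` is an assumption this plan does not confirm.

```python
from string import Template

template = Template("Hello ${name}, your task is ${task}.")

# All variables supplied: every placeholder is replaced.
rendered = template.substitute(name="Ada", task="review")

# Strict mode: a missing variable raises KeyError.
try:
    template.substitute(name="Ada")
    missing_error = None
except KeyError as exc:
    missing_error = exc

# Lenient mode: a missing variable keeps its original placeholder.
lenient = template.safe_substitute(name="Ada")

# An empty template renders to an empty string.
empty = Template("").substitute()
```

Whichever mode PromptTemplate implements, the test should assert exactly one of them so that the behavior is fixed rather than left open.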

**Verification:** `pytest tests/unit/test_agent_tool.py tests/unit/test_prompt_template.py tests/unit/test_prompt_section.py -v` passes in full

---

### U5. Unit tests for zero-coverage modules (MCP Server + Evolution Store)

**Goal:** Add unit tests for `mcp/server.py` and `evolution/evolution_store.py` to verify the MCP server endpoints and the evolution persistence logic.

**Requirements:** R8, R15

**Dependencies:** U1

**Files:**
- `fischer-agentkit/tests/unit/test_mcp_server.py` (new)
- `fischer-agentkit/tests/unit/test_evolution_store.py` (new)

**Approach:**

1. **test_mcp_server.py**:
   - Test the FastAPI endpoints with `httpx.AsyncClient` + `ASGITransport`
   - Test that `/tools/list` returns the tools registered in ToolRegistry
   - Test that `/tools/call` invokes the specified tool and returns its result
   - Test that calling a nonexistent tool returns an error
   - Test the `/resources/read` endpoint
   - Test the JSON-RPC 2.0 protocol format
2. **test_evolution_store.py**:
   - Test that EvolutionStore records evolution changes
   - Test querying change history by agent_name
   - Test the rollback operation
   - Test change-state management (active/rolled_back)

**Execution note:** MCP Server tests use httpx.AsyncClient + ASGITransport, so no real HTTP server needs to be started.
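
For the protocol-format checks, a small validator like the one below can be shared across the server and client tests. It is a sketch based on the JSON-RPC 2.0 specification: a response carries `jsonrpc`, `id`, and exactly one of `result` or `error`. The helper names are hypothetical, and using `tools/call` as the JSON-RPC method string is an assumption about how this server maps its endpoints.

```python
from typing import Any, Dict

METHOD_NOT_FOUND = -32601  # error code defined by the JSON-RPC 2.0 specification


def make_request(method: str, params: Dict[str, Any], request_id: int) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def is_valid_response(response: Dict[str, Any]) -> bool:
    """Check the envelope: jsonrpc, id, and exactly one of result or error."""
    if response.get("jsonrpc") != "2.0" or "id" not in response:
        return False
    has_result = "result" in response
    has_error = "error" in response
    if has_result == has_error:  # both present or both absent
        return False
    if has_error:
        error = response["error"]
        return (
            isinstance(error, dict)
            and isinstance(error.get("code"), int)
            and isinstance(error.get("message"), str)
        )
    return True


request = make_request("tools/call", {"name": "echo", "arguments": {"text": "hi"}}, 1)
ok_response = {"jsonrpc": "2.0", "result": {"text": "hi"}, "id": 1}
error_response = {
    "jsonrpc": "2.0",
    "error": {"code": METHOD_NOT_FOUND, "message": "Method not found"},
    "id": 1,
}
```

In the endpoint tests, the body returned through `httpx.AsyncClient` would be passed to `is_valid_response` before asserting on the tool-specific payload.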

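**Patterns to follow:** The httpx_mock pattern in the existing test_mcp_transport.py; the AsyncClient testing pattern recommended by the FastAPI documentation

**Test scenarios:**

test_mcp_server.py:
- `/tools/list` returns the names and schemas of the registered tools
- `/tools/call` invoking a FunctionTool returns the correct result
- `/tools/call` invoking a nonexistent tool returns a JSON-RPC error
- `/resources/read` returns the list of available resources
- A JSON-RPC 2.0 request is parsed correctly
- A JSON-RPC 2.0 response contains the `jsonrpc` and `id` fields plus either `result` or `error`

test_evolution_store.py:
- Record an evolution change of type prompt
- Record an evolution change of type strategy
- Querying by agent_name returns all changes for that Agent
- Rollback sets the change status to rolled_back
- A query after rollback returns the rolled_back status
- Querying an empty store returns an empty list

**Verification:** `pytest tests/unit/test_mcp_server.py tests/unit/test_evolution_store.py -v` passes in full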

---

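### U6. Strengthened tests for weakly tested modules (Memory layer)

**Goal:** Add real-service tests for WorkingMemory and EpisodicMemory to verify Redis reads and writes and pgvector vector search. Implement pgvector cosine-distance ordering in EpisodicMemory (currently marked TODO).

**Requirements:** R11, R12, R14

**Dependencies:** U1, U2

**Files:**
- `fischer-agentkit/tests/unit/test_working_memory.py` (new)
- `fischer-agentkit/tests/unit/test_episodic_memory.py` (new)
- `fischer-agentkit/tests/unit/test_memory_retriever.py` (new)
- `fischer-agentkit/src/agentkit/memory/episodic.py` (modify: implement pgvector cosine distance)

**Approach:**

1. **test_working_memory.py** (real Redis):
   - Test the basic store/retrieve/delete operations
   - Test automatic TTL expiry
   - Test the formatted output of get_context()
   - Test key isolation between different Agent instances
   - Test graceful degradation when the Redis connection fails
2. **test_episodic_memory.py** (real pgvector):
   - Test that store writes a task experience and generates an embedding
   - Test that search retrieves by semantic similarity (pgvector cosine distance)
   - Test that search orders by time decay
   - Test that search applies hybrid ordering (semantic + time decay)
   - Test that delete removes the specified record
3. **test_memory_retriever.py**:
   - Test parallel retrieval across the three memory layers
   - Test weighted fusion ranking
   - Test token budget management (truncating results that exceed the budget)
4. **Implement pgvector cosine distance**:
   - In the search method of `episodic.py`, replace `# TODO: 使用 pgvector 的 cosine distance 排序` (the TODO for ordering by pgvector cosine distance) with a real pgvector query
   - Use the `embedding <=> :query_embedding` operator to order by cosine distance
   - Combine with the time-decay factor: final score = semantic similarity × time decay, where semantic similarity = 1 − cosine distance

**Execution note:** TDD: write the EpisodicMemory vector-search tests first (expected behavior), run them to confirm they fail (the TODO is not implemented), then implement pgvector cosine-distance ordering to make them pass.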
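
The combined score from step 4 can be sketched in plain Python so the tests have an exact expectation to compare against. The cosine distance here mirrors what pgvector's `<=>` operator returns. The exponential half-life decay is an assumption made for this example; the plan does not state EpisodicMemory's actual decay formula, and the 30-day half-life is a hypothetical parameter.

```python
import math
from datetime import datetime, timedelta, timezone


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator computes."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / norm


def time_decay(created_at, now, half_life_days=30.0):
    """Assumed decay: the weight halves every half_life_days."""
    age_days = (now - created_at).total_seconds() / 86400
    return 0.5 ** (age_days / half_life_days)


def final_score(query, embedding, created_at, now):
    """final score = semantic similarity x time decay."""
    similarity = 1.0 - cosine_distance(query, embedding)
    return similarity * time_decay(created_at, now)


now = datetime(2026, 6, 5, tzinfo=timezone.utc)
query = [1.0, 0.0]
records = [
    ("old_match", [1.0, 0.0], now - timedelta(days=30)),
    ("unrelated", [0.0, 1.0], now),
    ("recent_match", [1.0, 0.0], now),
]
ranked = sorted(records, key=lambda r: final_score(query, r[1], r[2], now), reverse=True)
ranked_names = [name for name, _, _ in ranked]
```

With these inputs a recent exact match outranks an older exact match, and both outrank a recent but unrelated record, which is the ordering the hybrid-ranking scenarios describe.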

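**Patterns to follow:** The RRF fusion ranking pattern of HybridRetriever in the GEO project's `backend/app/services/knowledge/retriever.py`

**Test scenarios:**

test_working_memory.py:
- store followed by retrieve returns the same value
- retrieve returns nothing after the TTL expires
- get_context() returns a formatted context string
- working_memory keys of different Agents do not interfere with each other
- retrieve returns nothing after delete
- Complex objects (nested dicts) are serialized and deserialized correctly

test_episodic_memory.py:
- A record written by store can be queried by agent_name
- search returns the most relevant records by semantic similarity (cosine distance)
- search time decay: recent records rank above older ones
- search hybrid ordering: semantic similarity and time decay are combined
- delete removes the record with the specified ID
- search on an empty store returns an empty list

test_memory_retriever.py:
- The three memory layers are queried in parallel and the results merged
- Fusion ranking by weight (vector 0.5 + keyword 0.2 + graph 0.3)
- Token budget management: all results are kept when the total token count stays within budget
- Token budget management: low-scoring results are truncated when the budget is exceeded
- A memory layer with no results does not affect the other layers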
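
The weighted fusion and token-budget scenarios can be sketched as below. This is a plain weighted sum using the weights listed above, not the RRF pattern from the GEO project. `Candidate` and its fields are hypothetical, and stopping at the first result that would overflow the budget is one assumed reading of "truncate".

```python
from dataclasses import dataclass
from typing import Dict, List

WEIGHTS = {"vector": 0.5, "keyword": 0.2, "graph": 0.3}


@dataclass
class Candidate:
    """Hypothetical retrieval hit; field names are assumed."""

    text: str
    scores: Dict[str, float]  # per-source score in [0, 1]; a missing source counts as 0
    tokens: int


def fused_score(candidate: Candidate) -> float:
    """Weighted sum over the three retrieval sources."""
    return sum(w * candidate.scores.get(src, 0.0) for src, w in WEIGHTS.items())


def select_within_budget(candidates: List[Candidate], budget: int) -> List[Candidate]:
    """Rank by fused score, then cut the list where the token budget would be exceeded."""
    ranked = sorted(candidates, key=fused_score, reverse=True)
    kept, used = [], 0
    for candidate in ranked:
        if used + candidate.tokens > budget:
            break  # everything from here on is lower-scoring and is dropped
        kept.append(candidate)
        used += candidate.tokens
    return kept


a = Candidate("A", {"vector": 0.9}, tokens=40)                                # 0.45
b = Candidate("B", {"vector": 0.2, "keyword": 1.0, "graph": 1.0}, tokens=50)  # 0.60
c = Candidate("C", {"vector": 0.4}, tokens=30)                                # 0.20

within_budget = [x.text for x in select_within_budget([a, b, c], budget=200)]
over_budget = [x.text for x in select_within_budget([a, b, c], budget=100)]
```

Candidate A has no keyword or graph score, which also exercises the scenario where one layer returning nothing must not break the others.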

**Verification:** `pytest tests/unit/test_working_memory.py tests/unit/test_episodic_memory.py tests/unit/test_memory_retriever.py -v` passes in full, and the EpisodicMemory TODO is implemented

---

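### U7. Strengthened tests for weakly tested modules (MCP Client + Handoff)

**Goal:** Add tests for MCPClient and HandoffManager to verify MCP client tool discovery and Handoff's Redis Pub/Sub mechanism.

**Requirements:** R9, R20

**Dependencies:** U1, U2

**Files:**
- `fischer-agentkit/tests/unit/test_mcp_client.py` (new)
- `fischer-agentkit/tests/unit/test_handoff.py` (new)

**Approach:**

1. **test_mcp_client.py**:
   - Test that MCPClient connects to a remote Server through a Transport
   - Test that list_tools() returns the tool list
   - Test that call_tool() invokes a remote tool
   - Test MCPClient's direct HTTP mode (no Transport)
   - Test error handling when the connection fails
2. **test_handoff.py** (real Redis):
   - Test that HandoffManager sends a handoff request over Redis Pub/Sub
   - Test that the target Agent listens for and receives the handoff message
   - Test that the handoff message carries context
   - Test graceful degradation without Redis (local mode)
   - Test multiple Agents listening on different channels at the same time

**Execution note:** Handoff tests use real Redis Pub/Sub, so channels must be isolated between tests.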
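
One way to get channel isolation is to give every test its own channel name. The sketch below pairs that with a serialization round trip for the handoff payload. `HandoffMessage`, `unique_channel`, and the `handoff:<agent>:<id>` naming scheme are hypothetical; only the four field names come from the test scenarios in this plan.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class HandoffMessage:
    """Hypothetical payload carrying the four fields named in the test scenarios."""

    source_agent: str
    target_agent: str
    reason: str
    context: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, raw) -> "HandoffMessage":
        # json.loads accepts both str and bytes, so raw Pub/Sub payloads work too.
        return cls(**json.loads(raw))


def unique_channel(agent_name: str) -> str:
    """Per-test channel name, so concurrent or leftover subscribers never collide."""
    return f"handoff:{agent_name}:{uuid.uuid4().hex}"


message = HandoffMessage(
    source_agent="planner",
    target_agent="writer",
    reason="needs drafting",
    context={"topic": "release notes", "attempt": 1},
)
restored = HandoffMessage.from_json(message.to_json())
restored_from_bytes = HandoffMessage.from_json(message.to_json().encode("utf-8"))
```

If HandoffManager derives its channel names internally, the same isolation could instead come from a unique Agent name per test.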

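**Patterns to follow:** The HTTP mock pattern in the existing test_mcp_transport.py

**Test scenarios:**

test_mcp_client.py:
- Calling list_tools through a Transport returns the list of tool names
- Calling call_tool through a Transport returns the tool's execution result
- A tool can be called in direct HTTP mode
- Connecting to a nonexistent Server raises a connection error
- call_tool with invalid arguments returns an error response
- The JSON-RPC 2.0 request format is correct

test_handoff.py:
- send_handoff sends the message over Redis Pub/Sub
- listen_for_handoffs receives the handoff message
- The handoff message contains source_agent, target_agent, context, and reason
- Without Redis, HandoffManager falls back to a local call
- Agents listening on different channels do not interfere with each other
- The handoff message is serialized and deserialized correctly

**Verification:** `pytest tests/unit/test_mcp_client.py tests/unit/test_handoff.py -v` passes in full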

---

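### U8. Add integration tests

**Goal:** Add 4 integration test files that verify the end-to-end flows: the full Agent lifecycle, tool composition, the evolution loop, and the MCP round trip.

**Requirements:** R2, R6, R8, R9, R15, R16, R18, R20

**Dependencies:** U1, U3, U4, U5, U6, U7

**Files:**
- `fischer-agentkit/tests/integration/test_agent_lifecycle.py` (new)
- `fischer-agentkit/tests/integration/test_tool_composition.py` (new)
- `fischer-agentkit/tests/integration/test_evolution_loop.py` (new)
- `fischer-agentkit/tests/integration/test_mcp_roundtrip.py` (new)

**Approach:**

1. **test_agent_lifecycle.py**:
   - The full flow: start the Agent → send a task → receive the result → stop the Agent
   - Verify the call order of the on_task_start/on_task_complete hooks
   - Verify that the on_task_failed hook fires when a task fails
   - Verify Memory reads and writes during task execution
2. **test_tool_composition.py**:
   - SequentialChain: two tools run in sequence, the output of the first feeding the second
   - ParallelFanOut: three tools run in parallel and their results are merged
   - DynamicSelector: the LLM selects a tool based on the task
   - AgentTool: wrap an Agent as a Tool and call it
3. **test_evolution_loop.py**:
   - The full loop: reflect → optimize → A/B test → apply/roll back
   - Verify that EvolutionStore persists the evolution records
   - Verify that a change is applied automatically when the A/B test shows an improvement
   - Verify that a change is rolled back automatically when the A/B test shows a regression
4. **test_mcp_roundtrip.py**:
   - Start the MCP Server → connect the MCP Client → list_tools → call_tool → result returned
   - Verify that the Tools exposed by the Server match ToolRegistry
   - Verify that a call through the Client returns the same result as calling the Tool directly

**Execution note:** Integration tests use real Redis and PostgreSQL and are marked with `@pytest.mark.integration`, so they can be skipped with `pytest -m "not integration"`.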
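
To decide automatically whether the marked tests can run, a conftest.py could probe the test service ports before collection. The sketch below shows only the probe, using the standard library. The host and the helper name are assumptions; the ports are the ones chosen for docker-compose.test.yml in U1.

```python
import socket


def service_reachable(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Host ports from U1's docker-compose.test.yml mapping (Redis 6381, PostgreSQL 5434).
REDIS_UP = service_reachable("127.0.0.1", 6381)
POSTGRES_UP = service_reachable("127.0.0.1", 5434)
```

A `pytest_collection_modifyitems` hook could then add a skip marker to every item marked `integration` when either flag is False. The `integration` marker itself should be registered under `markers` in the pytest configuration so that pytest does not warn about an unknown mark.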

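**Patterns to follow:** The end-to-end test pattern of the existing test_u8_geo_integration.py

**Test scenarios:**

test_agent_lifecycle.py:
- ConfigDrivenAgent loads from YAML → starts → executes a task → returns a TaskResult → stops
- BaseAgent lifecycle hooks are called in order: start → on_task_start → handle_task → on_task_complete → stop
- When a task fails, on_task_failed fires and the TaskResult status is FAILED
- WorkingMemory stores and retrieves context automatically while the Agent executes a task
- EpisodicMemory records the experience automatically after the Agent executes a task

test_tool_composition.py:
- SequentialChain runs two FunctionTools in sequence, and the second receives the output of the first
- ParallelFanOut runs three FunctionTools in parallel and merges their results
- DynamicSelector picks the appropriate tool based on the LLM's judgment
- AgentTool wraps an Agent and dispatches the task through the Dispatcher

test_evolution_loop.py:
- Reflector generates a reflection after 5 task executions
- PromptOptimizer generates few-shot examples from successful cases
- ABTester splits traffic and applies the change automatically when the experiment group improves
- ABTester splits traffic and rolls back automatically when the experiment group regresses
- EvolutionStore records every change and supports history queries

test_mcp_roundtrip.py:
- After the MCP Server starts, the Client can call list_tools
- Client call_tool returns the same result as calling the Tool directly
- The tool list exposed by the Server matches what is registered in ToolRegistry
- The JSON-RPC 2.0 protocol is correct end to end

**Verification:** `pytest tests/integration/ -v` passes in full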

---

## Scope Boundaries

### In Scope

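- Add unit tests for the 6 zero-coverage modules
- Strengthen unit tests for the 4 weakly tested modules
- Implement pgvector cosine-distance ordering in EpisodicMemory (currently TODO)
- Fix the 21 datetime.utcnow() deprecation warnings
- Create the test infrastructure (docker-compose.test.yml, conftest.py)
- Add the 4 integration test files

### Deferred for Later

- MIPROv2 multi-objective prompt optimization (R16 advanced feature)
- Bayesian Optimization strategy tuning (R17 advanced feature)
- Event-driven replacement for Pipeline polling (R22)
- MCP Client auto-discovery of remote tools and registration into the local ToolRegistry (R9 advanced feature)
- MCP Server SSE streaming responses (R8 advanced feature)
- Automatic integration of EvolutionMixin with BaseAgent (R15 enhancement)
- Converting AgentTool polling to an event-driven model
- CI/CD configuration
- mypy/pyright type-checking configuration

### Outside This Project's Identity

- Full migration of the GEO business system (U8 of the origin plan)
- Frontend Agent management UI
- A2A Protocol support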

---

## Risks & Dependencies

| Risk | Impact | Mitigation |
|------|--------|------------|
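| The pgvector cosine-distance implementation may require schema changes | A database migration is needed | Write tests that define the expected behavior first; if the implementation needs a migration, update the init-db script of docker-compose.test.yml at the same time |
| Real-service tests require a Docker environment | The CI environment may not have Docker | Mark integration tests with a pytest marker so they can be skipped without Docker; mark Redis/PG-dependent unit tests with a marker as well |
| AgentTool's polling wait is slow in tests | Slow test runs | Mock asyncio.sleep to speed it up, or set a short timeout |
| Existing tests may be affected by the conftest.py refactor | Fixture name collisions | Use new fixture names in conftest.py and migrate the old tests gradually |
| pytest-httpx is not declared in pyproject.toml | Missing dependency | Add it to the dev dependencies in U1 |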

---

## System-Wide Impact

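- **Test run time**: grows from roughly 3 seconds today to an estimated 30 seconds (real services + integration tests)
- **Dev dependencies**: adds pytest-docker/testcontainers and pytest-httpx
- **Docker requirement**: development environments need Docker installed to run the tests
- **CI/CD**: GitHub Actions will later need to be configured to start the test services with docker-compose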