---
title: "feat: AgentKit Multi-Agent Marketplace architecture evolution (revised)"
status: active
date: 2026-06-09
depth: deep
origin: docs/brainstorms/2026-06-09-clawith-research-prompt.md
revision: v2
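revision_reason: "Based on the 2026-06-10 incremental code analysis (+36863 lines, 161 files); corrects the parts of the original plan that overlap with, conflict with, or no longer apply to the existing code"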
---

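# AgentKit Multi-Agent Marketplace Architecture Evolution Plan (Revised)

## Revision Notes

The original plan (v1) was written against the Phase 1-8 code, but the codebase grew substantially on 2026-06-10 (161 files, +36863 lines), and several of the additions overlap with what v1 proposed. This revision re-evaluates each item against the latest code and corrects the parts that no longer apply.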

---

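## Assessment of Issues in the Original Plan

### Issue 1: U6 Plan-and-Execute engine overlaps heavily with existing code

**Existing implementation**:
- `core/goal_planner.py` (594 lines): GoalPlanner already decomposes a goal into a structured execution plan, in two modes (rules/templates and LLM)
- `core/plan_executor.py` (518 lines): PlanExecutor already executes a plan step by step
- `core/plan_checker.py` (739 lines): PlanChecker already implements plan checking and retrospective review
- `core/plan_schema.py` (148 lines): the ExecutionPlan/PlanStep/PlanStepStatus/SkillGap data models
- `orchestrator/reflection.py` (370 lines): PipelineReflector + PipelineReplanner already implement reflect-and-replan

**Problem with the original U6**: It planned to "add `core/plan_exec.py`", but `core/plan_executor.py` already exists and is functionally complete. Together, GoalPlanner + PlanExecutor + PipelineReplanner already cover the core Plan-and-Execute flow.

**Correction**: U6 no longer builds a new engine. Instead, it wraps the existing GoalPlanner/PlanExecutor/PipelineReplanner as the execution-engine adapter for the `plan_exec` execution_mode.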

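### Issue 2: U7 Reflexion engine partially overlaps with existing code

**Existing implementation**:
- `orchestrator/reflection.py`: PipelineReflector already implements LLM-based reflective analysis
- `evolution/reflector.py` + `evolution/llm_reflector.py`: LLMReflector already implements reflection
- `evolution/lifecycle.py`: EvolutionMixin already implements the reflect → optimize → A/B test loop

**Problem with the original U7**: The core Reflexion logic (reflect and retry) already exists in EvolutionMixin and PipelineReflector. What is missing is the in-execution loop of self-evaluation followed by retry.

**Correction**: U7 is reduced to adding an Evaluate → Reflect → Retry loop on top of ReActEngine, reusing LLMReflector.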

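### Issue 3: U1 Concierge overlaps with the existing Chat system

**Existing implementation**:
- `chat/skill_routing.py` (168 lines): SkillRoutingResult + parse_skill_prefix() + route_to_skill()
- `cli/chat.py` (422 lines): the CLI Chat interface, including @skill: prefix routing
- `server/routes/chat.py`: REST + WebSocket Chat API
- `session/store.py`: Session/Message management

**Problem with the original U1**: The Concierge's "single entry point + conversation context + routing" is already partly implemented in the existing Chat system. skill_routing.py already implements @skill: prefix routing, and chat.py already manages conversation context.

**Correction**: U1 no longer builds a new Concierge module. Instead, it extends the existing Chat system with CostAwareRouter capabilities. Conversation management reuses SessionManager, and the routing extension reuses skill_routing.py.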

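### Issue 4: U2 CostAwareRouter overlaps with existing routing

**Existing implementation**:
- `chat/skill_routing.py`: already implements Skill routing (@skill: prefix + keyword matching)
- `router/intent.py`: IntentRouter three-level routing (keywords + LLM)
- `skills/registry.py` (259 lines): SkillRegistry already implements Skill lookup and matching

**Problem with the original U2**: Layer 0 Skill matching and Layer 1 LLM classification already exist in the current routing code.

**Correction**: U2 is reduced to adding complexity assessment and auction-trigger logic on top of skill_routing.py, with no new standalone module.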

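### Issue 5: U10 Soul overlaps with the existing Memory system

**Existing implementation**:
- `tools/memory_tool.py` (117 lines): MemoryTool already implements operations on the four memory layers SOUL/USER/MEMORY/DAILY
- `memory/profile.py` (294 lines): MemoryProfile already implements memory configuration and injection

**Problem with the original U10**: Soul CRUD is already available through MemoryTool, so a new SoulManager is unnecessary.

**Correction**: U10 is reduced to extending MemoryTool's SOUL section to support dynamic evolution (a version number plus reflection-triggered updates), with no new identity/ module.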

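### Issue 6: Applicability of the auction mechanism is questionable

**Original KTD1**: Replace the central orchestrator with an auction mechanism.

**Problems**:
1. An auction requires every agent to be able to bid, but the current ConfigDrivenAgent has no `bid()` method, so bidding would have to be added to all agents
2. The Economy of Minds paper assumes a population of weak agents, whereas AgentKit agents are strong agents with tools; the setting is different
3. The notion of "wealth accumulation" means little in a single-user scenario: who pays the agents?
4. An auction adds system complexity with uncertain payoff. In most scenarios, capability-based routing (OrganizationContext.find_best_agent) is more direct and effective than an auction

**Correction**: The auction mechanism is downgraded to an optional experimental feature. Capability-matched routing (based on OrganizationContext) is the default, and the auction can be enabled as an advanced mode.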

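### Issue 7: The boundaries of the alignment guardrail are unclear

**Original KTD3**: AlignmentGuard covers global constraint injection, output auditing, and cascade-failure detection.

**Problems**:
1. Who defines the "global constraints": users, developers, or operators? A constraint configuration mechanism is needed
2. "Output auditing" uses an LLM to check output, which is itself another LLM call and adds cost and latency
3. The cascade-failure thresholds (10 interactions, 3 loop levels) are empirical values and need to be configurable
4. The relationship between the alignment guardrail and the existing QualityGate is unclear: does it replace or complement it?

**Correction**: AlignmentGuard is explicitly an extension of QualityGate. Constraints come from YAML configuration, the cascade-detection thresholds are configurable, and LLM auditing is off by default (enabled only for high-risk scenarios).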

---

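## Revised Requirements

| ID | Requirement | Change |
|----|------|---------|
| R1 | Users converse through the existing Chat system, and the routing layer selects the agent automatically | Changed from "build a new Concierge" to "extend the existing Chat" |
| R2 | Simple tasks go straight to a single agent with zero extra overhead | Unchanged |
| R3 | Medium tasks use capability-matched routing, with an optional auction mode | Changed from "auction required" to "capability matching by default, auction optional" |
| R4 | Complex tasks support multi-agent collaboration and require a cost justification | Unchanged |
| R5 | Global constraints are injected during multi-agent collaboration | Unchanged; the constraint source is now explicitly YAML configuration |
| R6 | Cascade failures are detected and interrupted automatically | Unchanged; thresholds are configurable |
| R7 | Different agents can be configured with different LLM models | Unchanged; already supported by the llm.model setting |
| R8 | Five execution architectures are supported: ReAct/ReWOO/Plan-and-Execute/Reflexion/Direct | Plan-and-Execute switches to the adapter pattern |
| R9 | Agents have a persistent identity (Soul) that keeps their personality across sessions | Changed from "build a new identity module" to "extend MemoryTool" |
| R10 | Agents are organization-aware | Unchanged |
| R11 | Agents can proactively discover new tools | Downgraded to Out of Scope (depends on the Marketplace being ready first) |
| R12 | Execution transparency is adjustable | Unchanged |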

---

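## Revised Key Technical Decisions

### KTD1 (revised): Extend the existing Chat system instead of building a Concierge

**Decision**: Extend the existing `chat/skill_routing.py` + `server/routes/chat.py` with CostAwareRouter capabilities, and do not build a new Concierge module.

**Rationale**:
- `chat/skill_routing.py` already implements @skill: prefix routing and Skill matching
- `server/routes/chat.py` already implements the REST + WebSocket Chat API
- `session/store.py` already implements conversation context management
- A new Concierge would duplicate the existing Chat system and raise maintenance cost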

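### KTD2 (unchanged): Layered routing, with 80% of scenarios going straight to a single agent

### KTD3 (revised): AlignmentGuard as a QualityGate extension

**Decision**: AlignmentGuard is not a standalone module. It extends the existing QualityGate with constraint injection and cascade detection.

**Rationale**:
- QualityGate is already integrated into ConfigDrivenAgent.execute()
- A standalone module would need extra integration points and add complexity
- The constraint source is explicitly YAML configuration (the alignment.constraints field)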

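### KTD4 (revised): Plan-and-Execute uses the adapter pattern

**Decision**: Do not build a new Plan-and-Execute engine. Instead, create an adapter that wraps the existing GoalPlanner + PlanExecutor + PipelineReplanner as the `plan_exec` execution_mode.

**Rationale**:
- GoalPlanner (594 lines) already implements goal decomposition
- PlanExecutor (518 lines) already implements plan execution
- PipelineReplanner (in the 370-line `orchestrator/reflection.py`) already implements reflect-and-replan
- Reimplementing these would be duplicated work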

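### KTD5 (unchanged): Tiered model configuration

### KTD6 (revised): Soul extension builds on the existing MemoryTool

**Decision**: Do not build a new identity/ module. Instead, extend the SOUL section of the existing MemoryTool to support dynamic evolution.

**Rationale**:
- MemoryTool already implements SOUL section CRUD
- MemoryProfile already implements memory injection
- A new identity/ module would duplicate the existing Memory system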

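### KTD7 (new): Auction mechanism downgraded to an optional experimental feature

**Decision**: Capability-matched routing (based on OrganizationContext) is the default, and the auction is an optional advanced mode.

**Rationale**:
- The Economy of Minds setting (a population of weak agents) differs from the AgentKit setting (strong agents with tools)
- "Wealth accumulation" in an auction means little in a single-user scenario
- Capability-based routing is more direct and more predictable
- An auction adds system complexity with uncertain payoff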

---

## Revised Implementation Units

### U1. CostAwareRouter: extend the existing Chat routing

**Goal**: Add complexity assessment and layered routing on top of the existing `chat/skill_routing.py`.

**Dependencies**: None

**Files**:
- `src/agentkit/chat/skill_routing.py` (modify: add complexity assessment and the auction trigger)
- `src/agentkit/chat/__init__.py` (modify: export the new classes)
- `tests/unit/test_cost_aware_router.py` (create)

**Approach**:
- Add a `CostAwareRouter` class to `skill_routing.py`
- Layer 0: reuse the existing `parse_skill_prefix()` and `route_to_skill()`, and add regex matching for chit-chat patterns
- Layer 1: add a `quick_classify()` method in which the LLM scores complexity from 0 to 1
- Layer 2: complexity > 0.7 triggers capability-matched routing (default) or an auction (optional)
- Transparency control: add `transparency_level` and `execution_trace` fields to SkillRoutingResult
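
The three layers above can be sketched as follows. This is a minimal illustration, not the planned implementation: `CostAwareRouterSketch`, `RouteDecision`, the two regexes, and the injected `quick_classify` callable (a stand-in for the LLM call) are all hypothetical names. Only the 0.7 threshold and the layer ordering come from the plan.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative patterns only; the real prefix parsing lives in parse_skill_prefix().
GREETING_RE = re.compile(r"^\s*(你好|hi|hello|hey)\s*[!！。.]*\s*$", re.IGNORECASE)
SKILL_PREFIX_RE = re.compile(r"^@skill:(\w+)")


@dataclass
class RouteDecision:
    layer: int                          # 0 = rules, 1 = LLM classification, 2 = multi-agent
    target: str
    complexity: Optional[float] = None
    llm_calls: int = 0


class CostAwareRouterSketch:
    def __init__(self, quick_classify: Callable[[str], float],
                 threshold: float = 0.7, auction_enabled: bool = False):
        self.quick_classify = quick_classify  # hypothetical stand-in for the LLM call
        self.threshold = threshold
        self.auction_enabled = auction_enabled

    def route(self, message: str) -> RouteDecision:
        text = message.strip()
        # Layer 0: pure rules, no LLM call.
        if GREETING_RE.match(text):
            return RouteDecision(layer=0, target="direct")
        prefix = SKILL_PREFIX_RE.match(text)
        if prefix:
            return RouteDecision(layer=0, target=f"skill:{prefix.group(1)}")
        # Layer 1: one classification call that scores complexity in [0, 1].
        complexity = self.quick_classify(text)
        if complexity > self.threshold:
            # Layer 2: capability matching by default, auction only when enabled.
            target = "auction" if self.auction_enabled else "capability_match"
            return RouteDecision(2, target, complexity, llm_calls=1)
        return RouteDecision(1, "single_agent", complexity, llm_calls=1)
```

Injecting the classifier as a callable keeps Layer 0 testable without any LLM, which is what the zero-token test scenarios rely on.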

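**Patterns to follow**: `chat/skill_routing.py` SkillRoutingResult + `router/intent.py` IntentRouter

**Test scenarios**:
- The greeting "Hello" hits a Layer 0 rule with zero token cost
- "Search for XX" hits the existing Skill routing with zero token cost
- "Analyze this data" goes through Layer 1 LLM classification
- "Do market research plus competitor analysis" scores complexity > 0.7 and goes through capability-matched routing
- Transparency switches from SILENT to TRACE

**Verification**: The three routing layers split traffic correctly and remain compatible with the existing Chat system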

---

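### U2. ReWOO execution engine

**Goal**: Implement a ReWOO execution engine that plans all tool calls up front and then executes them in batch.

**Dependencies**: None

**Files**:
- `src/agentkit/core/rewoo.py` (create)
- `tests/unit/test_rewoo_engine.py` (create)

**Approach**:
- Phase 1 Planning: the LLM produces the complete tool-call plan (a JSON list of steps)
- Phase 2 Execution: tool calls run in plan order (steps without dependencies may run in parallel)
- Phase 3 Synthesis: the LLM combines all tool results into the final output
- Follow the ReActEngine interface design (execute/execute_stream) to keep the API consistent
- Reuse existing components such as LLMGateway, Tool, and CancellationToken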
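
A minimal sketch of the three phases, under stated assumptions: `run_rewoo`, the `PLAN`/`SYNTHESIZE` prompt prefixes, the step schema (`id`, `tool`, `input`), and the `#E1`-style evidence references are hypothetical, and the LLM and tools are plain callables rather than LLMGateway and Tool. Steps run sequentially here; the parallel execution of independent steps is omitted.

```python
import json
from typing import Callable, Dict


def run_rewoo(task: str,
              llm: Callable[[str], str],
              tools: Dict[str, Callable[[str], str]]) -> str:
    # Phase 1 (Planning): a single LLM call returns every tool call as JSON.
    plan = json.loads(llm(f"PLAN\n{task}"))

    # Phase 2 (Execution): run the steps in order with no LLM calls in between.
    evidence: Dict[str, str] = {}
    for step in plan["steps"]:
        tool_input = step["input"]
        # Substitute references such as "#E1" with earlier results.
        for ref, value in evidence.items():
            tool_input = tool_input.replace(f"#{ref}", value)
        try:
            evidence[step["id"]] = tools[step["tool"]](tool_input)
        except Exception as exc:
            # Record the failure as evidence so synthesis can still proceed.
            evidence[step["id"]] = f"ERROR: {exc}"

    # Phase 3 (Synthesis): a second LLM call combines all evidence.
    summary = "\n".join(f"{key}: {value}" for key, value in evidence.items())
    return llm(f"SYNTHESIZE\n{task}\n{summary}")
```

The point of the design is the call count: two LLM calls regardless of how many tools run, compared with one call per step in a ReAct loop.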

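**Patterns to follow**: the `core/react.py` ReActEngine interface pattern

**Test scenarios**:
- Single-step plan: plan 1 tool call, execute, synthesize
- Multi-step plan: plan 3 tool calls, execute in order, synthesize
- Error handling when a tool call fails
- Interface compatibility with ReActEngine (drop-in replaceable)

**Verification**: The ReWOO engine completes the full plan → execute → synthesize flow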

---

### U3. Plan-and-Execute adapter

**Goal**: Wrap the existing GoalPlanner + PlanExecutor + PipelineReplanner as the execution-engine adapter for the `plan_exec` execution_mode.

**Dependencies**: None

**Files**:
- `src/agentkit/core/plan_exec_engine.py` (create)
- `tests/unit/test_plan_exec_engine.py` (create)

**Approach**:
- PlanExecEngine is an adapter that composes GoalPlanner + PlanExecutor + PipelineReplanner internally
- It implements execute()/execute_stream() compatible with ReActEngine
- Planner stage: call GoalPlanner.generate_plan() to decompose the task
- Executor stage: call PlanExecutor.execute_plan() to run the steps one by one
- Replanner stage: when execution deviates, call PipelineReplanner.replan() to replan
- Each sub-step may choose a different execution strategy (react/direct/rewoo)
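
The adapter composition can be sketched as below. The method names `generate_plan()`, `execute_plan()`, and `replan()` come from the plan; their signatures, the synchronous style, the `StepResult` type, and the `max_replans` limit are assumptions made for illustration, since the real components' interfaces are not shown here.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class StepResult:
    step: str
    ok: bool
    output: str = ""


# Assumed shapes of the three existing components (signatures are hypothetical).
class Planner(Protocol):
    def generate_plan(self, goal: str) -> List[str]: ...


class Executor(Protocol):
    def execute_plan(self, plan: List[str]) -> List[StepResult]: ...


class Replanner(Protocol):
    def replan(self, goal: str, results: List[StepResult]) -> List[str]: ...


class PlanExecEngineSketch:
    """Adapter: owns no planning logic, only sequences the three components."""

    def __init__(self, planner: Planner, executor: Executor,
                 replanner: Replanner, max_replans: int = 1):
        self.planner = planner
        self.executor = executor
        self.replanner = replanner
        self.max_replans = max_replans

    def execute(self, goal: str) -> List[StepResult]:
        plan = self.planner.generate_plan(goal)
        results = self.executor.execute_plan(plan)
        replans = 0
        # Replan only when some step deviated, and only a bounded number of times.
        while any(not r.ok for r in results) and replans < self.max_replans:
            plan = self.replanner.replan(goal, results)
            results = self.executor.execute_plan(plan)
            replans += 1
        return results
```

Typing the collaborators as protocols keeps any interface mismatch with the real classes inside the adapter, which is the mitigation the risk table calls for.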

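**Patterns to follow**: the `core/react.py` ReActEngine interface + `core/goal_planner.py` + `core/plan_executor.py`

**Test scenarios**:
- A 3-step task: plan → execute step by step → summarize
- Replanning is triggered when execution deviates (PipelineReplanner)
- Sub-steps use different execution strategies
- Interface compatibility with ReActEngine

**Verification**: PlanExecEngine completes the full plan → execute → replan flow while reusing existing components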

---

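### U4. Reflexion execution engine

**Goal**: Add an Evaluate → Reflect → Retry loop on top of ReActEngine.

**Dependencies**: None

**Files**:
- `src/agentkit/core/reflexion.py` (create)
- `tests/unit/test_reflexion_engine.py` (create)

**Approach**:
- Inherit from or compose ReActEngine, and add an evaluation step after the ReAct loop ends
- Evaluate: the LLM scores the quality of the current result (0 to 1), reusing LLMReflector's evaluation logic
- Reflect: when the score is below the threshold, the LLM reflects on the cause of failure, reusing evolution/reflector.py
- Retry: rerun the ReAct loop based on the reflection
- Retry at most max_reflections times (default 3)
- Tiered models: a mid-size model for act, a large model for evaluate/reflect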
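
The loop above can be sketched as follows. `run_reflexion`, `ReflexionOutcome`, and the three injected callables are hypothetical stand-ins for ReActEngine and LLMReflector; the default of 3 for `max_reflections` comes from the plan, while the 0.7 score threshold is an assumed placeholder because the plan does not fix a value.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReflexionOutcome:
    result: str
    score: float
    retries: int


def run_reflexion(task: str,
                  act: Callable[[str, List[str]], str],
                  evaluate: Callable[[str, str], float],
                  reflect: Callable[[str, str, float], str],
                  threshold: float = 0.7,      # assumed value, not from the plan
                  max_reflections: int = 3) -> ReflexionOutcome:
    reflections: List[str] = []
    result = act(task, reflections)            # first ReAct run, no reflections yet
    score = evaluate(task, result)
    retries = 0
    while score < threshold and retries < max_reflections:
        # Reflect on why the attempt fell short, then retry with that lesson.
        reflections.append(reflect(task, result, score))
        result = act(task, reflections)
        score = evaluate(task, result)
        retries += 1
    # Either the score passed or the retry budget is spent; return the last result.
    return ReflexionOutcome(result, score, retries)
```

Passing `act` and `evaluate` separately is what makes tiered models possible: each callable can be bound to a different model without changing the loop.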

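**Patterns to follow**: `core/react.py` ReActEngine + `evolution/llm_reflector.py` LLMReflector

**Test scenarios**:
- The first run passes, so no retry is triggered
- A score below the threshold triggers reflect + retry
- The retry passes and the final result is returned
- After more than max_reflections retries, the last result is returned
- Tiered model verification

**Verification**: The Reflexion engine completes the full execute → evaluate → reflect → retry loop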

---

### U5. SkillConfig extension + specialized agent definitions

**Goal**: Extend SkillConfig to support the new execution modes, and define YAML configurations for the five specialized agents.

**Dependencies**: U2, U3, U4

**Files**:
- `src/agentkit/skills/base.py` (modify: extend VALID_EXECUTION_MODES)
- `src/agentkit/core/config_driven.py` (modify: extend handle_task routing)
- `configs/skills/react_agent.yaml` (create)
- `configs/skills/rewoo_agent.yaml` (create)
- `configs/skills/plan_exec_agent.yaml` (create)
- `configs/skills/reflexion_agent.yaml` (create)
- `configs/skills/direct_agent.yaml` (create)
- `tests/unit/test_execution_modes.py` (create)

**Approach**:
- SkillConfig.VALID_EXECUTION_MODES gains "rewoo", "plan_exec", "reflexion"
- ConfigDrivenAgent.handle_task() gains _handle_rewoo/_handle_plan_exec/_handle_reflexion routes
- Each specialized agent's YAML configuration specifies a different llm.model
- Reuse the loading logic of the existing SkillLoader and SkillRegistry
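
A minimal sketch of the validation and dispatch described above. `SkillConfigSketch` and `AgentSketch` are hypothetical simplifications of SkillConfig and ConfigDrivenAgent, the handler bodies are placeholders, and the mode strings "react" and "direct" are assumed to match the existing values.

```python
from dataclasses import dataclass

# "rewoo", "plan_exec", "reflexion" are the additions named in the plan;
# "react" and "direct" are assumed to be the pre-existing modes.
VALID_EXECUTION_MODES = frozenset({"react", "direct", "rewoo", "plan_exec", "reflexion"})


@dataclass
class SkillConfigSketch:
    name: str
    execution_mode: str = "react"
    llm_model: str = ""

    def __post_init__(self) -> None:
        # Reject unknown modes at load time rather than at task time.
        if self.execution_mode not in VALID_EXECUTION_MODES:
            raise ValueError(f"invalid execution_mode: {self.execution_mode!r}")


class AgentSketch:
    def __init__(self, config: SkillConfigSketch):
        self.config = config
        self._handlers = {
            "react": self._handle_react,
            "direct": self._handle_direct,
            "rewoo": self._handle_rewoo,
            "plan_exec": self._handle_plan_exec,
            "reflexion": self._handle_reflexion,
        }

    def handle_task(self, task: str) -> str:
        # Route purely on configuration; the agent class itself stays generic.
        return self._handlers[self.config.execution_mode](task)

    # Placeholder handlers; the real ones would delegate to the engines.
    def _handle_react(self, task: str) -> str: return f"react:{task}"
    def _handle_direct(self, task: str) -> str: return f"direct:{task}"
    def _handle_rewoo(self, task: str) -> str: return f"rewoo:{task}"
    def _handle_plan_exec(self, task: str) -> str: return f"plan_exec:{task}"
    def _handle_reflexion(self, task: str) -> str: return f"reflexion:{task}"
```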

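**Patterns to follow**: `skills/base.py` SkillConfig + `skills/loader.py` SkillLoader

**Test scenarios**:
- SkillConfig accepts "rewoo"/"plan_exec"/"reflexion" as valid execution_mode values
- ConfigDrivenAgent routes to the correct engine based on execution_mode
- The YAML configurations of all five specialized agents load successfully
- Different agents are configured with different llm.model values

**Verification**: All five execution modes can be enabled through configuration and are routed correctly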

---

### U6. OrganizationContext organization awareness

**Goal**: Implement an organization context so that an agent knows whom it can ask for help, with capability-based agent discovery.

**Dependencies**: U5

**Files**:
- `src/agentkit/org/__init__.py` (create)
- `src/agentkit/org/context.py` (create)
- `src/agentkit/org/discovery.py` (create)
- `tests/unit/test_org_context.py` (create)

**Approach**:
- AgentProfile: name, agent_type, capabilities, skills, current_load, max_concurrency, availability, specializations
- OrganizationContext: agents dict, capability_matrix (capability → agent mapping), and a find_best_agent() method
- AgentDiscovery: capability-based agent discovery that takes load balancing into account
- Integration with the existing AgentPool: build OrganizationContext automatically from AgentPool
- Integration with the existing SkillRegistry: build the capability matrix from SkillConfig.capabilities
- Injection into BaseAgent.on_task_start(): an agent receives the organization context automatically when it starts
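
Capability matching with load balancing can be sketched as below. The field names follow the AgentProfile list above (a subset), but `OrganizationContextSketch`, the load-ratio ranking, and the name-based tie-break are assumptions; the plan only says that load is considered.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class AgentProfile:
    name: str
    capabilities: Set[str] = field(default_factory=set)
    current_load: int = 0
    max_concurrency: int = 1
    availability: bool = True


class OrganizationContextSketch:
    def __init__(self, agents: Iterable[AgentProfile]):
        self.agents: Dict[str, AgentProfile] = {a.name: a for a in agents}
        # capability -> names of agents that offer it
        self.capability_matrix: Dict[str, Set[str]] = defaultdict(set)
        for agent in self.agents.values():
            for capability in agent.capabilities:
                self.capability_matrix[capability].add(agent.name)

    def find_best_agent(self, required: Set[str]) -> Optional[AgentProfile]:
        # Keep only agents that offer every required capability.
        names = set(self.agents)
        for capability in required:
            names &= self.capability_matrix.get(capability, set())
        candidates = [
            self.agents[n] for n in names
            if self.agents[n].availability
            and self.agents[n].current_load < self.agents[n].max_concurrency
        ]
        if not candidates:
            return None
        # Prefer the lowest relative load; break ties by name for determinism.
        return min(candidates, key=lambda a: (a.current_load / a.max_concurrency, a.name))
```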

**Patterns to follow**: `core/agent_pool.py` AgentPool + `skills/schema.py` CapabilityTag

**Test scenarios**:
- A matching agent is found from required_capabilities
- Load balancing: the agent with the lowest current load is selected
- None is returned when no agent matches
- OrganizationContext is built automatically from AgentPool + SkillRegistry

**Verification**: An agent can discover a suitable collaborating agent through OrganizationContext

---

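### U7. AlignmentGuard: QualityGate extension

**Goal**: Extend the existing QualityGate with global constraint injection and cascade-failure detection.

**Dependencies**: U6

**Files**:
- `src/agentkit/quality/alignment.py` (create)
- `src/agentkit/quality/cascade_detector.py` (create)
- `src/agentkit/skills/base.py` (modify: add AlignmentConfig)
- `tests/unit/test_alignment_guard.py` (create)

**Approach**:
- AlignmentConfig: constraints (the list of global constraints), cascade_threshold (the cascade-detection threshold), audit_enabled (the LLM audit switch, off by default)
- ConstraintInjector: injects the global constraints into each subtask's input_data before task dispatch
- CascadeDetector: detects when inter-agent interaction count or loop depth exceeds its limit, and triggers an interrupt
- LLM auditing is off by default and enabled only for high-risk scenarios (marked alignment.audit_enabled: true)
- Integration with the existing QualityGate: the alignment check runs after QualityGate.validate()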
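
Constraint injection and cascade detection can be sketched as follows. All names ending in `Sketch`, the `CascadeDetected` exception, and the split of `cascade_threshold` into `max_interactions` and `max_loop_depth` are assumptions; the defaults of 10 interactions and 3 loop levels are the empirical values quoted in Issue 7.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class AlignmentConfigSketch:
    constraints: List[str] = field(default_factory=list)
    max_interactions: int = 10   # empirical default from the plan, configurable
    max_loop_depth: int = 3      # empirical default from the plan, configurable
    audit_enabled: bool = False  # LLM audit is off unless explicitly enabled


class CascadeDetected(RuntimeError):
    """Raised to interrupt a multi-agent run that looks like a cascade."""


def inject_constraints(input_data: Dict[str, Any],
                       config: AlignmentConfigSketch) -> Dict[str, Any]:
    # Return a copy so the caller's input_data is never mutated.
    merged = dict(input_data)
    merged["constraints"] = list(input_data.get("constraints", [])) + list(config.constraints)
    return merged


class CascadeDetectorSketch:
    def __init__(self, config: AlignmentConfigSketch):
        self.config = config
        self.interactions = 0

    def record(self, depth: int) -> None:
        """Call once per agent-to-agent hand-off, passing the current loop depth."""
        self.interactions += 1
        if self.interactions > self.config.max_interactions:
            raise CascadeDetected(f"interactions exceeded {self.config.max_interactions}")
        if depth > self.config.max_loop_depth:
            raise CascadeDetected(f"loop depth exceeded {self.config.max_loop_depth}")
```

Both checks are plain counters, so they add no LLM calls when auditing is disabled.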

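**Patterns to follow**: `quality/gate.py` QualityGate + `skills/base.py` QualityGateConfig

**Test scenarios**:
- Global constraints are injected into subtasks
- Cascade detection: inter-agent interactions above the threshold trigger an interrupt
- No extra LLM calls are made when LLM auditing is off
- When LLM auditing is on, output is checked for constraint violations

**Verification**: The alignment guardrail detects constraint violations and cascade failures, and remains compatible with QualityGate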

---

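### U8. Soul dynamic evolution: extend MemoryTool

**Goal**: Extend the SOUL section of the existing MemoryTool to support dynamic evolution (a version number plus reflection-triggered updates).

**Dependencies**: U5

**Files**:
- `src/agentkit/tools/memory_tool.py` (modify: add a version number and update logic to the SOUL section)
- `src/agentkit/evolution/lifecycle.py` (modify: reflection results trigger Soul updates)
- `tests/unit/test_soul_evolution.py` (create)

**Approach**:
- The SOUL section gains `version` and `updated_at` fields
- MemoryTool gains an `update_soul()` method that updates the Soul based on reflection results
- EvolutionMixin gains an `evolve_soul()` hook that checks, after reflection completes, whether the Soul needs updating
- Soul update condition: reflection finds a new behavior pattern, preference, or capability change
- Soul injection: reuse MemoryProfile's existing SOUL section injection logic

**Patterns to follow**: `tools/memory_tool.py` MemoryTool + `evolution/lifecycle.py` EvolutionMixin

**Test scenarios**:
- The Soul version starts at 1 and increments after an update
- A reflection result triggers a Soul update (adding a strength/value)
- No update is triggered when there is no reflection result
- Soul information is injected correctly into the System Prompt (reusing existing logic)

**Verification**: Agents have a persistent identity across sessions, and the Soul can evolve dynamically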

---

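### U9. Auction mechanism (optional experimental feature)

**Goal**: Implement the auction mechanism as an optional advanced routing mode that is disabled by default.

**Dependencies**: U6

**Files**:
- `src/agentkit/marketplace/__init__.py` (create)
- `src/agentkit/marketplace/auction.py` (create)
- `src/agentkit/marketplace/wealth.py` (create)
- `tests/unit/test_auction.py` (create)

**Approach**:
- Bid data structure: agent_name, architecture, estimated_steps, estimated_cost, confidence, payment_offer
- Auction resolution: score = (confidence / estimated_cost) * wealth_factor
- Wealth tracking: completing a task successfully increases wealth, and sustained poor performance marks the agent as bankrupt
- Off by default; it must be enabled explicitly in configuration with `marketplace.auction_enabled: true`
- When enabled, Layer 2 routing uses the auction instead of capability matching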
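
The scoring rule above translates directly into code. The `Bid` fields and the formula come from the plan; `score_bid`, `run_auction`, the default wealth factor of 1.0 for unknown agents, and the rejection of non-positive costs are assumptions added for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Bid:
    agent_name: str
    architecture: str
    estimated_steps: int
    estimated_cost: float
    confidence: float
    payment_offer: float


def score_bid(bid: Bid, wealth_factor: float) -> float:
    # score = (confidence / estimated_cost) * wealth_factor, as stated in the plan.
    if bid.estimated_cost <= 0:
        raise ValueError("estimated_cost must be positive")
    return (bid.confidence / bid.estimated_cost) * wealth_factor


def run_auction(bids: List[Bid], wealth_factors: Dict[str, float]) -> Optional[Bid]:
    """Return the highest-scoring bid, or None when nobody bids."""
    if not bids:
        return None
    # Agents without a recorded wealth factor are assumed to have a neutral 1.0.
    return max(bids, key=lambda b: score_bid(b, wealth_factors.get(b.agent_name, 1.0)))
```

For example, a bid with confidence 0.6 and cost 0.01 (score about 60) beats one with confidence 0.9 and cost 0.03 (score about 30) when the wealth factors are equal, and a wealth factor of 3.0 on the second agent reverses the outcome.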

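**Patterns to follow**: `core/agent_pool.py` AgentPool

**Test scenarios**:
- With the auction off, capability-matched routing is used
- With the auction on, multiple agents bid and the best one is selected
- The wealth factor affects the bidding outcome
- Agent bankruptcy check

**Verification**: The auction mechanism works correctly as an optional feature and does not affect default routing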

---

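### U10. Integration tests + Server integration

**Goal**: Integrate all new modules into the existing Server and deliver the complete end-to-end Chat → Router → Agent → AlignmentGuard flow.

**Dependencies**: U1-U9

**Files**:
- `src/agentkit/server/app.py` (modify: inject OrganizationContext and AlignmentGuard)
- `src/agentkit/server/config.py` (modify: add the marketplace/alignment configuration sections)
- `src/agentkit/chat/skill_routing.py` (modify: integrate CostAwareRouter)
- `tests/integration/test_marketplace_e2e.py` (create)

**Approach**:
- create_app() initializes OrganizationContext and AlignmentGuard
- CostAwareRouter is integrated into the existing Chat routing flow
- ServerConfig gains marketplace and alignment configuration sections
- End-to-end test: user message → Chat → Router → Agent → AlignmentGuard → reply

**Patterns to follow**: the `server/app.py` create_app() assembly pattern

**Test scenarios**:
- A simple chat message is routed to DirectAgent and returns normally
- A complex task selects an agent through capability-matched routing, runs to completion, and returns
- The alignment guardrail detects a cascade risk and triggers an interrupt
- Transparency TRACE mode returns execution trace information
- With auction mode enabled, a complex task goes through auction routing

**Verification**: The end-to-end flow is fully usable and compatible with the existing Chat system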

---

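## Revised Phased Delivery

### Phase A: Execution engines (U2, U3, U4, U5)
Three new engines plus the SkillConfig extension. They run independently and do not depend on the Marketplace.

### Phase B: Routing and organization (U1, U6, U7)
CostAwareRouter + OrganizationContext + AlignmentGuard

### Phase C: Identity and integration (U8, U9, U10)
Soul evolution + auction (optional) + Server integration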

---

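## Revised Risks & Dependencies

| Risk | Impact | Mitigation |
|------|------|---------|
| The PlanExecEngine adapter is incompatible with the interfaces of existing components | Plan-and-Execute mode does not work | The adapter absorbs interface differences internally and exposes a ReActEngine-compatible interface |
| The Reflexion engine has a high token cost | Self-evaluation plus retries increase token usage 2-3x | Tiered models + the max_reflections limit + off by default |
| CostAwareRouter Layer 1 classification is inaccurate | Medium tasks are misrouted | Classification results carry a confidence score, and low confidence falls back to the default agent |
| AlignmentGuard cascade detection raises false positives | Normal multi-step interactions are interrupted | Thresholds are configurable and start out lenient |
| The auction mechanism adds system complexity | High maintenance cost | Off by default, as an optional experimental feature |
| Conflict with the P2 Hardening plan | Both plans modify server/app.py at the same time | P2 goes first and Marketplace follows, so the two never modify the same file at the same time |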

---

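## Resolved Items

### 1. Auction mechanism: a core feature

**Decision**: The auction mechanism is a core differentiating capability and should be implemented in Phase B alongside capability-matched routing.

**Implementation notes**:
- The source of the reward signal must be settled: task success → positive reward, task failure → negative reward, issued by the Concierge/Router after the task completes
- Agents need a new `bid()` method (a default implementation defined in BaseAgent and overridden by ConfigDrivenAgent)
- The auction and capability-matched routing run side by side: capability matching is the fallback and the auction is the preferred path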

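### 2. AlignmentGuard constraint checking: layered hybrid

**Decision**: System-level constraints use rule checks, organization-level constraints use LLM checks, and user-level constraints use prompt injection.

| Level | Check method | Defined by | Examples |
|------|---------|--------|------|
| System | Rule check (keywords + regex) | Developers/operators | "Do not generate malicious code", "Do not leak API keys" |
| Organization | LLM semantic check | Administrators | "Do not cite competitor data", "Compliance review requires a human" |
| User | Prompt injection (not checked) | Users | "Reply in Chinese", "No more than 500 characters" |

**Implementation notes**:
- System-level constraints are hard-coded in `quality/alignment.py` and can be extended through configuration
- Organization-level constraints are configured under `alignment.constraints` in `agentkit.yaml`
- LLM auditing is triggered only by organization-level constraints; system-level constraints use rule checks at zero extra token cost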
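
The three levels can be sketched as below, under stated assumptions: `SYSTEM_RULES` and its single regex (an illustrative API-key-like pattern), `check_output`, `build_prompt`, and the injected `llm_audit` callable are all hypothetical and not the contents of `quality/alignment.py`.

```python
import re
from typing import Callable, Dict, List, Optional, Pattern, Sequence

# System level: hard-coded rules, checked with zero LLM tokens.
# The pattern is only an illustrative "looks like an API key" rule.
SYSTEM_RULES: Dict[str, Pattern[str]] = {
    "no_api_key_leak": re.compile(r"sk-[A-Za-z0-9]{20,}"),
}


def check_output(text: str,
                 org_constraints: Sequence[str] = (),
                 llm_audit: Optional[Callable[[str, str], bool]] = None) -> List[str]:
    """Return the names of violated constraints (an empty list means clean)."""
    violations = [name for name, pattern in SYSTEM_RULES.items() if pattern.search(text)]
    # Organization level: semantic check, only when an auditor is supplied.
    if llm_audit is not None:
        violations += [c for c in org_constraints if llm_audit(text, c)]
    return violations


def build_prompt(base_prompt: str, user_constraints: Sequence[str]) -> str:
    # User level: constraints are injected into the prompt and never checked.
    if not user_constraints:
        return base_prompt
    bullets = "\n".join(f"- {c}" for c in user_constraints)
    return f"{base_prompt}\n\nConstraints:\n{bullets}"
```

Leaving `llm_audit` as `None` models the default configuration, in which auditing is disabled and only the regex rules run.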

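### 3. Soul update frequency: condition-triggered

**Decision**: A Soul update is triggered only after the same kind of reflection appears ≥ 3 times. The version number increments after each update, and updates can be rolled back.

**Implementation notes**:
- EvolutionMixin maintains a `pending_soul_updates: dict[str, list[Reflection]]` buffer
- When reflections of the same kind (same category) accumulate to ≥ 3, `update_soul()` is triggered
- Each Soul update records the full change history (before/after/trigger/evidence) to support rollback
- The Soul version number increments by 1 on every update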
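
The buffering rule can be sketched as follows. `SoulEvolutionSketch` is a hypothetical stand-in that merges the EvolutionMixin buffer and the MemoryTool update into one class, reflections are plain strings, and the rollback semantics (restoring both traits and version) are an assumption, since the plan only states that rollback is supported.

```python
from collections import defaultdict
from typing import Any, Dict, List


class SoulEvolutionSketch:
    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.version = 1
        self.traits: Dict[str, str] = {}
        self.history: List[Dict[str, Any]] = []
        # category -> reflections waiting to reach the threshold
        self.pending_soul_updates: Dict[str, List[str]] = defaultdict(list)

    def record_reflection(self, category: str, insight: str) -> bool:
        """Buffer a reflection; return True when it triggered a Soul update."""
        bucket = self.pending_soul_updates[category]
        bucket.append(insight)
        if len(bucket) < self.threshold:
            return False
        self._update_soul(category, bucket)
        del self.pending_soul_updates[category]  # start counting afresh
        return True

    def _update_soul(self, category: str, evidence: List[str]) -> None:
        before = dict(self.traits)
        self.traits[category] = evidence[-1]  # keep the most recent insight
        self.version += 1
        self.history.append({
            "version": self.version, "before": before, "after": dict(self.traits),
            "trigger": category, "evidence": list(evidence),
        })

    def rollback(self) -> bool:
        """Undo the latest update (assumed semantics: restore traits and version)."""
        if not self.history:
            return False
        last = self.history.pop()
        self.traits = last["before"]
        self.version = last["version"] - 1
        return True
```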

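### 4. Specialized agent toolsets: YAML configuration + recommended defaults

**Decision**: Tools are bound through YAML configuration. Recommended defaults are provided, and users can customize them.

**Recommended defaults**:
| Agent | Default tools | Reason |
|-------|---------|------|
| ReactAgent | web_search, baidu_search, shell, memory | ReAct needs a rich toolset |
| RewooAgent | web_search, baidu_search, web_crawl | Tools for batch data collection |
| PlanExecAgent | All tools (sub-steps pick as needed) | A sub-step may need any tool |
| ReflexionAgent | Same as ReactAgent | Reflexion = ReAct + evaluation |
| DirectAgent | No tools | A single LLM call |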

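### 5. Relationship with the P2 Hardening plan: partially parallel

**Decision**: Phase A (execution engines) is developed in parallel with P2, and Phase B/C start only after P2 is complete.

**Rationale**:
- Phase A only adds engine files (rewoo.py/plan_exec_engine.py/reflexion.py) and does not modify server files, so there is no conflict
- Phase B/C need to modify server/app.py, server/config.py, and others, which conflicts with P2 at the file level
- P2 fixes security issues and should not be blocked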

### 6. Tiered model configuration: YAML configuration + recommended defaults

**Decision**: Models are configured through the `llm.model` field in YAML, with recommended default values provided.

**Recommended defaults**:
| Agent | Default model | Estimated cost per 1K tokens |
|-------|---------|-------------------|
| DirectAgent | `openai/gpt-4o-mini` | $0.00015 |
| ReactAgent | `anthropic/claude-sonnet-4-20250514` | $0.003 |
| RewooAgent | `anthropic/claude-sonnet-4-20250514` | $0.003 |
| PlanExecAgent | `anthropic/claude-opus-4-20250514` | $0.015 |
| ReflexionAgent | Execution: `sonnet`, evaluation: `opus` | Mixed |

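### 7. Context passing in multi-agent collaboration: upgrade on demand

**Decision**: Direct injection (TaskMessage.input_data) is the default. Complex scenarios upgrade on demand to SharedWorkspace or Redis Pub/Sub.

| Scenario | Passing method | Reason |
|------|---------|------|
| Sequential execution (A→B) | Direct injection | Simple and direct |
| Parallel execution (A+B→C) | SharedWorkspace | A and B write in parallel, and C reads the aggregate |
| Event notification (A notifies B) | Redis Pub/Sub | Asynchronous decoupling |
| Conversation continuity | SessionManager + summary | Continuity across agents |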