---
title: "feat: Pipeline-level adversarial loop (Coding Harness)"
status: active
created: 2026-06-12
origin: brainstorming session - improvement proposal for a Worker ↔ Verifier adversarial loop
type: feat
---

# feat: Pipeline-level adversarial loop (Coding Harness)

## Problem Frame

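In the current Pipeline Engine, `retry_count` and `retry_policy` implement **blind retry** (exponential backoff that reruns the same logic), and `QualityGate` performs **one-way validation** (validate → pass/fail). When a Worker's output fails, the Worker is never told what went wrong, so a retry cannot target the fix.

**Goal:** Implement an adversarial loop of Worker → Verifier → send back to the Worker with feedback → targeted fix → re-review, enabled purely through Pipeline YAML configuration.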

---

## High-Level Technical Design

### Adversarial State Machine

```mermaid
stateDiagram-v2
    [*] --> Worker: run Stage
    Worker --> Verifier: output ready
    Verifier --> [*]: review passed (passed=true)
    Verifier --> Worker: review failed (round < max)
    Worker --> Verifier: fix based on feedback
    Verifier --> Escalate: rounds exhausted (round >= max)
    Escalate --> [*]: hand off to a human or mark as failed
```
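The transitions in the diagram reduce to one decision made after each review. A minimal sketch, assuming `round` is the 1-based number of the round that was just reviewed; the `Next` enum and `next_step` function are hypothetical names, not existing engine APIs.

```python
from enum import Enum


class Next(Enum):
    DONE = "done"          # Verifier accepted the output
    RETRY = "retry"        # send feedback back to the Worker
    ESCALATE = "escalate"  # rounds exhausted


def next_step(passed: bool, current_round: int, max_rounds: int) -> Next:
    """Pick the transition after reviewing round `current_round` (1-based)."""
    if passed:
        return Next.DONE
    if current_round < max_rounds:
        return Next.RETRY
    return Next.ESCALATE
```

A pass always wins, even on the last allowed round, which matches the diagram: `Escalate` is reached only from a failed review.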

### Feedback Data Structure

```
ReviewFeedback
├── passed: bool
├── score: float (0-1)
├── summary: str (natural-language review report)
└── issues: list[ReviewIssue]
    ├── severity: critical/major/minor
    ├── category: logic_error/security/style/test_failure/architecture
    ├── description: str
    ├── location: str? (file path / line number)
    └── suggestion: str?
```
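The tree above maps onto two small models. The sketch below uses stdlib `dataclasses` as a stand-in for the Pydantic models planned in U1, so it runs without third-party packages; the validation rules (enum membership, score in [0, 1]) are assumptions drawn from the field descriptions. In Pydantic the same constraints would typically be expressed with `Literal[...]` types and `Field(ge=0, le=1)`.

```python
from dataclasses import asdict, dataclass, field
from typing import Literal, Optional, get_args

Severity = Literal["critical", "major", "minor"]
Category = Literal["logic_error", "security", "style", "test_failure", "architecture"]


@dataclass
class ReviewIssue:
    severity: Severity
    category: Category
    description: str
    location: Optional[str] = None    # file path / line number
    suggestion: Optional[str] = None

    def __post_init__(self) -> None:
        # dataclasses do not enforce Literal types, so check them by hand.
        if self.severity not in get_args(Severity):
            raise ValueError(f"unknown severity: {self.severity!r}")
        if self.category not in get_args(Category):
            raise ValueError(f"unknown category: {self.category!r}")


@dataclass
class ReviewFeedback:
    passed: bool
    score: float
    summary: str
    issues: list[ReviewIssue] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must be in [0, 1], got {self.score}")

    def to_dict(self) -> dict:
        return asdict(self)  # recurses into the nested ReviewIssue list

    @classmethod
    def from_dict(cls, data: dict) -> "ReviewFeedback":
        return cls(
            passed=bool(data["passed"]),
            score=float(data["score"]),
            summary=data["summary"],
            issues=[ReviewIssue(**item) for item in data.get("issues", [])],
        )
```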

### Configuration Extension

Add 4 fields to `PipelineStage`:
- `verifier` (str | None): name of the Verifier Agent
- `max_adversarial_rounds` (int): maximum number of adversarial rounds (default 3)
- `feedback_mode` (str): feedback mode (structured+natural / structured / natural)
- `escalate_on_exhaust` (str | None): escalation target once the rounds are exhausted
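A sketch of how the four new fields could default so that existing YAML stages keep loading unchanged. `StageAdversarialConfig` and `from_stage_dict` are hypothetical; in the plan these fields live directly on `PipelineStage`, and rejecting unknown `feedback_mode` values or negative round counts is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

FEEDBACK_MODES = ("structured+natural", "structured", "natural")


@dataclass
class StageAdversarialConfig:
    """Hypothetical stand-in for the four new PipelineStage fields."""
    verifier: Optional[str] = None
    max_adversarial_rounds: int = 3
    feedback_mode: str = "structured+natural"
    escalate_on_exhaust: Optional[str] = None

    def __post_init__(self) -> None:
        if self.feedback_mode not in FEEDBACK_MODES:
            raise ValueError(f"feedback_mode must be one of {FEEDBACK_MODES}")
        if self.max_adversarial_rounds < 0:
            raise ValueError("max_adversarial_rounds must be >= 0")

    @property
    def adversarial_enabled(self) -> bool:
        # Opt-in: only stages that name a verifier enter the loop.
        return self.verifier is not None

    @classmethod
    def from_stage_dict(cls, stage: dict) -> "StageAdversarialConfig":
        # Keys missing from an existing stage fall back to the defaults,
        # and unrelated stage keys (name, agent, ...) are ignored here.
        known = {k: stage[k] for k in cls.__dataclass_fields__ if k in stage}
        return cls(**known)
```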

---

## Implementation Units

### U1. Extend the Pipeline Schema: adversarial fields and feedback data models

**Goal:** Add the data models and fields the adversarial loop needs to `pipeline_schema.py`

**Requirements:** 
- PipelineStage supports configuring a Verifier and the adversarial parameters
- Provide structured ReviewFeedback and ReviewIssue data models
- Provide AdversarialState for tracking adversarial rounds

**Dependencies:** None

**Files:**
- `src/agentkit/orchestrator/pipeline_schema.py` (modify)
- `tests/unit/test_pipeline_schema.py` (modify)

**Approach:**
1. Add a `ReviewIssue` Pydantic model (severity, category, description, location, suggestion)
2. Add a `ReviewFeedback` Pydantic model (passed, issues, summary, score)
3. Add an `AdversarialState` Pydantic model (current_round, max_rounds, feedback_history, last_feedback)
4. Add 4 fields to `PipelineStage`:
   - `verifier: str | None = None`
   - `max_adversarial_rounds: int = 3`
   - `feedback_mode: str = "structured+natural"`
   - `escalate_on_exhaust: str | None = None`
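One way `AdversarialState` could track rounds, as a hypothetical stdlib sketch. For self-containment, feedback entries are plain dicts rather than `ReviewFeedback` objects, and the `start_round`, `record`, and `exhausted` helpers are illustrative names not given in the plan.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class AdversarialState:
    max_rounds: int
    current_round: int = 0
    feedback_history: list[dict[str, Any]] = field(default_factory=list)
    last_feedback: Optional[dict[str, Any]] = None

    def start_round(self) -> int:
        self.current_round += 1
        return self.current_round

    def record(self, feedback: dict[str, Any]) -> None:
        # Stamp each entry with its round so the history stays unambiguous.
        entry = {**feedback, "round": self.current_round}
        self.feedback_history.append(entry)
        self.last_feedback = entry

    @property
    def exhausted(self) -> bool:
        return self.current_round >= self.max_rounds
```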

**Test scenarios:**
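1. **Happy path:** create a PipelineStage with a verifier field and verify that the field defaults are correct
2. **Happy path:** create a ReviewFeedback object and verify that it serializes and deserializes correctly
3. **Edge case:** with verifier=None, PipelineStage is created normally (backward compatible)
4. **Edge case:** max_adversarial_rounds=0 is accepted at creation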

---

### U2. Pipeline Engine enhancement: adversarial flow execution logic

**Goal:** Implement a `_execute_stage_with_adversarial` method in `PipelineEngine` that drives the Worker ↔ Verifier adversarial loop

**Requirements:**
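- Enter adversarial mode automatically when a Stage configures a verifier
- When the Verifier rejects the output, send it back to the Worker with feedback for rework
- Escalate once the maximum number of rounds is exceeded
- Remain backward compatible with the existing `_execute_stage`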

**Dependencies:** U1

**Files:**
- `src/agentkit/orchestrator/pipeline_engine.py` (modify)
- `tests/unit/test_pipeline_adversarial.py` (new)

**Approach:**

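1. **Add a `_execute_stage_with_adversarial` method:**
   - Check whether `stage.verifier` is set; if not, fall back to the existing logic
   - Initialize an `AdversarialState`
   - Enter the adversarial loop (rounds 1 to max_adversarial_rounds):
     - Run the Worker Agent (round 1 uses the original context; later rounds run through `_execute_agent_with_feedback`)
     - Run the Verifier to review the Worker's output
     - If the review passes: return a success result
     - If the review fails:
       - Append the feedback to feedback_history
       - If the rounds are exhausted: call `_escalate`
       - Otherwise: continue to the next round, sending the work back to the Worker via `_execute_agent_with_feedback`

2. **Add an `_execute_agent_with_feedback` method:**
   - Build the feedback context (previous_attempt_failed, review_feedback, instruction)
   - Merge it into the original context
   - Call the Dispatcher to run the Agent

3. **Add an `_execute_verifier` method:**
   - Call the Verifier Agent to perform the review
   - Parse the result into a ReviewFeedback object
   - Log the review

4. **Add an `_escalate` method:**
   - If `escalate_on_exhaust` is configured: forward to the escalation target (e.g. human_approval)
   - Otherwise: return a failure result with the review history attached

5. **Modify the `_execute_stage` method:**
   - Check whether a verifier is configured
   - If so, route to `_execute_stage_with_adversarial`
   - Otherwise keep the existing logic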
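The loop in steps 1 to 4 can be sketched synchronously, with the Dispatcher calls replaced by two plain callables. `run_adversarial`, `StageOutcome`, and the status strings are hypothetical; the real methods would go through the Dispatcher and are likely async. Following the error-path scenarios below, a Worker exception uses up its round, while a Verifier exception fails the stage immediately.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Agent = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class StageOutcome:
    status: str                                 # "completed" | "escalated" | "failed"
    output: Optional[dict[str, Any]]
    rounds: int                                 # rounds actually started
    feedback_history: list[dict[str, Any]] = field(default_factory=list)
    escalated_to: Optional[str] = None


def run_adversarial(
    worker: Agent,
    verifier: Agent,
    context: dict[str, Any],
    max_rounds: int = 3,
    escalate_to: Optional[str] = None,
) -> StageOutcome:
    history: list[dict[str, Any]] = []
    worker_ctx = dict(context)                  # round 1 sees the original context
    output: Optional[dict[str, Any]] = None
    for round_no in range(1, max_rounds + 1):
        try:
            output = worker(worker_ctx)
        except Exception as exc:                # a crashed Worker attempt uses up the round
            history.append({"round": round_no, "passed": False,
                            "summary": f"worker error: {exc}"})
            continue
        try:
            feedback = verifier(output)
        except Exception as exc:                # Verifier crash: stop and report failure
            history.append({"round": round_no, "passed": False,
                            "summary": f"verifier error: {exc}"})
            return StageOutcome("failed", output, round_no, history)
        if feedback.get("passed"):
            return StageOutcome("completed", output, round_no, history)
        history.append({**feedback, "round": round_no})
        # Rebuild from the original context, as _execute_agent_with_feedback would.
        worker_ctx = {**context, "previous_attempt_failed": True,
                      "review_feedback": feedback}
    if escalate_to is not None:
        return StageOutcome("escalated", output, max_rounds, history, escalate_to)
    return StageOutcome("failed", output, max_rounds, history)
```

In this sketch `feedback_history` gets one entry per rejected or crashed round, so a stage that passes on round 2 ends with one entry.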

**Test scenarios:**
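1. **Happy path:** Stage without a verifier → takes the existing path and completes normally
2. **Happy path:** Stage with a verifier, review passes → completes in one round
3. **Happy path:** Stage with a verifier, review fails → sent back to the Worker → passes after the fix
4. **Edge case:** max_adversarial_rounds exceeded → escalate_on_exhaust is triggered
5. **Edge case:** escalate_on_exhaust=None → returns a failure with the review history attached
6. **Error path:** Verifier raises an exception → the error is logged and a failure is returned
7. **Error path:** Worker raises again on a retry → continue to the next round, or exhaust the rounds
8. **Integration:** state tracking for the full adversarial flow is correct (feedback_history has one entry per failed round, and current_round equals the number of rounds actually run)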

---

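### U3. Feedback context construction and injection

**Goal:** Build a structured feedback context so that the Worker Agent can understand the review feedback and make targeted fixes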

**Requirements:**
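- The feedback context contains a structured issue list and a natural-language review report
- The Worker can adjust its generation strategy based on the feedback context
- Support the feedback_mode setting (structured+natural / structured / natural)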

**Dependencies:** U2

**Files:**
- `src/agentkit/orchestrator/pipeline_engine.py` (modify, continues U2)
- `tests/unit/test_pipeline_adversarial.py` (modify, continues U2)

**Approach:**

1. **Build the feedback context dict:**
```python
feedback_context = {
    "previous_attempt_failed": True,
    "review_feedback": {
        "summary": feedback.summary,
        "issues": [
            {
                "severity": issue.severity,
                "category": issue.category,
                "description": issue.description,
                "location": issue.location,  # lets the Worker find the exact spot to fix
                "suggestion": issue.suggestion,
            }
            for issue in feedback.issues
        ],
        "previous_score": feedback.score,
    },
    "instruction": (
        "Your previous output did not pass review. "
        "Please fix the issues listed above and regenerate."
    ),
}
```

2. **Adjust the context according to feedback_mode:**
   - `structured+natural`: include the full issues list and the summary
   - `structured`: include only the issues list
   - `natural`: include only the summary and the instruction

3. **Merge into the original context:**
   - `merged_context = {**context, **feedback_context}`
   - Pass the merged context to the Agent
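Steps 2 and 3 can be sketched as two small functions. This is a minimal sketch assuming the feedback arrives as a plain dict with the ReviewFeedback keys, that `previous_attempt_failed` and `instruction` are kept in every mode, and that earlier rounds are exposed under a hypothetical `feedback_history` context key (one reading of the multi-round test scenario); `build_feedback_context` and `merge_context` are hypothetical names.

```python
from typing import Any

FEEDBACK_MODES = ("structured+natural", "structured", "natural")
ISSUE_KEYS = ("severity", "category", "description", "location", "suggestion")
INSTRUCTION = (
    "Your previous output did not pass review. "
    "Please fix the issues listed above and regenerate."
)


def build_feedback_context(feedback: dict[str, Any], mode: str) -> dict[str, Any]:
    """Step 2: shape review_feedback according to feedback_mode."""
    if mode not in FEEDBACK_MODES:
        raise ValueError(f"unknown feedback_mode: {mode!r}")
    review: dict[str, Any] = {"previous_score": feedback.get("score")}
    if mode != "natural":       # structured part: the issue list
        review["issues"] = [
            {key: issue.get(key) for key in ISSUE_KEYS}
            for issue in feedback.get("issues", [])
        ]
    if mode != "structured":    # natural-language part: the summary
        review["summary"] = feedback.get("summary", "")
    return {
        "previous_attempt_failed": True,
        "review_feedback": review,
        "instruction": INSTRUCTION,
    }


def merge_context(
    context: dict[str, Any],
    feedback_context: dict[str, Any],
    history: list[dict[str, Any]],
) -> dict[str, Any]:
    """Step 3: overlay the feedback and attach earlier rounds."""
    # In {**a, **b} the right-hand dict wins, so feedback keys shadow any
    # same-named keys (e.g. "instruction") in the original context.
    merged = {**context, **feedback_context}
    # Hypothetical key: earlier rejections, so fixed issues are not reintroduced.
    merged["feedback_history"] = [dict(entry) for entry in history]
    return merged
```

One consequence of plain dict unpacking worth keeping in mind: a stage whose own input already uses a key such as `instruction` will have it silently replaced by the feedback value.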

**Test scenarios:**
1. **Happy path:** feedback_mode="structured+natural" → the context contains issues and summary
2. **Happy path:** feedback_mode="structured" → the context contains only issues
3. **Happy path:** feedback_mode="natural" → the context contains only summary
4. **Edge case:** `AdversarialState.feedback_history` holds several rounds → all historical feedback is merged

---

### U4. Create the code_reviewer Skill configuration example

**Goal:** Create the Skill configuration for a code-review Verifier Agent, to serve as the standard Verifier template for adversarial mode

**Requirements:**
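- Configured in direct execution mode
- system_prompt defines a strict code-reviewer role and the review dimensions
- output_schema is configured so the result comes back in the structured ReviewFeedback format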

**Dependencies:** U1 (the ReviewFeedback schema must exist)

**Files:**
- `configs/skills/code_reviewer.yaml` (new)

**Approach:**

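1. Create `code_reviewer.yaml`:
   - name: code_reviewer
   - execution_mode: direct
   - intent_match: "code.*review|review.*code"
   - system_prompt: defines the code-reviewer role and the review dimensions (logical correctness, security vulnerabilities, architecture, test coverage, code style)
   - tools: [shell_tool] (used to run the test cases)
   - quality_gate: configure required_fields and output_schema

2. output_schema definition: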
   - passed: boolean
   - issues: array of {severity, category, description, location?, suggestion?}
   - summary: string
   - score: number (0-1)
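To make the output_schema above concrete, here it is written as a JSON Schema dict, together with a stdlib-only parser that a `_execute_verifier` step might use to reject malformed replies before building a ReviewFeedback. How `quality_gate.output_schema` is actually spelled in the Skill YAML is not specified in this plan, so the dict layout and `parse_review` are assumptions; a real implementation would more likely delegate to the `jsonschema` package or to the Pydantic model from U1.

```python
import json
from typing import Any

REVIEW_OUTPUT_SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["passed", "issues", "summary", "score"],
    "properties": {
        "passed": {"type": "boolean"},
        "summary": {"type": "string"},
        "score": {"type": "number", "minimum": 0, "maximum": 1},
        "issues": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["severity", "category", "description"],
                "properties": {
                    "severity": {"enum": ["critical", "major", "minor"]},
                    "category": {"enum": ["logic_error", "security", "style",
                                          "test_failure", "architecture"]},
                    "description": {"type": "string"},
                    "location": {"type": "string"},
                    "suggestion": {"type": "string"},
                },
            },
        },
    },
}


def parse_review(raw: str) -> dict[str, Any]:
    """Parse a Verifier reply and check it against REVIEW_OUTPUT_SCHEMA."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("review must be a JSON object")
    missing = [k for k in REVIEW_OUTPUT_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(data["passed"], bool):
        raise ValueError("passed must be a boolean")
    if not isinstance(data["summary"], str):
        raise ValueError("summary must be a string")
    score = data["score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)) or not 0 <= score <= 1:
        raise ValueError("score must be a number in [0, 1]")
    if not isinstance(data["issues"], list):
        raise ValueError("issues must be an array")
    item = REVIEW_OUTPUT_SCHEMA["properties"]["issues"]["items"]
    for issue in data["issues"]:
        missing = [k for k in item["required"] if k not in issue]
        if missing:
            raise ValueError(f"issue missing fields: {missing}")
        for key in ("severity", "category"):
            if issue[key] not in item["properties"][key]["enum"]:
                raise ValueError(f"invalid {key}: {issue[key]!r}")
    return data
```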

**Test expectation:** none; this is a configuration file, and its format is validated by YAML loading tests.

---

### U5. Create the coding_harness Pipeline configuration example

**Goal:** Create a complete adversarial coding Pipeline configuration that shows how to use the adversarial loop

**Requirements:**
- Contains four stages: develop → test → review (adversarial mode) → archive
- The review stage configures verifier, max_adversarial_rounds, and escalate_on_exhaust
- Use variable references to pass outputs between stages

**Dependencies:** U4

**Files:**
- `configs/pipelines/coding_harness.yaml` (new)

**Approach:**

1. Create `coding_harness.yaml`:
   - name: coding_harness, version: "1.0"
   - Stage 1 (develop): developer_agent implements the feature
   - Stage 2 (test): tester_agent runs the tests; depends on develop
   - Stage 3 (review): developer_agent fixes issues; verifier=code_reviewer, max_adversarial_rounds=3, escalate_on_exhaust=human_approval; depends on test
   - Stage 4 (archive): archiver_agent commits the code; depends on review

2. Configure variable references:
   - test stage inputs: code="{{develop.code}}", test_files="{{develop.test_files}}"
   - review stage inputs: code="{{develop.code}}", test_results="{{test.test_results}}"
   - archive stage inputs: code="{{review.final_code}}"
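The engine already resolves variables, so the following only illustrates what the references above have to resolve to. `resolve_refs` is hypothetical and assumes references are exactly `{{stage.field}}`; a value that is a single reference keeps its original type (so `test_results` can stay a dict), while references embedded in longer strings are stringified.

```python
import re
from typing import Any

_REF = re.compile(r"\{\{\s*([A-Za-z_]\w*)\.([A-Za-z_]\w*)\s*\}\}")


def resolve_refs(inputs: dict[str, str],
                 outputs: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Replace {{stage.field}} references with earlier stage outputs."""
    resolved: dict[str, Any] = {}
    for name, template in inputs.items():
        whole = _REF.fullmatch(template)
        if whole:
            stage, key = whole.groups()
            # Whole-value reference: keep the native type; a bad ref raises KeyError.
            resolved[name] = outputs[stage][key]
        else:
            resolved[name] = _REF.sub(
                lambda m: str(outputs[m.group(1)][m.group(2)]), template)
    return resolved
```

Note that `archive` reads `{{review.final_code}}`, so the review stage's output has to expose a `final_code` key; in this sketch a missing key surfaces as a `KeyError` rather than an empty string.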

**Test expectation:** none; this is a configuration file, and its format and references are validated by YAML loading tests.

---

### U6. Write unit tests: adversarial flow and feedback injection

**Goal:** Write complete unit test coverage for the adversarial loop

**Requirements:**
- Cover every test scenario defined in U1-U3
- Use mocks to simulate the Dispatcher and Agent execution
- Verify that the adversarial flow logic is correct

**Dependencies:** U1, U2, U3

**Files:**
- `tests/unit/test_pipeline_adversarial.py` (extend; file created in U2)

**Approach:**

Create the following test classes:

1. **TestPipelineSchemaAdversarial:**
   - test_stage_with_verifier
   - test_stage_without_verifier_backward_compat
   - test_review_feedback_serialization
   - test_adversarial_state_tracking

2. **TestAdversarialExecution:**
   - test_no_verifier_passthrough
   - test_verifier_passes_first_round
   - test_verifier_fails_then_worker_fixes
   - test_max_rounds_exhausted_escalate
   - test_max_rounds_exhausted_no_escalate
   - test_verifier_execution_error
   - test_worker_retry_error

3. **TestFeedbackContext:**
   - test_structured_and_natural_mode
   - test_structured_only_mode
   - test_natural_only_mode
   - test_multiple_rounds_feedback_merge

4. **TestEscalation:**
   - test_escalate_to_human_approval
   - test_escalate_to_fallback_agent
   - test_no_escalation_configured

**Test scenarios:** see the test classes listed above

---

### U7. Write integration tests: full Coding Harness Pipeline

**Goal:** Write integration tests that verify the end-to-end flow of the complete Coding Harness Pipeline

**Requirements:**
- Load the coding_harness.yaml configuration
- Simulate the full develop → test → review → archive flow
- Verify that the adversarial loop works in the review stage

**Dependencies:** U4, U5, U6

**Files:**
- `tests/integration/test_coding_harness_pipeline.py` (new)

**Approach:**

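1. Create the integration test:
   - Use a MockDispatcher to simulate Agent execution
   - Simulate the develop stage producing code
   - Simulate the test stage running the tests
   - Simulate the review stage: first review fails → sent back for a fix → second review passes
   - Simulate the archive stage committing the code

2. Checkpoints:
   - The final Pipeline status is COMPLETED
   - The review stage went through 2 adversarial rounds
   - feedback_history records the review feedback
   - Each stage's output variables are passed on correctly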
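The plan names a MockDispatcher but not its interface. A hypothetical scripted test double, assuming a synchronous `dispatch(agent, context)` call (the real Dispatcher's `dispatch` signature is not shown here and may well be async): each agent gets a queue of canned replies, and every call's context is recorded so a test can assert, for example, that the second `developer_agent` call in the review stage carried `previous_attempt_failed`.

```python
from collections import defaultdict, deque
from typing import Any


class ScriptedDispatcher:
    """Hypothetical test double that replays canned replies per agent, in order."""

    def __init__(self, script: dict[str, list[dict[str, Any]]]) -> None:
        self._queues = {agent: deque(replies) for agent, replies in script.items()}
        self.calls: dict[str, list[dict[str, Any]]] = defaultdict(list)

    def dispatch(self, agent: str, context: dict[str, Any]) -> dict[str, Any]:
        self.calls[agent].append(dict(context))  # record what each agent saw
        queue = self._queues.get(agent)
        if not queue:
            # Fail loudly: the pipeline called an agent more often than scripted.
            raise AssertionError(f"unexpected call to {agent!r}")
        return queue.popleft()
```

Scripting the reviewer as `[fail, pass]` reproduces the happy-path scenario, and a third `code_reviewer` call would fail the test instead of silently returning nothing.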

**Test scenarios:**
1. **Happy path:** full Pipeline run; the review stage passes after 2 adversarial rounds
2. **Edge case:** review still fails after 3 adversarial rounds → escalate to human_approval
3. **Error path:** the test stage fails → the Pipeline aborts and never enters review

---

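## Scope Boundaries

**In scope:**
- Pipeline Schema extension (adversarial fields and feedback data models)
- Pipeline Engine adversarial flow execution logic
- Feedback context construction and injection
- code_reviewer Skill configuration example
- coding_harness Pipeline configuration example
- Unit and integration tests

**Out of scope (deferred to follow-up work):**
- Task complexity evaluator (automatically decides whether to enable the adversarial team)
- Sub-second asynchronous IM responses (the Leader acknowledges immediately, then schedules work asynchronously in the background)
- Multi-path parallel research with adversarial review (multiple Researchers + an independent Verifier)
- Adversarial cost monitoring (recording token consumption, time, and fix success rate)
- Splitting the Verifier into multiple roles (LogicReviewer / SecurityReviewer / StyleReviewer reviewing in parallel)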

---

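## Risks and Dependencies

### Risks

1. **Agent's ability to understand feedback:** the Worker Agent may not fully understand structured feedback well enough to make targeted fixes. Mitigation: use feedback_mode="structured+natural" so that a natural-language explanation accompanies the issue list.

2. **Verifier review quality:** the quality of code_reviewer's reviews depends on its system_prompt and on the LLM's capability. Mitigation: ship a high-quality system_prompt template that can be refined later.

3. **Token consumption:** multiple adversarial rounds can consume many tokens, since each round costs one Worker call plus one Verifier call (at most 2 × max_adversarial_rounds agent calls per stage). Mitigation: max_adversarial_rounds defaults to 3 and is configurable.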

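### Dependencies

- Existing Pipeline Engine infrastructure (DAG topological sort, parallel execution, variable resolution)
- Existing Dispatcher interface (dispatch, get_task_status)
- Existing Agent configuration system (ConfigDrivenAgent, SkillConfig)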

---

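## System-Level Impact

- **Backward compatibility:** every new PipelineStage field has a default, so existing Pipeline configurations need no changes
- **Performance:** Stages without a verifier take the existing path and see no performance impact; Stages with a verifier may take longer because of the additional adversarial rounds
- **Observability:** adversarial rounds and review results are recorded in the StageResult's output_data and can be queried through logs and state management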