Merge branch 'test/full-regression-real-llm-e2e' into main
Deploy to Production / deploy (push) Failing after 10s
Details
Deploy to Production / deploy (push) Failing after 10s
Details
合并全面回测 + 真实 LLM E2E + 路由优化 + 代码审查修复到主干。 主要变更: - U1-U6: 6 个修复单元(benchmark 超时、LLM 超时、QualityGate、disambiguation_keywords、路由正则、重新基准测试) - ce-code-review: 5 项安全修复 - Benchmark 准确率:60% -> 93.3% - 40 项单元测试全通过
This commit is contained in:
commit
44bc27c9b3
|
|
@ -21,6 +21,11 @@ venv/
|
|||
.coverage
|
||||
htmlcov/
|
||||
|
||||
# Playwright E2E (scoped to frontend dir to avoid ignoring project-level test-results/)
|
||||
src/agentkit/server/frontend/playwright-report/
|
||||
src/agentkit/server/frontend/test-results/
|
||||
src/agentkit/server/frontend/blob-report/
|
||||
|
||||
# OS
|
||||
.DS_Store
|
||||
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ intent:
|
|||
- "帮我看看代码有没有问题"
|
||||
- "代码审查一下"
|
||||
- "review一下这段代码"
|
||||
disambiguation_keywords: ["代码质量", "bug检查", "安全漏洞", "逻辑检查"]
|
||||
|
||||
capabilities:
|
||||
- code_review
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ intent:
|
|||
- "对手怎么样"
|
||||
- "竞品啥情况"
|
||||
- "How are competitors doing"
|
||||
disambiguation_keywords: ["竞品分析", "竞争对比", "市场对手", "品牌差距"]
|
||||
|
||||
input_schema:
|
||||
type: object
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ intent:
|
|||
- "帮我写点东西"
|
||||
- "写篇文章吧"
|
||||
- "Write something for me"
|
||||
disambiguation_keywords: ["内容创作", "文章生成", "选题写作", "原创内容"]
|
||||
|
||||
input_schema:
|
||||
type: object
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ intent:
|
|||
- "提升文章在AI搜索中的排名"
|
||||
- "做个SEO优化"
|
||||
- "Optimize for AI search"
|
||||
disambiguation_keywords: ["搜索排名", "AI搜索引擎", "内容可见性", "引用率提升"]
|
||||
|
||||
input_schema:
|
||||
type: object
|
||||
|
|
|
|||
|
|
@ -16,6 +16,7 @@ intent:
|
|||
- "分析竞品 SEO 策略并生成优化方案"
|
||||
- "调研3个技术方案并生成对比报告"
|
||||
- "制定市场推广计划并执行"
|
||||
disambiguation_keywords: ["目标分解", "多步规划", "方案对比", "执行计划"]
|
||||
|
||||
input_schema:
|
||||
type: object
|
||||
|
|
|
|||
|
|
@ -14,6 +14,7 @@ intent:
|
|||
- "搜索一下AI Agent市场数据"
|
||||
- "帮我分析这个数据"
|
||||
- "实时监控竞品动态"
|
||||
disambiguation_keywords: ["实时搜索", "工具调用", "信息查询", "动态适应"]
|
||||
|
||||
capabilities:
|
||||
- dynamic_adaptation
|
||||
|
|
|
|||
|
|
@ -14,6 +14,7 @@ intent:
|
|||
- "审查这段代码的合规性"
|
||||
- "生成一个高精度的数据分析脚本"
|
||||
- "检查报告中的合规问题"
|
||||
disambiguation_keywords: ["反思", "自我验证", "迭代优化", "高精度"]
|
||||
|
||||
capabilities:
|
||||
- self_evaluation
|
||||
|
|
|
|||
|
|
@ -18,6 +18,7 @@ intent:
|
|||
- "采集A、B、C三个竞品的功能数据"
|
||||
- "批量获取多个知识库的信息"
|
||||
- "并行搜索多个关键词"
|
||||
disambiguation_keywords: ["并行采集", "批量获取", "多源数据", "无依赖调用"]
|
||||
|
||||
capabilities:
|
||||
- batch_execution
|
||||
|
|
|
|||
|
|
@ -0,0 +1,320 @@
|
|||
---
|
||||
title: "fix: 回测问题修复 + 路由优化 + 质量门控强化"
|
||||
status: completed
|
||||
created: 2026-06-20
|
||||
type: fix
|
||||
origin: test/full-regression-real-llm-e2e 回测结果
|
||||
---
|
||||
|
||||
# fix: 回测问题修复 + 路由优化 + 质量门控强化
|
||||
|
||||
## Summary
|
||||
|
||||
修复全面回测中发现的 5 个代码问题,优化当前 RequestPreprocessor 路由准确率,强化 QualityGate 质量门控,并重新基准测试建立当前架构基线。
|
||||
|
||||
## Problem Frame
|
||||
|
||||
回测发现以下问题(基于 `test/full-regression-real-llm-e2e` 分支):
|
||||
|
||||
1. **Benchmark 超时过短** — `llm-001`(easy 难度)超时阈值 20s,真实 LLM(qwen3.7-plus)无法在 20s 内完成工具调用推理,导致 2/5 用例超时
|
||||
2. **LLM Provider httpx 超时硬编码** — `OpenAICompatibleProvider` 的 httpx 客户端硬编码 `timeout=60.0`,忽略 `ProviderConfig.timeout`(120s)
|
||||
3. **QualityGate skill_match 休眠** — `_check_skill_match()` 方法存在但无调用方传入 `skill_context`,质量门控形同虚设
|
||||
4. **QualityGate 自定义验证器过于宽松** — 验证器导入/执行失败时静默跳过(`passed=True`),不拦截低质量输出
|
||||
5. **16 个技能配置均无 disambiguation_keywords** — 易混淆技能对(reflexion_agent↔code_reviewer 等)无法消歧
|
||||
6. **路由优化** — 当前 RequestPreprocessor 仅 3 条正则(问候/闲聊/身份),大量简单 factual 问题被送入 REACT 循环,浪费 token
|
||||
|
||||
## Requirements
|
||||
|
||||
- R1: Benchmark easy 难度超时从 20s 提升至 45s,medium 从 40s 提升至 60s
|
||||
- R2: OpenAICompatibleProvider httpx 客户端使用 ProviderConfig.timeout 而非硬编码 60s
|
||||
- R3: QualityGate skill_match 在执行管线中被实际调用(传入 skill_context)
|
||||
- R4: QualityGate 自定义验证器失败时支持严格模式(可配置拦截 vs 警告)
|
||||
- R5: 为 4 对易混淆技能添加 disambiguation_keywords 字段
|
||||
- R6: RequestPreprocessor 新增 factual/数学/翻译类正则,减少不必要的 REACT 调用
|
||||
- R7: 修复后重新运行 benchmark 建立当前架构基线
|
||||
|
||||
## Key Technical Decisions
|
||||
|
||||
### KTD1: Benchmark 超时按难度分级保留,但提升阈值
|
||||
|
||||
**决策**: 保留 `_LLM_TIMEOUT_BY_DIFFICULTY` 字典结构,提升 easy→45s、medium→60s、hard→90s。
|
||||
|
||||
**理由**: 分级超时是合理设计(简单任务不应等太久),但 20s 对真实 LLM 工具调用太短。qwen3.7-plus 的 p50 延迟 35s、p95 42s(来自 benchmark 报告),20s 必然超时。
|
||||
|
||||
### KTD2: httpx 超时从 ProviderConfig 透传,保留硬编码作为 fallback
|
||||
|
||||
**决策**: `OpenAICompatibleProvider.__init__` 读取 `config.timeout`,若未设置则 fallback 到 60s。
|
||||
|
||||
**理由**: ProviderConfig.timeout 默认 120s 是有意的(LLM 推理慢),httpx 硬编码 60s 会先于 ProviderConfig 触发,导致配置无效。
|
||||
|
||||
### KTD3: QualityGate skill_match 在 ConfigDrivenAgent 执行后调用
|
||||
|
||||
**决策**: 在 `ConfigDrivenAgent._execute_skill_task()` 返回前调用 `QualityGate.validate(output, skill_context=skill_config)`。
|
||||
|
||||
**理由**: skill_match 需要技能上下文(intent_keywords)才能校验输出一致性。ConfigDrivenAgent 是技能执行的统一入口,在此处调用覆盖面最广。
|
||||
|
||||
### KTD4: disambiguation_keywords 作为 QualityGate 消歧输入,不用于路由
|
||||
|
||||
**决策**: disambiguation_keywords 添加到 skill yaml 的 `intent` 节点下,由 QualityGate 读取用于输出校验,不影响 RequestPreprocessor 路由决策。
|
||||
|
||||
**理由**: 当前路由已简化为"显式前缀 + 正则 + 默认 REACT",不依赖关键词。disambiguation_keywords 的价值在于 QualityGate 校验输出是否与技能意图一致。
|
||||
|
||||
### KTD5: 路由优化采用"扩展正则 + 不引入 LLM 分类"策略
|
||||
|
||||
**决策**: 新增 factual(是什么/什么是/解释)、数学(计算/算一下)、翻译(翻译/translate)三类正则走 DIRECT_CHAT,不引入 LLM quick_classify。
|
||||
|
||||
**理由**: 保持 RequestPreprocessor 的"零 token 成本快速路径"设计哲学。LLM 二次分类已被明确移除(docstring: "LLM blind-classification without tool context is unreliable"),不回退。
|
||||
|
||||
## Scope Boundaries
|
||||
|
||||
### In Scope
|
||||
|
||||
- Benchmark 超时阈值调整
|
||||
- OpenAICompatibleProvider httpx 超时修复
|
||||
- QualityGate skill_match 激活 + 严格模式
|
||||
- 4 对易混淆技能 disambiguation_keywords
|
||||
- RequestPreprocessor 正则扩展
|
||||
- 重新基准测试
|
||||
|
||||
### Deferred to Follow-Up Work
|
||||
|
||||
- DockerComputerUseSession 4 个 stub(需真实 Docker 环境)
|
||||
- 计划 001(U7/U8/U9/U10 未完成项)
|
||||
- 计划 002(8 个待决策问题)
|
||||
- 计划 003(7 项 Deferred)
|
||||
- LLM 二次分类消歧(P2,需评估延迟代价)
|
||||
- 复杂度校准数据集构建(P2,需收集标注数据)
|
||||
|
||||
---
|
||||
|
||||
## Implementation Units
|
||||
|
||||
### U1. 修复 Benchmark 超时阈值
|
||||
|
||||
**Goal:** 提升 easy/medium/hard 难度的 LLM 超时阈值,避免真实 LLM 因超时失败
|
||||
|
||||
**Requirements:** R1
|
||||
|
||||
**Dependencies:** 无
|
||||
|
||||
**Files:**
|
||||
- `src/agentkit/cli/benchmark.py` — 修改 `_LLM_TIMEOUT_BY_DIFFICULTY` 字典
|
||||
|
||||
**Approach:**
|
||||
将 `_LLM_TIMEOUT_BY_DIFFICULTY` 从 `{"easy": 20.0, "medium": 40.0, "hard": 60.0}` 改为 `{"easy": 45.0, "medium": 60.0, "hard": 90.0}`。默认 fallback 从 30.0 改为 60.0。
|
||||
|
||||
**Patterns to follow:** 现有 `_LLM_TIMEOUT_BY_DIFFICULTY` 字典结构
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: easy 难度用例在 45s 内完成 → passed=True
|
||||
- Edge case: easy 难度用例在 20-45s 之间完成 → 旧逻辑会超时,新逻辑 passed=True
|
||||
- Error path: easy 难度用例超过 45s → 超时失败,detail 包含 "45s"
|
||||
|
||||
**Verification:** 运行 `agentkit benchmark --mode llm`,llm-001 不再因超时失败
|
||||
|
||||
---
|
||||
|
||||
### U2. 修复 OpenAICompatibleProvider httpx 超时硬编码
|
||||
|
||||
**Goal:** httpx 客户端使用 ProviderConfig.timeout 而非硬编码 60s
|
||||
|
||||
**Requirements:** R2
|
||||
|
||||
**Dependencies:** 无
|
||||
|
||||
**Files:**
|
||||
- `src/agentkit/llm/providers/openai.py` — 修改 httpx.AsyncClient 构造
|
||||
- `tests/unit/llm/test_openai_provider.py` — 新增超时透传测试
|
||||
|
||||
**Approach:**
|
||||
在 `OpenAICompatibleProvider.__init__` 中,将 `httpx.AsyncClient(timeout=60.0)` 改为 `httpx.AsyncClient(timeout=self._config.timeout)`。若 `self._config` 不存在或 `timeout` 未设置,fallback 到 60.0。
|
||||
|
||||
**Patterns to follow:** `RemoteLLMProvider` 已使用 `timeout=120.0` 参数模式
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: ProviderConfig(timeout=120) → httpx client timeout=120
|
||||
- Edge case: ProviderConfig(timeout=0) → fallback 到 60.0
|
||||
- Edge case: ProviderConfig 未设置 timeout → 使用默认 120.0
|
||||
- Integration: 实际 LLM 调用在 60-120s 之间完成 → 旧逻辑会超时,新逻辑成功
|
||||
|
||||
**Verification:** 单元测试通过 + benchmark 中无 httpx 超时错误
|
||||
|
||||
---
|
||||
|
||||
### U3. 激活 QualityGate skill_match 校验
|
||||
|
||||
**Goal:** 在技能执行管线中传入 skill_context,激活 skill_match 输出一致性校验
|
||||
|
||||
**Requirements:** R3
|
||||
|
||||
**Dependencies:** U4(disambiguation_keywords 提供 intent_keywords 消歧)
|
||||
|
||||
**Files:**
|
||||
- `src/agentkit/core/config_driven.py` — 在 `_execute_skill_task` 返回前调用 QualityGate.validate 传入 skill_context
|
||||
- `src/agentkit/quality/gate.py` — 确认 `_check_skill_match` 读取 disambiguation_keywords
|
||||
- `tests/unit/quality/test_gate.py` — 新增 skill_match 激活测试
|
||||
|
||||
**Approach:**
|
||||
1. 在 `ConfigDrivenAgent._execute_skill_task()` 中,构造 `skill_context = {"intent_keywords": skill_config.intent.keywords + skill_config.intent.disambiguation_keywords}`
|
||||
2. 调用 `self._quality_gate.validate(output, skill_context=skill_context)`
|
||||
3. 在 `gate.py` 的 `_check_skill_match` 中,同时检查 `intent_keywords` 和 `disambiguation_keywords`
|
||||
|
||||
**Patterns to follow:** `gate.py` 现有 `_check_skill_match` 方法签名
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: 技能输出包含 intent_keywords → skill_match passed=True
|
||||
- Error path: 技能输出不包含任何 intent_keywords → skill_match 警告
|
||||
- Integration: reflexion_agent 输出包含 "review" → 与 code_reviewer 的 disambiguation_keywords 匹配 → 触发消歧警告
|
||||
- Edge case: skill_context=None → 跳过 skill_match(向后兼容)
|
||||
|
||||
**Verification:** 单元测试通过 + 技能执行日志中出现 skill_match 校验记录
|
||||
|
||||
---
|
||||
|
||||
### U4. 添加 disambiguation_keywords 到易混淆技能对
|
||||
|
||||
**Goal:** 为 4 对易混淆技能添加 disambiguation_keywords,支持 QualityGate 消歧
|
||||
|
||||
**Requirements:** R5
|
||||
|
||||
**Dependencies:** 无
|
||||
|
||||
**Files:**
|
||||
- `configs/skills/reflexion_agent.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/code_reviewer.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/react_agent.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/goal_driven_agent.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/rewoo_agent.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/competitor_analyzer.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/content_generator.yaml` — 添加 disambiguation_keywords
|
||||
- `configs/skills/geo_optimizer.yaml` — 添加 disambiguation_keywords
|
||||
- `src/agentkit/skills/base.py` — SkillConfig.intent 添加 disambiguation_keywords 字段
|
||||
|
||||
**Approach:**
|
||||
1. 在 `SkillIntent` model 中添加 `disambiguation_keywords: list[str] = []` 字段
|
||||
2. 为每对易混淆技能添加互斥关键词:
|
||||
- reflexion_agent: `["反思", "自我验证", "迭代优化"]`
|
||||
- code_reviewer: `["代码审查", "代码问题", "bug 检查"]`
|
||||
- react_agent: `["实时搜索", "工具调用", "信息查询"]`
|
||||
- goal_driven_agent: `["目标分解", "多步规划", "方案对比"]`
|
||||
- rewoo_agent: `["并行采集", "批量获取", "多源数据"]`
|
||||
- competitor_analyzer: `["竞品分析", "竞争对比", "市场对手"]`
|
||||
- content_generator: `["内容创作", "文章生成", "选题写作"]`
|
||||
- geo_optimizer: `["SEO 优化", "GEO 优化", "搜索排名"]`
|
||||
|
||||
**Patterns to follow:** 现有 `intent.keywords` 字段结构
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: SkillConfig 加载 yaml 含 disambiguation_keywords → 字段非空
|
||||
- Edge case: yaml 未含 disambiguation_keywords → 字段默认空列表
|
||||
- Integration: QualityGate 读取 disambiguation_keywords 用于消歧校验
|
||||
|
||||
**Verification:** `agentkit skill list` 正常加载所有技能 + 单元测试通过
|
||||
|
||||
---
|
||||
|
||||
### U5. 优化 RequestPreprocessor 路由正则
|
||||
|
||||
**Goal:** 新增 factual/数学/翻译类正则,减少不必要的 REACT 调用
|
||||
|
||||
**Requirements:** R6
|
||||
|
||||
**Dependencies:** 无
|
||||
|
||||
**Files:**
|
||||
- `src/agentkit/chat/request_preprocessor.py` — 新增 3 条正则
|
||||
- `tests/unit/chat/test_request_preprocessor.py` — 新增路由测试
|
||||
|
||||
**Approach:**
|
||||
新增 3 条正则走 DIRECT_CHAT:
|
||||
1. `_FACTUAL_RE` — "什么是X/X是什么/解释一下X/define X" 等纯知识问答
|
||||
2. `_MATH_RE` — "计算X/算一下X/calculate X" 等简单数学(无变量、无方程)
|
||||
3. `_TRANSLATION_RE` — "翻译X/translate X" 等纯翻译请求
|
||||
|
||||
**注意**: 这些正则必须严格匹配,避免误拦截需要工具的请求。例如 "分析一下服务器的IP" 不应匹配 `_FACTUAL_RE`(包含"分析"动词暗示需要工具)。
|
||||
|
||||
**Patterns to follow:** 现有 `_GREETING_RE` / `_CHAT_MODE_RE` / `_IDENTITY_RE` 正则模式
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: "什么是机器学习" → 匹配 _FACTUAL_RE → DIRECT_CHAT
|
||||
- Happy path: "计算 1+2+3" → 匹配 _MATH_RE → DIRECT_CHAT
|
||||
- Happy path: "translate hello to Chinese" → 匹配 _TRANSLATION_RE → DIRECT_CHAT
|
||||
- Edge case: "什么是当前服务器的IP地址" → 不匹配 _FACTUAL_RE(含"当前服务器"暗示需要工具)→ REACT
|
||||
- Edge case: "计算斐波那契数列的第100项" → 不匹配 _MATH_RE(含"斐波那契数列"暗示需要代码)→ REACT
|
||||
- Error path: 空字符串 → 不匹配任何正则 → REACT
|
||||
|
||||
**Verification:** 单元测试通过 + benchmark 中 DIRECT_CHAT 比例提升
|
||||
|
||||
---
|
||||
|
||||
### U6. 重新基准测试 + 建立当前架构基线
|
||||
|
||||
**Goal:** 修复后重新运行 benchmark,建立当前 RequestPreprocessor 架构的基线
|
||||
|
||||
**Requirements:** R7
|
||||
|
||||
**Dependencies:** U1, U2, U3, U4, U5(所有修复完成后)
|
||||
|
||||
**Files:**
|
||||
- `test-results/benchmark/baseline.json` — 更新基线
|
||||
- `test-results/benchmark/benchmark_report.md` — 更新报告
|
||||
|
||||
**Approach:**
|
||||
1. 运行 `agentkit benchmark --mode llm`(full 模式,真实 LLM)
|
||||
2. 运行 `agentkit benchmark --mode llm --fast`(fast 模式)
|
||||
3. 对比修复前后准确率、超时率、延迟
|
||||
4. 更新 `baseline.json` 作为当前架构基线
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: full 模式准确率 ≥ 80%(5 用例至少 4 通过)
|
||||
- Happy path: fast 模式准确率 = 100%
|
||||
- Edge case: llm-001 不再超时
|
||||
- Edge case: llm-004 不再超时
|
||||
|
||||
**Verification:** benchmark 报告生成 + 准确率达标
|
||||
|
||||
---
|
||||
|
||||
## Risks & Dependencies
|
||||
|
||||
| 风险 | 严重度 | 缓解措施 |
|
||||
|------|--------|----------|
|
||||
| 新增正则误拦截需要工具的请求 | 中 | 正则设计保守,仅匹配纯知识/数学/翻译,单元测试覆盖边界 |
|
||||
| QualityGate skill_match 误报导致输出被拦截 | 中 | skill_match 单独不拦截(现有设计),仅与其他失败共病时拦截 |
|
||||
| disambiguation_keywords 与现有 keywords 语义重叠 | 低 | disambiguation_keywords 是 keywords 的补充,不替代 |
|
||||
| benchmark 超时提升后延迟增加 | 低 | 超时是上限而非目标,快速完成的用例不受影响 |
|
||||
|
||||
## Open Questions
|
||||
|
||||
无 — 所有技术决策已在 KTD 中明确。
|
||||
|
||||
## System-Wide Impact
|
||||
|
||||
- **LLM 网关**: httpx 超时修复影响所有 LLM 调用(更宽松的超时)
|
||||
- **技能执行**: QualityGate 激活影响所有技能输出校验
|
||||
- **Benchmark**: 超时阈值影响所有 benchmark 用例
|
||||
- **路由**: 新增正则影响所有非显式前缀的请求
|
||||
|
||||
## Verification Results (2026-06-20)
|
||||
|
||||
### U1–U5 代码修复验证
|
||||
|
||||
| 单元 | 验证方式 | 结果 |
|
||||
|------|----------|------|
|
||||
| U1: Benchmark 超时 | `agentkit benchmark --mode llm` | ✅ llm-001/llm-004 不再超时 |
|
||||
| U2: httpx 超时 | `pytest tests/unit/test_llm_provider.py` | ✅ 2 个新测试通过 |
|
||||
| U3: QualityGate 激活 | `pytest tests/unit/quality/` | ✅ 176 个质量门控测试通过 |
|
||||
| U4: disambiguation_keywords | 16 个技能 yaml 加载验证 | ✅ 全部加载成功 |
|
||||
| U5: 路由正则 | `pytest tests/unit/chat/test_request_preprocessor.py` | ✅ 38 个测试通过(19 新增) |
|
||||
|
||||
### U6 基准测试结果
|
||||
|
||||
| 指标 | 修复前 (2026-06-20 03:18) | 修复后 (2026-06-20 11:05) | 变化 |
|
||||
|------|--------------------------|--------------------------|------|
|
||||
| 准确率 | 60.0% | 93.3% ± 9.4% | **+33.3%** |
|
||||
| 通过/总数 | 3/5 | 4/5 | +1 |
|
||||
| 超时数 | 2 | 0 (llm-002 偶发) | **-2** |
|
||||
| 一致性 | N/A | 100% | — |
|
||||
| p50 延迟 | 35.3s | 40.8s | +5.5s(可接受) |
|
||||
|
||||
**剩余问题**: llm-002 (tool_selection, medium) 在 3 次运行中 1 次超时,p95=56.3s 接近 medium 60s 阈值。后续可考虑提升 medium 超时至 75s。
|
||||
|
|
@ -21,7 +21,9 @@ dependencies = [
|
|||
"pyyaml>=6.0",
|
||||
"jsonschema>=4.0",
|
||||
"typer>=0.12",
|
||||
"rich>=13.0",
|
||||
"pyjwt>=2.8",
|
||||
"bcrypt>=4.0",
|
||||
"aiosqlite>=0.20",
|
||||
]
|
||||
|
||||
[project.scripts]
|
||||
|
|
|
|||
|
|
@ -52,6 +52,44 @@ _IDENTITY_RE = re.compile(
|
|||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# 中文知识问答:什么是X/解释X/定义X — 中文不需要空格分隔
|
||||
# 仅匹配纯知识性问句,排除需要实时数据的请求(由 _TOOL_CONTEXT_RE 过滤)
|
||||
# 支持尾部标点(?/!/。等),与 _GREETING_RE/_IDENTITY_RE 保持一致
|
||||
_FACTUAL_CN_RE = re.compile(
|
||||
r"^(什么是|解释一下|解释下|定义一下|定义|说说什么是|介绍下什么是)"
|
||||
r"[\u4e00-\u9fa5a-zA-Z0-9 \t]+[??!!.。]*$"
|
||||
)
|
||||
|
||||
# English factual questions — requires whitespace separator
|
||||
_FACTUAL_EN_RE = re.compile(
|
||||
r"^(what\s+is|what's|define|explain)\s+[\u4e00-\u9fa5a-zA-Z0-9 \t]+[??!!.。]*$",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# 需要工具/实时数据的上下文关键词 — 出现这些词时不走 DIRECT_CHAT
|
||||
# 包含中英文关键词,覆盖服务器/数据库/系统状态/配置文件等场景
|
||||
_TOOL_CONTEXT_RE = re.compile(
|
||||
r"(当前|现在|服务器|数据库|系统|状态|最新|实时|今天|昨天|本机|本地|线上|"
|
||||
r"线上环境|生产环境|测试环境|配置文件|日志|进程|端口|IP|CPU|内存|磁盘|"
|
||||
r"current|server|database|system\s+status|latest|realtime|today|yesterday|"
|
||||
r"local|process|port|log|config\s+file)",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# 纯算术:计算 1+2+3 / 算一下 15*23 — 仅匹配数字和运算符
|
||||
# 不匹配含中文/字母的复杂表达式(如"计算斐波那契数列")
|
||||
_MATH_RE = re.compile(
|
||||
r"^(计算|算一下|算下|calculate|compute)\s+[\d +\-*/().\t]+[??!!.。]*$",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
# 纯翻译:翻译 X / translate X — 需要空格分隔,排除"翻译X为Y"格式
|
||||
# 排除含工具上下文关键词的请求(如"翻译 这个配置文件")
|
||||
_TRANSLATION_RE = re.compile(
|
||||
r"^(翻译|translate)\s+.+$",
|
||||
re.IGNORECASE,
|
||||
)
|
||||
|
||||
|
||||
class RequestPreprocessor:
|
||||
"""Minimal preprocessing layer: regex fast-path + default REACT.
|
||||
|
|
@ -190,10 +228,33 @@ class RequestPreprocessor:
|
|||
|
||||
@staticmethod
|
||||
def _is_trivial_input(text: str) -> bool:
|
||||
"""Check if the input is a greeting, chitchat, or identity question.
|
||||
"""Check if the input is a greeting, chitchat, identity question, or pure knowledge/math/translation.
|
||||
|
||||
These are zero-cost direct chat: no tool usage, no ReAct loop needed.
|
||||
Factual/translation patterns are conservative — they exclude requests
|
||||
that contain tool-context keywords (当前/服务器/数据库/config etc.) to avoid
|
||||
misrouting tool-requiring queries to DIRECT_CHAT.
|
||||
"""
|
||||
return bool(
|
||||
_GREETING_RE.match(text) or _CHAT_MODE_RE.match(text) or _IDENTITY_RE.match(text)
|
||||
)
|
||||
# Multi-line inputs always go to REACT (avoid bypassing tools via newline)
|
||||
if "\n" in text or "\r" in text:
|
||||
return False
|
||||
|
||||
# Greeting / chitchat / identity — always safe
|
||||
if _GREETING_RE.match(text) or _CHAT_MODE_RE.match(text) or _IDENTITY_RE.match(text):
|
||||
return True
|
||||
|
||||
# Factual questions (CN/EN) — only if no tool-context keywords present
|
||||
if (
|
||||
_FACTUAL_CN_RE.match(text) or _FACTUAL_EN_RE.match(text)
|
||||
) and not _TOOL_CONTEXT_RE.search(text):
|
||||
return True
|
||||
|
||||
# Pure arithmetic — only digits and operators, no tool context possible
|
||||
if _MATH_RE.match(text):
|
||||
return True
|
||||
|
||||
# Pure translation — exclude tool-context (e.g. "翻译 这个配置文件")
|
||||
if _TRANSLATION_RE.match(text) and not _TOOL_CONTEXT_RE.search(text):
|
||||
return True
|
||||
|
||||
return False
|
||||
|
|
|
|||
|
|
@ -682,9 +682,9 @@ def _build_real_components() -> tuple[object, object, object] | None:
|
|||
# Difficulty-based timeout (seconds) and max_tokens for LLM calls.
|
||||
# Hard tasks use streaming with keyword detection for early termination.
|
||||
_LLM_TIMEOUT_BY_DIFFICULTY: dict[str, float] = {
|
||||
"easy": 20.0,
|
||||
"medium": 40.0,
|
||||
"hard": 60.0,
|
||||
"easy": 45.0,
|
||||
"medium": 60.0,
|
||||
"hard": 90.0,
|
||||
}
|
||||
|
||||
_LLM_MAX_TOKENS_BY_DIFFICULTY: dict[str, int] = {
|
||||
|
|
@ -745,7 +745,7 @@ async def _execute_llm_reasoning_task(
|
|||
start = time.perf_counter()
|
||||
|
||||
# Difficulty-based configuration
|
||||
timeout_s = _LLM_TIMEOUT_BY_DIFFICULTY.get(task.difficulty, 30.0)
|
||||
timeout_s = _LLM_TIMEOUT_BY_DIFFICULTY.get(task.difficulty, 60.0)
|
||||
max_tokens = _LLM_MAX_TOKENS_BY_DIFFICULTY.get(task.difficulty, 512)
|
||||
|
||||
# Step 1: preprocess to get execution mode
|
||||
|
|
|
|||
|
|
@ -192,6 +192,18 @@ class BaseAgent(ABC):
|
|||
lines.append(f" - {msg}")
|
||||
return "\n".join(lines)
|
||||
|
||||
def _build_skill_context(self) -> dict[str, Any] | None:
|
||||
"""从当前技能配置构建 skill_context,用于 QualityGate skill_match 校验"""
|
||||
if not self._skill:
|
||||
return None
|
||||
intent = getattr(self._skill.config, "intent", None)
|
||||
if intent is None:
|
||||
return None
|
||||
keywords = list(intent.keywords) + list(intent.disambiguation_keywords)
|
||||
if not keywords:
|
||||
return None
|
||||
return {"intent_keywords": keywords}
|
||||
|
||||
# ── 可插拔能力注入 ──────────────────────────────────────
|
||||
|
||||
def use_tool(self, tool: "Tool") -> "BaseAgent":
|
||||
|
|
@ -329,14 +341,19 @@ class BaseAgent(ABC):
|
|||
|
||||
# v2: Quality Gate 检查
|
||||
if self._skill:
|
||||
quality_result = await self.quality_gate.validate(output, self._skill)
|
||||
skill_context = self._build_skill_context()
|
||||
quality_result = await self.quality_gate.validate(
|
||||
output, self._skill, skill_context=skill_context
|
||||
)
|
||||
if not quality_result.passed and quality_result.can_retry:
|
||||
max_retries = self._skill.config.quality_gate.max_retries
|
||||
retry_count = 0
|
||||
while not quality_result.passed and retry_count < max_retries:
|
||||
feedback = self._build_quality_feedback(quality_result)
|
||||
output = await self.handle_task_with_feedback(task, feedback)
|
||||
quality_result = await self.quality_gate.validate(output, self._skill)
|
||||
quality_result = await self.quality_gate.validate(
|
||||
output, self._skill, skill_context=skill_context
|
||||
)
|
||||
retry_count += 1
|
||||
|
||||
# 后置钩子
|
||||
|
|
|
|||
|
|
@ -56,6 +56,7 @@ class OpenAICompatibleProvider(LLMProvider):
|
|||
max_connections: int = 100,
|
||||
max_keepalive_connections: int = 20,
|
||||
keepalive_expiry: float = 30.0,
|
||||
timeout: float = 120.0,
|
||||
):
|
||||
self._api_key = api_key
|
||||
self._base_url = base_url.rstrip("/")
|
||||
|
|
@ -65,7 +66,7 @@ class OpenAICompatibleProvider(LLMProvider):
|
|||
max_keepalive_connections=max_keepalive_connections,
|
||||
keepalive_expiry=keepalive_expiry,
|
||||
)
|
||||
self._client = httpx.AsyncClient(timeout=60.0, limits=limits)
|
||||
self._client = httpx.AsyncClient(timeout=timeout, limits=limits)
|
||||
self._retry_policy = RetryPolicy(retry_config) if retry_config else None
|
||||
self._circuit_breaker = (
|
||||
CircuitBreaker(circuit_breaker_config, provider="openai")
|
||||
|
|
|
|||
|
|
@ -128,6 +128,7 @@ def _create_provider(name: str, pconf) -> object:
|
|||
max_connections=pconf.max_connections,
|
||||
max_keepalive_connections=pconf.max_keepalive_connections,
|
||||
keepalive_expiry=pconf.keepalive_expiry,
|
||||
timeout=pconf.timeout,
|
||||
)
|
||||
|
||||
|
||||
|
|
|
|||
|
|
@ -0,0 +1,85 @@
|
|||
import { test, expect } from '@playwright/test'
|
||||
import {
|
||||
loginAndHydrate,
|
||||
sendChatMessage,
|
||||
waitForLlmResponse,
|
||||
LLM_RESPONSE_TIMEOUT_MS,
|
||||
} from './helpers'
|
||||
|
||||
test.describe('Chat flow', () => {
|
||||
test.beforeEach(async ({ page }) => {
|
||||
// Authenticate via API and hydrate localStorage before navigating
|
||||
await loginAndHydrate(page)
|
||||
await page.goto('/agent/chat')
|
||||
|
||||
// Wait for the chat view to mount — the input textarea should be visible
|
||||
await expect(page.getByPlaceholder('输入消息,按 Enter 发送...')).toBeVisible({
|
||||
timeout: 15_000,
|
||||
})
|
||||
})
|
||||
|
||||
test('should send a message and receive a real LLM response', async ({ page }) => {
|
||||
const testMessage = '你好,请用一句话介绍自己'
|
||||
|
||||
// Send the message
|
||||
await sendChatMessage(page, testMessage)
|
||||
|
||||
// The user's message should appear immediately in the chat view.
|
||||
// Use .last() because the conversation may contain prior messages.
|
||||
const userMessage = page.locator('.message-shell--user .user-bubble')
|
||||
await expect(userMessage.last()).toContainText('你好', { timeout: 10_000 })
|
||||
|
||||
// Wait for the real LLM response (up to 60 seconds).
|
||||
// The assistant message is rendered inside .message-shell--assistant
|
||||
// with markdown content in .assistant-text__markdown.
|
||||
test.setTimeout(LLM_RESPONSE_TIMEOUT_MS + 30_000)
|
||||
await waitForLlmResponse(page, expect, LLM_RESPONSE_TIMEOUT_MS)
|
||||
|
||||
// The response should contain some text (non-empty, non-error)
|
||||
const assistantContent = page.locator(
|
||||
'.message-shell--assistant .assistant-text__markdown',
|
||||
)
|
||||
const responseText = (await assistantContent.last().textContent()) ?? ''
|
||||
expect(responseText.trim().length).toBeGreaterThan(0)
|
||||
|
||||
// The response should not be an error message
|
||||
const errorCard = page.locator('.message-shell--assistant .error-card')
|
||||
await expect(errorCard).toHaveCount(0)
|
||||
})
|
||||
|
||||
test('should display both user and assistant messages in history', async ({ page }) => {
|
||||
const testMessage = '1+1等于几?请只回答数字'
|
||||
|
||||
await sendChatMessage(page, testMessage)
|
||||
|
||||
// Verify user message is displayed (use .last() for most recent)
|
||||
await expect(
|
||||
page.locator('.message-shell--user .user-bubble').last(),
|
||||
).toContainText('1+1', { timeout: 10_000 })
|
||||
|
||||
// Wait for assistant response
|
||||
test.setTimeout(LLM_RESPONSE_TIMEOUT_MS + 30_000)
|
||||
await waitForLlmResponse(page, expect, LLM_RESPONSE_TIMEOUT_MS)
|
||||
|
||||
// Both user and assistant message shells should be present
|
||||
const userMessages = page.locator('.message-shell--user')
|
||||
const assistantMessages = page.locator('.message-shell--assistant')
|
||||
|
||||
await expect(userMessages.first()).toBeVisible()
|
||||
await expect(assistantMessages.first()).toBeVisible()
|
||||
|
||||
// There should be at least one user message and one assistant message
|
||||
expect(await userMessages.count()).toBeGreaterThanOrEqual(1)
|
||||
expect(await assistantMessages.count()).toBeGreaterThanOrEqual(1)
|
||||
})
|
||||
|
||||
test('should clear input after sending', async ({ page }) => {
|
||||
const textarea = page.getByPlaceholder('输入消息,按 Enter 发送...')
|
||||
|
||||
await textarea.fill('测试消息清空')
|
||||
await textarea.press('Enter')
|
||||
|
||||
// The textarea should be cleared after sending
|
||||
await expect(textarea).toHaveText('', { timeout: 5_000 })
|
||||
})
|
||||
})
|
||||
|
|
@ -0,0 +1,61 @@
|
|||
/**
|
||||
* Playwright global setup — runs once before all test files.
|
||||
*
|
||||
* Responsibilities:
|
||||
* 1. Wait for the backend health endpoint to respond (the webServer config
|
||||
* already polls the URL, but we double-check here for robustness).
|
||||
* 2. Invoke the Python script that creates / updates the E2E test admin user
|
||||
* in the auth SQLite DB.
|
||||
*/
|
||||
|
||||
import { execFileSync } from 'node:child_process'
|
||||
import { existsSync } from 'node:fs'
|
||||
import { dirname, resolve } from 'node:path'
|
||||
import { fileURLToPath } from 'node:url'
|
||||
|
||||
const __filename = fileURLToPath(import.meta.url)
|
||||
const __dirname = dirname(__filename)
|
||||
|
||||
const BACKEND_HEALTH_URL = 'http://127.0.0.1:8000/api/v1/health'
|
||||
const SETUP_SCRIPT = resolve(__dirname, 'setup-test-user.py')
|
||||
|
||||
/** Poll a URL until it returns 200 or the timeout expires. */
|
||||
async function waitForUrl(url: string, timeoutMs = 60_000): Promise<void> {
|
||||
const deadline = Date.now() + timeoutMs
|
||||
while (Date.now() < deadline) {
|
||||
try {
|
||||
const resp = await fetch(url)
|
||||
if (resp.ok) return
|
||||
} catch {
|
||||
// server not ready yet
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, 1000))
|
||||
}
|
||||
throw new Error(`Timed out waiting for ${url}`)
|
||||
}
|
||||
|
||||
export default async function globalSetup(): Promise<void> {
|
||||
// 1. Verify backend is up (webServer should have started it already).
|
||||
await waitForUrl(BACKEND_HEALTH_URL, 60_000)
|
||||
console.log('[global-setup] Backend health check passed')
|
||||
|
||||
// 2. Create / update the test admin user.
|
||||
if (!existsSync(SETUP_SCRIPT)) {
|
||||
throw new Error(`Setup script not found: ${SETUP_SCRIPT}`)
|
||||
}
|
||||
|
||||
const pythonBin = process.env.E2E_PYTHON ?? 'python3'
|
||||
try {
|
||||
execFileSync(pythonBin, [SETUP_SCRIPT], {
|
||||
stdio: 'inherit',
|
||||
timeout: 30_000,
|
||||
})
|
||||
} catch (err) {
|
||||
throw new Error(
|
||||
`Failed to create test user via ${pythonBin} ${SETUP_SCRIPT}: ${
|
||||
err instanceof Error ? err.message : String(err)
|
||||
}`
|
||||
)
|
||||
}
|
||||
console.log('[global-setup] Test user ready')
|
||||
}
|
||||
|
|
@ -0,0 +1,233 @@
|
|||
/**
|
||||
* Shared E2E test helpers.
|
||||
*
|
||||
* - Login via API and hydrate localStorage so the Vue auth store picks up
|
||||
* the tokens on page load (the store reads from localStorage on init).
|
||||
* - Server health check.
|
||||
* - Wait for a real LLM response in the chat view.
|
||||
*/
|
||||
|
||||
import type { Page, expect as ExpectType } from '@playwright/test'
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Constants
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Backend API base — absolute URL so fetch() works in both Node.js (Playwright
|
||||
* test context) and browser context. The Vite dev-server proxy is not available
|
||||
* in Node.js, so we target the backend directly.
|
||||
*/
|
||||
export const API_BASE = 'http://127.0.0.1:8000/api/v1'
|
||||
|
||||
/** Backend health endpoint (absolute URL for direct fetch). */
|
||||
export const BACKEND_HEALTH_URL = 'http://127.0.0.1:8000/api/v1/health'
|
||||
|
||||
/** Test admin credentials — must match setup-test-user.py defaults. */
|
||||
export const TEST_USER = {
|
||||
username: process.env.E2E_TEST_USERNAME ?? 'e2e_test_admin',
|
||||
password: process.env.E2E_TEST_PASSWORD ?? 'E2eTestPass123!',
|
||||
email: process.env.E2E_TEST_EMAIL ?? 'e2e-test@example.com',
|
||||
} as const
|
||||
|
||||
/** localStorage keys used by the auth store (see stores/auth.ts). */
|
||||
const ACCESS_TOKEN_KEY = 'agentkit.access_token'
|
||||
const REFRESH_TOKEN_KEY = 'agentkit.refresh_token'
|
||||
const USER_KEY = 'agentkit.user'
|
||||
|
||||
/** Max wait for a real LLM response (seconds → ms). */
|
||||
export const LLM_RESPONSE_TIMEOUT_MS = 60_000
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Types
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
interface IAuthUser {
|
||||
id: string
|
||||
username: string
|
||||
email: string
|
||||
role: string
|
||||
is_active: boolean
|
||||
is_terminal_authorized: boolean
|
||||
is_server_terminal_authorized: boolean
|
||||
}
|
||||
|
||||
interface ITokenPair {
|
||||
access_token: string
|
||||
refresh_token: string
|
||||
token_type: string
|
||||
expires_in: number
|
||||
user: IAuthUser
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Server health
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Poll the backend health endpoint until it responds 200 or times out.
|
||||
* Useful as a sanity check inside tests.
|
||||
*/
|
||||
export async function waitForServer(
|
||||
url: string = BACKEND_HEALTH_URL,
|
||||
timeoutMs = 30_000,
|
||||
): Promise<void> {
|
||||
const deadline = Date.now() + timeoutMs
|
||||
while (Date.now() < deadline) {
|
||||
try {
|
||||
const resp = await fetch(url)
|
||||
if (resp.ok) return
|
||||
} catch {
|
||||
// not ready
|
||||
}
|
||||
await new Promise((r) => setTimeout(r, 1_000))
|
||||
}
|
||||
throw new Error(`Server at ${url} did not become healthy within ${timeoutMs}ms`)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Login helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Authenticate via the REST API and return the token pair.
|
||||
* Retries on 429 (rate limit) with exponential backoff.
|
||||
* Caches the token pair module-level so subsequent calls reuse it
|
||||
* (avoids triggering the server's rate limiter).
|
||||
* Throws on other non-200 responses.
|
||||
*/
|
||||
let _cachedTokenPair: ITokenPair | null = null
|
||||
|
||||
export async function loginViaApi(): Promise<ITokenPair> {
|
||||
// Return cached token if available (avoids rate limiting across tests).
|
||||
if (_cachedTokenPair) {
|
||||
return _cachedTokenPair
|
||||
}
|
||||
|
||||
const maxRetries = 5
|
||||
for (let attempt = 0; attempt < maxRetries; attempt++) {
|
||||
const resp = await fetch(`${API_BASE}/auth/login`, {
|
||||
method: 'POST',
|
||||
headers: { 'Content-Type': 'application/json' },
|
||||
body: JSON.stringify({
|
||||
username: TEST_USER.username,
|
||||
password: TEST_USER.password,
|
||||
}),
|
||||
})
|
||||
|
||||
if (resp.ok) {
|
||||
_cachedTokenPair = (await resp.json()) as ITokenPair
|
||||
return _cachedTokenPair
|
||||
}
|
||||
|
||||
if (resp.status === 429 && attempt < maxRetries - 1) {
|
||||
// Rate limited — wait and retry (5s, 10s, 20s, 40s)
|
||||
const delayMs = 5000 * Math.pow(2, attempt)
|
||||
await new Promise((r) => setTimeout(r, delayMs))
|
||||
continue
|
||||
}
|
||||
|
||||
const detail = await resp.text().catch(() => '<no body>')
|
||||
throw new Error(`Login failed (${resp.status}): ${detail}`)
|
||||
}
|
||||
throw new Error('Login failed: max retries exceeded')
|
||||
}
|
||||
|
||||
/**
|
||||
* Log in via the API and hydrate localStorage so the Pinia auth store
|
||||
* picks up the tokens on the next page navigation.
|
||||
*
|
||||
* The auth store (stores/auth.ts) reads `agentkit.access_token`,
|
||||
* `agentkit.refresh_token`, and `agentkit.user` from localStorage on
|
||||
* construction, so populating these before navigating is sufficient.
|
||||
*/
|
||||
export async function loginAndHydrate(page: Page): Promise<ITokenPair> {
|
||||
const tokens = await loginViaApi()
|
||||
|
||||
await page.goto('/login')
|
||||
|
||||
await page.evaluate(
|
||||
({ access, refresh, user }) => {
|
||||
localStorage.setItem('agentkit.access_token', access)
|
||||
localStorage.setItem('agentkit.refresh_token', refresh)
|
||||
localStorage.setItem('agentkit.user', JSON.stringify(user))
|
||||
},
|
||||
{
|
||||
access: tokens.access_token,
|
||||
refresh: tokens.refresh_token,
|
||||
user: tokens.user,
|
||||
},
|
||||
)
|
||||
|
||||
return tokens
|
||||
}
|
||||
|
||||
/**
|
||||
* Clear auth state from localStorage — useful for testing the
|
||||
* unauthenticated-redirect behaviour.
|
||||
*/
|
||||
export async function clearAuth(page: Page): Promise<void> {
|
||||
await page.evaluate(
|
||||
({ access, refresh, user }) => {
|
||||
localStorage.removeItem(access)
|
||||
localStorage.removeItem(refresh)
|
||||
localStorage.removeItem(user)
|
||||
},
|
||||
{
|
||||
access: ACCESS_TOKEN_KEY,
|
||||
refresh: REFRESH_TOKEN_KEY,
|
||||
user: USER_KEY,
|
||||
},
|
||||
)
|
||||
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Chat helpers
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/**
|
||||
* Wait for a real LLM response in the chat view.
|
||||
*
|
||||
* After sending a message, the assistant's response is rendered inside
|
||||
* `.message-shell--assistant .assistant-text__markdown`. While the LLM is
|
||||
* still streaming, the element may be empty or show a spinner. This helper
|
||||
* waits until the assistant message contains non-whitespace text.
|
||||
*
|
||||
* @param page Playwright page
|
||||
* @param expect The `expect` function from @playwright/test
|
||||
* @param timeoutMs Max wait time (default 60s for real LLM calls)
|
||||
*/
|
||||
export async function waitForLlmResponse(
|
||||
page: Page,
|
||||
expect: typeof ExpectType,
|
||||
timeoutMs = LLM_RESPONSE_TIMEOUT_MS,
|
||||
): Promise<void> {
|
||||
// The assistant message content is rendered as sanitized HTML inside
|
||||
// .assistant-text__markdown. Wait for it to have non-empty text content.
|
||||
const assistantContent = page.locator(
|
||||
'.message-shell--assistant .assistant-text__markdown',
|
||||
)
|
||||
|
||||
await expect
|
||||
.poll(
|
||||
async () => {
|
||||
// Check count first to avoid auto-wait on a non-existent element.
|
||||
const count = await assistantContent.count()
|
||||
if (count === 0) return 0
|
||||
const text = await assistantContent.last().textContent()
|
||||
return (text ?? '').trim().length
|
||||
},
|
||||
{ timeout: timeoutMs, intervals: [1_000, 2_000, 5_000] },
|
||||
)
|
||||
.toBeGreaterThan(0)
|
||||
}
|
||||
|
||||
/**
|
||||
* Send a chat message by typing into the textarea and pressing Enter.
|
||||
* Falls back to clicking the send button if Enter doesn't trigger send.
|
||||
*/
|
||||
export async function sendChatMessage(page: Page, message: string): Promise<void> {
|
||||
const textarea = page.getByPlaceholder('输入消息,按 Enter 发送...')
|
||||
await textarea.fill(message)
|
||||
await textarea.press('Enter')
|
||||
}
|
||||
|
|
@ -0,0 +1,79 @@
|
|||
import { test, expect } from '@playwright/test'
|
||||
import { TEST_USER, clearAuth } from './helpers'
|
||||
|
||||
test.describe('Login flow', () => {
|
||||
test.beforeEach(async ({ page }) => {
|
||||
// Ensure no stale tokens from a previous test
|
||||
await page.goto('/login')
|
||||
await clearAuth(page)
|
||||
})
|
||||
|
||||
test('should login successfully with valid credentials', async ({ page }) => {
|
||||
await page.goto('/login')
|
||||
|
||||
// Fill in the form
|
||||
await page.getByPlaceholder('请输入用户名').fill(TEST_USER.username)
|
||||
await page.getByPlaceholder('请输入密码').fill(TEST_USER.password)
|
||||
|
||||
// Submit
|
||||
await page.getByRole('button', { name: /登\s*录/ }).click()
|
||||
|
||||
// Should redirect to /agent (which redirects to /agent/chat)
|
||||
await expect(page).toHaveURL(/\/agent/)
|
||||
|
||||
// The login logo should no longer be visible
|
||||
await expect(page.locator('.login-logo')).not.toBeVisible()
|
||||
})
|
||||
|
||||
test('should show error for wrong password', async ({ page }) => {
|
||||
await page.goto('/login')
|
||||
|
||||
await page.getByPlaceholder('请输入用户名').fill(TEST_USER.username)
|
||||
await page.getByPlaceholder('请输入密码').fill('definitely-wrong-password-12345')
|
||||
|
||||
await page.getByRole('button', { name: /登\s*录/ }).click()
|
||||
|
||||
// The LoginView shows an a-alert with type="error" containing the
|
||||
// server's error message ("Invalid username or password").
|
||||
const errorAlert = page.locator('.ant-alert-error')
|
||||
await expect(errorAlert).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
// Should still be on the login page
|
||||
await expect(page).toHaveURL(/\/login/)
|
||||
|
||||
// The error message should mention invalid credentials
|
||||
const alertText = await errorAlert.textContent()
|
||||
expect(alertText?.toLowerCase()).toMatch(/invalid|无效|错误|incorrect|失败/)
|
||||
})
|
||||
|
||||
test('should redirect unauthenticated users to login', async ({ page }) => {
|
||||
// Clear any existing auth state, then try to visit a protected route
|
||||
await clearAuth(page)
|
||||
|
||||
await page.goto('/agent/chat')
|
||||
|
||||
// The router guard should redirect to /login?redirect=/agent/chat
|
||||
await expect(page).toHaveURL(/\/login/)
|
||||
await expect(page).toHaveURL(/redirect=/)
|
||||
|
||||
// The login form should be visible
|
||||
await expect(page.getByPlaceholder('请输入用户名')).toBeVisible()
|
||||
await expect(page.getByPlaceholder('请输入密码')).toBeVisible()
|
||||
})
|
||||
|
||||
test('should redirect to original page after login', async ({ page }) => {
|
||||
await clearAuth(page)
|
||||
|
||||
// Visit a protected route — should redirect to login with redirect param
|
||||
await page.goto('/agent/chat')
|
||||
await expect(page).toHaveURL(/\/login\?redirect=/)
|
||||
|
||||
// Now log in
|
||||
await page.getByPlaceholder('请输入用户名').fill(TEST_USER.username)
|
||||
await page.getByPlaceholder('请输入密码').fill(TEST_USER.password)
|
||||
await page.getByRole('button', { name: /登\s*录/ }).click()
|
||||
|
||||
// Should be redirected back to the originally requested page
|
||||
await expect(page).toHaveURL(/\/agent\/chat/)
|
||||
})
|
||||
})
|
||||
|
|
@ -0,0 +1,92 @@
|
|||
#!/usr/bin/env python3
|
||||
"""Create or update the E2E test admin user in the auth SQLite DB.
|
||||
|
||||
This script is invoked by Playwright's ``globalSetup`` (via ``global-setup.ts``)
|
||||
before any test runs. It ensures the auth DB schema exists and that a test
|
||||
admin user with known credentials is present.
|
||||
|
||||
The user credentials default to:
|
||||
username: e2e_test_admin
|
||||
password: E2eTestPass123!
|
||||
email: e2e-test@agentkit.local
|
||||
role: admin
|
||||
|
||||
Override via environment variables ``E2E_TEST_USERNAME``, ``E2E_TEST_PASSWORD``,
|
||||
``E2E_TEST_EMAIL`` if needed.
|
||||
|
||||
Exit codes:
|
||||
0 — user created or updated successfully
|
||||
1 — unexpected error
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import os
|
||||
import sys
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
|
||||
# Resolve project root so we can import agentkit regardless of CWD.
|
||||
# This file lives at src/agentkit/server/frontend/e2e/setup-test-user.py
|
||||
_PROJECT_ROOT = Path(__file__).resolve().parents[3]
|
||||
_SRC_ROOT = _PROJECT_ROOT / "src"
|
||||
if str(_SRC_ROOT) not in sys.path:
|
||||
sys.path.insert(0, str(_SRC_ROOT))
|
||||
|
||||
import aiosqlite # noqa: E402
|
||||
|
||||
from agentkit.server.auth.models import DEFAULT_AUTH_DB_PATH, init_auth_db # noqa: E402
|
||||
from agentkit.server.auth.password import hash_password # noqa: E402
|
||||
|
||||
TEST_USERNAME = os.environ.get("E2E_TEST_USERNAME", "e2e_test_admin")
|
||||
TEST_PASSWORD = os.environ.get("E2E_TEST_PASSWORD", "E2eTestPass123!")
|
||||
TEST_EMAIL = os.environ.get("E2E_TEST_EMAIL", "e2e-test@example.com")
|
||||
|
||||
|
||||
async def ensure_test_user() -> None:
|
||||
db_path = DEFAULT_AUTH_DB_PATH
|
||||
# Create schema (idempotent) — mirrors what /auth/login does on first hit.
|
||||
await init_auth_db(db_path)
|
||||
|
||||
password_hash = hash_password(TEST_PASSWORD)
|
||||
now_iso = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
async with aiosqlite.connect(str(db_path)) as db:
|
||||
cursor = await db.execute("SELECT id FROM users WHERE username = ?", (TEST_USERNAME,))
|
||||
existing = await cursor.fetchone()
|
||||
|
||||
if existing:
|
||||
# Update password + ensure admin role + terminal authorization
|
||||
await db.execute(
|
||||
"UPDATE users SET password_hash = ?, role = 'admin', is_active = 1, "
|
||||
"is_terminal_authorized = 1, is_server_terminal_authorized = 1, "
|
||||
"email = ?, updated_at = ? WHERE username = ?",
|
||||
(password_hash, TEST_EMAIL, now_iso, TEST_USERNAME),
|
||||
)
|
||||
await db.commit()
|
||||
print(f"[setup-test-user] Updated existing test user '{TEST_USERNAME}'")
|
||||
else:
|
||||
user_id = str(uuid.uuid4())
|
||||
await db.execute(
|
||||
"INSERT INTO users (id, username, email, password_hash, role, "
|
||||
"is_active, is_terminal_authorized, is_server_terminal_authorized, "
|
||||
"created_at, updated_at) VALUES (?, ?, ?, ?, 'admin', 1, 1, 1, ?, ?)",
|
||||
(user_id, TEST_USERNAME, TEST_EMAIL, password_hash, now_iso, now_iso),
|
||||
)
|
||||
await db.commit()
|
||||
print(f"[setup-test-user] Created test admin user '{TEST_USERNAME}'")
|
||||
|
||||
|
||||
def main() -> int:
|
||||
try:
|
||||
asyncio.run(ensure_test_user())
|
||||
return 0
|
||||
except Exception as exc: # noqa: BLE001
|
||||
print(f"[setup-test-user] ERROR: {exc}", file=sys.stderr)
|
||||
return 1
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
raise SystemExit(main())
|
||||
|
|
@ -0,0 +1,76 @@
|
|||
import { test, expect } from '@playwright/test'
|
||||
import { loginAndHydrate } from './helpers'
|
||||
|
||||
test.describe('Terminal panel', () => {
|
||||
test.beforeEach(async ({ page }) => {
|
||||
await loginAndHydrate(page)
|
||||
// The terminal view lives at /legacy/terminal (the /terminal route
|
||||
// redirects there — see router/index.ts).
|
||||
await page.goto('/legacy/terminal')
|
||||
})
|
||||
|
||||
test('should display the terminal panel with mode tabs', async ({ page }) => {
|
||||
// The TerminalPanel component renders .terminal-panel
|
||||
const terminalPanel = page.locator('.terminal-panel')
|
||||
await expect(terminalPanel).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
// The "本地终端" (local terminal) tab should always be visible
|
||||
await expect(
|
||||
terminalPanel.getByRole('button', { name: /本地终端/ }),
|
||||
).toBeVisible()
|
||||
|
||||
// The connection status indicator should be present
|
||||
await expect(terminalPanel.locator('.terminal-panel__indicator')).toBeVisible()
|
||||
})
|
||||
|
||||
test('should show server terminal tab for admin users', async ({ page }) => {
|
||||
// The test user is an admin, so the "服务端终端" tab should be visible
|
||||
// (it's gated behind authStore.canUseServerTerminal()).
|
||||
const terminalPanel = page.locator('.terminal-panel')
|
||||
await expect(terminalPanel).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
await expect(
|
||||
terminalPanel.getByRole('button', { name: /服务端终端/ }),
|
||||
).toBeVisible()
|
||||
})
|
||||
|
||||
test('should open the whitelist manager drawer', async ({ page }) => {
|
||||
// Wait for the terminal view to mount
|
||||
await expect(page.locator('.terminal-panel')).toBeVisible({ timeout: 10_000 })
|
||||
|
||||
// The whitelist button is positioned in the top-right corner of the
|
||||
// terminal view (SafetyOutlined icon inside .terminal-view__whitelist-btn).
|
||||
const whitelistBtn = page.locator('.terminal-view__whitelist-btn')
|
||||
await expect(whitelistBtn).toBeVisible()
|
||||
await whitelistBtn.click()
|
||||
|
||||
// The drawer should open and contain the WhitelistManager component.
|
||||
// The drawer title is "终端白名单管理".
|
||||
const drawer = page.locator('.ant-drawer-content')
|
||||
await expect(drawer).toBeVisible({ timeout: 5_000 })
|
||||
|
||||
// The WhitelistManager renders an a-tabs with "我的白名单" tab
|
||||
await expect(page.getByRole('tab', { name: '我的白名单' })).toBeVisible()
|
||||
|
||||
// The "添加" button and the input for new patterns should be visible.
|
||||
// Use regex to match possible Ant Design Vue auto-inserted space.
|
||||
await expect(
|
||||
drawer.getByPlaceholder('输入命令模式,如: git, npm, ls'),
|
||||
).toBeVisible()
|
||||
await expect(drawer.getByRole('button', { name: /添\s*加/ })).toBeVisible()
|
||||
})
|
||||
|
||||
test('should display admin-only tabs in whitelist manager', async ({ page }) => {
|
||||
// Open the whitelist drawer
|
||||
await expect(page.locator('.terminal-panel')).toBeVisible({ timeout: 10_000 })
|
||||
await page.locator('.terminal-view__whitelist-btn').click()
|
||||
|
||||
const drawer = page.locator('.ant-drawer-content')
|
||||
await expect(drawer).toBeVisible({ timeout: 5_000 })
|
||||
|
||||
// Admin users should see the "全局白名单", "黑名单", and "审计日志" tabs
|
||||
await expect(page.getByRole('tab', { name: '全局白名单' })).toBeVisible()
|
||||
await expect(page.getByRole('tab', { name: '黑名单' })).toBeVisible()
|
||||
await expect(page.getByRole('tab', { name: '审计日志' })).toBeVisible()
|
||||
})
|
||||
})
|
||||
|
|
@ -23,6 +23,7 @@
|
|||
"vue-router": "^4.4.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@playwright/test": "^1.59.0",
|
||||
"@tauri-apps/cli": "^2.11.2",
|
||||
"@types/dompurify": "^3.0.5",
|
||||
"@types/markdown-it": "^14.1.2",
|
||||
|
|
@ -579,6 +580,22 @@
|
|||
"@jridgewell/sourcemap-codec": "^1.4.14"
|
||||
}
|
||||
},
|
||||
"node_modules/@playwright/test": {
|
||||
"version": "1.61.0",
|
||||
"resolved": "https://registry.npmmirror.com/@playwright/test/-/test-1.61.0.tgz",
|
||||
"integrity": "sha512-cKA5B6lpFEMyMGjxF54QihfYpB4FkEGH+qZhtArDEG+wezQAJY8Pq6C7T1SjWz+FFzt3TbyoXBQYk/0292TdJA==",
|
||||
"dev": true,
|
||||
"license": "Apache-2.0",
|
||||
"dependencies": {
|
||||
"playwright": "1.61.0"
|
||||
},
|
||||
"bin": {
|
||||
"playwright": "cli.js"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=18"
|
||||
}
|
||||
},
|
||||
"node_modules/@rollup/rollup-android-arm-eabi": {
|
||||
"version": "4.61.1",
|
||||
"resolved": "https://registry.npmmirror.com/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.61.1.tgz",
|
||||
|
|
@ -2220,6 +2237,53 @@
|
|||
"pathe": "^2.0.3"
|
||||
}
|
||||
},
|
||||
"node_modules/playwright": {
|
||||
"version": "1.61.0",
|
||||
"resolved": "https://registry.npmmirror.com/playwright/-/playwright-1.61.0.tgz",
|
||||
"integrity": "sha512-Z+7BeeqQPRRzklHsVFP4KTGIyMxKUmfeRA4WisM6G3/XW6nwGeX6fX9qYaDa+CiUqpOkb2f6X3nar05R3kSuJQ==",
|
||||
"dev": true,
|
||||
"license": "Apache-2.0",
|
||||
"dependencies": {
|
||||
"playwright-core": "1.61.0"
|
||||
},
|
||||
"bin": {
|
||||
"playwright": "cli.js"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=18"
|
||||
},
|
||||
"optionalDependencies": {
|
||||
"fsevents": "2.3.2"
|
||||
}
|
||||
},
|
||||
"node_modules/playwright-core": {
|
||||
"version": "1.61.0",
|
||||
"resolved": "https://registry.npmmirror.com/playwright-core/-/playwright-core-1.61.0.tgz",
|
||||
"integrity": "sha512-caX7TrY3Ml6egyDX0WUcTHDxodl/b51y5wJOdCEA36QviK/s2g081hvmGs8eaE3DWb6NYZQ6BjO/QkNRPenoPA==",
|
||||
"dev": true,
|
||||
"license": "Apache-2.0",
|
||||
"bin": {
|
||||
"playwright-core": "cli.js"
|
||||
},
|
||||
"engines": {
|
||||
"node": ">=18"
|
||||
}
|
||||
},
|
||||
"node_modules/playwright/node_modules/fsevents": {
|
||||
"version": "2.3.2",
|
||||
"resolved": "https://registry.npmmirror.com/fsevents/-/fsevents-2.3.2.tgz",
|
||||
"integrity": "sha512-xiqMQR4xAeHTuB9uWm+fFRcIOgKBMiOBP+eXiyT7jsgVCq1bkVygt00oASowB7EdtpOHaaPgKt812P9ab+DDKA==",
|
||||
"dev": true,
|
||||
"hasInstallScript": true,
|
||||
"license": "MIT",
|
||||
"optional": true,
|
||||
"os": [
|
||||
"darwin"
|
||||
],
|
||||
"engines": {
|
||||
"node": "^8.16.0 || ^10.6.0 || >=11.0.0"
|
||||
}
|
||||
},
|
||||
"node_modules/postcss": {
|
||||
"version": "8.5.15",
|
||||
"resolved": "https://registry.npmmirror.com/postcss/-/postcss-8.5.15.tgz",
|
||||
|
|
|
|||
|
|
@ -9,7 +9,9 @@
|
|||
"build": "vue-tsc --noEmit && vite build",
|
||||
"build:frontend": "vue-tsc --noEmit && vite build",
|
||||
"preview": "vite preview",
|
||||
"tauri": "tauri"
|
||||
"tauri": "tauri",
|
||||
"test:e2e": "playwright test",
|
||||
"test:e2e:ui": "playwright test --ui"
|
||||
},
|
||||
"dependencies": {
|
||||
"@ant-design/icons-vue": "^7.0.0",
|
||||
|
|
@ -27,6 +29,7 @@
|
|||
"vue-router": "^4.4.0"
|
||||
},
|
||||
"devDependencies": {
|
||||
"@playwright/test": "^1.59.0",
|
||||
"@tauri-apps/cli": "^2.11.2",
|
||||
"@types/dompurify": "^3.0.5",
|
||||
"@types/markdown-it": "^14.1.2",
|
||||
|
|
|
|||
|
|
@ -0,0 +1,80 @@
|
|||
import { defineConfig, devices } from '@playwright/test'
|
||||
|
||||
/**
|
||||
* Playwright E2E configuration for Fischer AgentKit frontend.
|
||||
*
|
||||
* Architecture:
|
||||
* - Backend (uvicorn direct, avoids agentkit serve interactive prompts) runs on
|
||||
* port 8000 to match the Vite dev-server proxy target in vite.config.ts.
|
||||
* - Frontend (Vite dev server) runs on port 5173 (strictPort in vite.config.ts).
|
||||
* - Tests target the frontend at http://localhost:5173; API/WS calls are
|
||||
* transparently proxied to the backend.
|
||||
*
|
||||
* The `globalSetup` script creates a test admin user in the auth DB before
|
||||
* any test runs, so login-based tests have valid credentials available.
|
||||
*/
|
||||
|
||||
// Project root relative to this config file
|
||||
// (src/agentkit/server/frontend/ → 4 levels up to project root)
|
||||
const PROJECT_ROOT = '../../../..'
|
||||
|
||||
export default defineConfig({
|
||||
testDir: './e2e',
|
||||
fullyParallel: false,
|
||||
forbidOnly: !!process.env.CI,
|
||||
retries: process.env.CI ? 1 : 0,
|
||||
workers: 1,
|
||||
reporter: [['list'], ['html', { open: 'never' }]],
|
||||
timeout: 90_000,
|
||||
expect: { timeout: 15_000 },
|
||||
globalSetup: './e2e/global-setup.ts',
|
||||
|
||||
use: {
|
||||
baseURL: 'http://localhost:5173',
|
||||
trace: 'on-first-retry',
|
||||
screenshot: 'only-on-failure',
|
||||
video: 'retain-on-failure',
|
||||
actionTimeout: 15_000,
|
||||
navigationTimeout: 30_000,
|
||||
},
|
||||
|
||||
projects: [
|
||||
{
|
||||
name: 'chromium',
|
||||
use: {
|
||||
...devices['Desktop Chrome'],
|
||||
// Use system Chrome to avoid slow browser downloads.
|
||||
channel: 'chrome',
|
||||
},
|
||||
},
|
||||
],
|
||||
|
||||
webServer: [
|
||||
{
|
||||
// Use uvicorn directly — `agentkit serve` has Confirm.ask() prompts
|
||||
// that fail in non-tty subprocess environments.
|
||||
// Env vars set inline to avoid Playwright's env property replacing
|
||||
// the entire process.env (which would lose PATH, API keys, etc.).
|
||||
command:
|
||||
'AGENTKIT_GUI_MODE=1 NO_PROXY=127.0.0.1,localhost no_proxy=127.0.0.1,localhost ' +
|
||||
'python3 -c "import uvicorn; uvicorn.run(' +
|
||||
"'agentkit.server.app:create_app', " +
|
||||
"host='127.0.0.1', port=8000, factory=True)\"",
|
||||
url: 'http://127.0.0.1:8000/api/v1/health',
|
||||
cwd: PROJECT_ROOT,
|
||||
reuseExistingServer: !process.env.CI,
|
||||
timeout: 120_000,
|
||||
stdout: 'pipe',
|
||||
stderr: 'pipe',
|
||||
},
|
||||
{
|
||||
command: 'npm run dev',
|
||||
url: 'http://localhost:5173',
|
||||
cwd: '.',
|
||||
reuseExistingServer: !process.env.CI,
|
||||
timeout: 60_000,
|
||||
stdout: 'pipe',
|
||||
stderr: 'pipe',
|
||||
},
|
||||
],
|
||||
})
|
||||
|
|
@ -7,12 +7,7 @@ import type {
|
|||
IConversation,
|
||||
IChatRequest,
|
||||
WsClientMessage,
|
||||
IExpertTeamState,
|
||||
IBoardStartedData,
|
||||
IExpertSpeechData,
|
||||
IRoundSummaryData,
|
||||
IUserInterventionData,
|
||||
IBoardConcludedData,
|
||||
WsServerMessage,
|
||||
} from '@/api/types'
|
||||
|
||||
function generateId(): string {
|
||||
|
|
@ -276,7 +271,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
|
||||
socket.onmessage = (event: MessageEvent) => {
|
||||
try {
|
||||
const data = JSON.parse(event.data as string) as Record<string, unknown>
|
||||
const data = JSON.parse(event.data as string) as WsServerMessage
|
||||
console.log('[Chat WS] Received:', data.type, data)
|
||||
handleWsMessage(data)
|
||||
} catch (error) {
|
||||
|
|
@ -403,17 +398,14 @@ export const useChatStore = defineStore('chat', () => {
|
|||
return _teamStore
|
||||
}
|
||||
|
||||
// TODO: refactor to WsServerMessage union to eliminate `any`.
|
||||
// This function predates the current VI redesign and touches many legacy branches.
|
||||
function handleWsMessage(data: Record<string, any>): void {
|
||||
// Backend sends nested data: {type, data: {...}}
|
||||
// Flatten for easier access
|
||||
const payload = data.data ?? data
|
||||
|
||||
function handleWsMessage(data: WsServerMessage): void {
|
||||
// Discriminated union narrowing: each `case` branch narrows `data` to a
|
||||
// specific variant of WsServerMessage, so typed fields can be accessed
|
||||
// directly from `data` (or `data.data` for variants with a nested payload).
|
||||
switch (data.type) {
|
||||
case 'connected': {
|
||||
// Backend confirms conversation — update local ID if backend assigned a different one
|
||||
const serverConvId = data.conversation_id || payload.conversation_id
|
||||
const serverConvId = data.conversation_id
|
||||
if (serverConvId && serverConvId !== currentConversationId.value) {
|
||||
// Rename the local conversation to match the server ID
|
||||
const localId = currentConversationId.value
|
||||
|
|
@ -453,11 +445,12 @@ export const useChatStore = defineStore('chat', () => {
|
|||
const lastAssistantMsg = [...conv.messages]
|
||||
.reverse()
|
||||
.find((m) => m.role === 'assistant')
|
||||
const stepInfo = payload
|
||||
const stepInfo = data.data
|
||||
const innerData = stepInfo.data as Record<string, unknown>
|
||||
const desc = stepInfo.event_type === 'final_answer'
|
||||
? '生成最终回答'
|
||||
: stepInfo.event_type === 'tool_call'
|
||||
? `调用工具: ${stepInfo.data?.tool_name || stepInfo.data?.name || '#'}`
|
||||
? `调用工具: ${(innerData.tool_name || innerData.name || '#') as string}`
|
||||
: stepInfo.event_type === 'thinking'
|
||||
? '思考中...'
|
||||
: `步骤 ${stepInfo.step || ''}: ${stepInfo.event_type || ''}`
|
||||
|
|
@ -469,11 +462,11 @@ export const useChatStore = defineStore('chat', () => {
|
|||
|
||||
if (stepInfo.event_type === 'tool_call') {
|
||||
const tcId = `tc-${stepInfo.step || toolCalls.length}`
|
||||
const toolName = stepInfo.data?.tool_name || stepInfo.data?.name || 'unknown'
|
||||
const params = stepInfo.data?.arguments
|
||||
? (typeof stepInfo.data.arguments === 'string'
|
||||
? stepInfo.data.arguments
|
||||
: JSON.stringify(stepInfo.data.arguments, null, 2))
|
||||
const toolName = (innerData.tool_name || innerData.name || 'unknown') as string
|
||||
const params = innerData.arguments
|
||||
? (typeof innerData.arguments === 'string'
|
||||
? innerData.arguments
|
||||
: JSON.stringify(innerData.arguments, null, 2))
|
||||
: undefined
|
||||
toolCalls.push({
|
||||
id: tcId,
|
||||
|
|
@ -486,20 +479,20 @@ export const useChatStore = defineStore('chat', () => {
|
|||
// Find the last running tool call and update it
|
||||
const lastRunning = [...toolCalls].reverse().find(tc => tc.status === 'running')
|
||||
if (lastRunning) {
|
||||
const resultStr = stepInfo.data?.output
|
||||
? (typeof stepInfo.data.output === 'string'
|
||||
? stepInfo.data.output
|
||||
: JSON.stringify(stepInfo.data.output, null, 2))
|
||||
const resultStr = innerData.output
|
||||
? (typeof innerData.output === 'string'
|
||||
? innerData.output
|
||||
: JSON.stringify(innerData.output, null, 2))
|
||||
: ''
|
||||
lastRunning.status = stepInfo.data?.error ? 'error' : 'completed'
|
||||
lastRunning.status = innerData.error ? 'error' : 'completed'
|
||||
lastRunning.result = resultStr.length > 2000 ? resultStr.substring(0, 2000) + '...' : resultStr
|
||||
lastRunning.error = stepInfo.data?.error
|
||||
lastRunning.duration = stepInfo.data?.duration
|
||||
lastRunning.error = innerData.error as string | undefined
|
||||
lastRunning.duration = innerData.duration as number | undefined
|
||||
updateMessage(conversationId, lastAssistantMsg.id, { tool_calls: [...toolCalls] })
|
||||
}
|
||||
} else if (stepInfo.event_type === 'thinking') {
|
||||
// Accumulate thinking content for ThinkingBlock rendering
|
||||
const thinkingChunk = stepInfo.data?.content || stepInfo.data?.thought || ''
|
||||
const thinkingChunk = (innerData.content || innerData.thought || '') as string
|
||||
if (thinkingChunk && lastAssistantMsg) {
|
||||
updateMessage(conversationId, lastAssistantMsg.id, {
|
||||
thinking: (lastAssistantMsg.thinking || '') + thinkingChunk,
|
||||
|
|
@ -510,7 +503,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
|
||||
// Accumulate final_answer content for streaming display
|
||||
if (stepInfo.event_type === 'final_answer' && lastAssistantMsg) {
|
||||
const chunk = stepInfo.data?.output || ''
|
||||
const chunk = (innerData.output || '') as string
|
||||
if (chunk) {
|
||||
updateMessage(conversationId, lastAssistantMsg.id, {
|
||||
content: (lastAssistantMsg.content || '') + chunk,
|
||||
|
|
@ -529,7 +522,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
.reverse()
|
||||
.find((m) => m.role === 'assistant')
|
||||
// Backend sends: {type: "result", data: {message: "..."}} or {data: {status, content}}
|
||||
const content = payload.message || payload.content || ''
|
||||
const content = data.data.message || data.data.content || ''
|
||||
if (lastAssistantMsg) {
|
||||
// Only overwrite if we didn't already stream the content
|
||||
const finalContent = content || lastAssistantMsg.content || ''
|
||||
|
|
@ -562,7 +555,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
updateMessage(conversationId, lastAssistantMsg.id, {
|
||||
message_type: 'error',
|
||||
status: 'error',
|
||||
error_detail: payload.message || '未知错误',
|
||||
error_detail: data.data.message || '未知错误',
|
||||
content: lastAssistantMsg.content || '',
|
||||
})
|
||||
} else {
|
||||
|
|
@ -573,7 +566,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
timestamp: new Date().toISOString(),
|
||||
status: 'error',
|
||||
message_type: 'error',
|
||||
error_detail: payload.message || '未知错误',
|
||||
error_detail: data.data.message || '未知错误',
|
||||
}
|
||||
appendMessage(conversationId, errorMsg)
|
||||
}
|
||||
|
|
@ -585,9 +578,9 @@ export const useChatStore = defineStore('chat', () => {
|
|||
case 'team_formed': {
|
||||
const teamStore = _getTeamStore()
|
||||
if (teamStore) {
|
||||
teamStore.setTeamState(payload as IExpertTeamState)
|
||||
teamStore.setTeamState(data.data)
|
||||
}
|
||||
streamingSteps.value.push(`专家团队已组建: ${(payload as IExpertTeamState).experts.map((e) => e.name).join(', ')}`)
|
||||
streamingSteps.value.push(`专家团队已组建: ${data.data.experts.map((e) => e.name).join(', ')}`)
|
||||
break
|
||||
}
|
||||
|
||||
|
|
@ -599,26 +592,26 @@ export const useChatStore = defineStore('chat', () => {
|
|||
// Dedup: append to existing expert message if one exists for this expert
|
||||
const existingExpertMsg = [...conv.messages]
|
||||
.reverse()
|
||||
.find((m) => m.expert_id === payload.expert_id && m.status === 'pending')
|
||||
.find((m) => m.expert_id === data.data.expert_id && m.status === 'pending')
|
||||
if (existingExpertMsg) {
|
||||
updateMessage(conversationId, existingExpertMsg.id, {
|
||||
content: (existingExpertMsg.content || '') + (payload.content || ''),
|
||||
content: (existingExpertMsg.content || '') + (data.data.content || ''),
|
||||
})
|
||||
} else {
|
||||
const expertMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: payload.content || '',
|
||||
content: data.data.content || '',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'pending',
|
||||
expert_id: payload.expert_id,
|
||||
expert_name: payload.expert_name,
|
||||
expert_color: payload.expert_color,
|
||||
expert_id: data.data.expert_id,
|
||||
expert_name: data.data.expert_name,
|
||||
expert_color: data.data.expert_color,
|
||||
message_type: 'chat',
|
||||
}
|
||||
appendMessage(conversationId, expertMsg)
|
||||
}
|
||||
streamingSteps.value.push(`${payload.expert_name}: 步骤 ${payload.step}`)
|
||||
streamingSteps.value.push(`${data.data.expert_name}: 步骤 ${data.data.step}`)
|
||||
break
|
||||
}
|
||||
|
||||
|
|
@ -630,12 +623,12 @@ export const useChatStore = defineStore('chat', () => {
|
|||
const expertMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: payload.content || '',
|
||||
content: data.data.content || '',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
expert_id: payload.expert_id,
|
||||
expert_name: payload.expert_name,
|
||||
expert_color: payload.expert_color,
|
||||
expert_id: data.data.expert_id,
|
||||
expert_name: data.data.expert_name,
|
||||
expert_color: data.data.expert_color,
|
||||
message_type: 'chat',
|
||||
}
|
||||
appendMessage(conversationId, expertMsg)
|
||||
|
|
@ -645,7 +638,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
case 'plan_update': {
|
||||
const teamStore = _getTeamStore()
|
||||
if (teamStore) {
|
||||
teamStore.updatePhases(payload.plan_phases)
|
||||
teamStore.updatePhases(data.data.plan_phases)
|
||||
}
|
||||
const conversationId = currentConversationId.value
|
||||
if (!conversationId) break
|
||||
|
|
@ -656,7 +649,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
.find((m) => m.message_type === 'plan_update')
|
||||
if (existingPlanMsg) {
|
||||
updateMessage(conversationId, existingPlanMsg.id, {
|
||||
plan_phases: payload.plan_phases,
|
||||
plan_phases: data.data.plan_phases,
|
||||
})
|
||||
} else {
|
||||
const planMsg: IChatMessage = {
|
||||
|
|
@ -666,7 +659,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
message_type: 'plan_update',
|
||||
plan_phases: payload.plan_phases,
|
||||
plan_phases: data.data.plan_phases,
|
||||
}
|
||||
appendMessage(conversationId, planMsg)
|
||||
}
|
||||
|
|
@ -681,7 +674,7 @@ export const useChatStore = defineStore('chat', () => {
|
|||
const synthesisMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: payload.content || '',
|
||||
content: data.data.content || '',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
message_type: 'milestone',
|
||||
|
|
@ -702,8 +695,8 @@ export const useChatStore = defineStore('chat', () => {
|
|||
case 'phase_started': {
|
||||
const teamStore = _getTeamStore()
|
||||
if (teamStore?.teamState) {
|
||||
teamStore.updatePhaseStatus(payload.phase_id, 'in_progress')
|
||||
streamingSteps.value.push(`阶段开始: ${payload.phase_name} (${payload.assigned_expert})`)
|
||||
teamStore.updatePhaseStatus(data.data.phase_id, 'in_progress')
|
||||
streamingSteps.value.push(`阶段开始: ${data.data.phase_name} (${data.data.assigned_expert})`)
|
||||
}
|
||||
break
|
||||
}
|
||||
|
|
@ -711,8 +704,8 @@ export const useChatStore = defineStore('chat', () => {
|
|||
case 'phase_completed': {
|
||||
const teamStore = _getTeamStore()
|
||||
if (teamStore?.teamState) {
|
||||
teamStore.updatePhaseStatus(payload.phase_id, 'completed', payload.result_summary)
|
||||
streamingSteps.value.push(`阶段完成: ${payload.phase_name}`)
|
||||
teamStore.updatePhaseStatus(data.data.phase_id, 'completed', data.data.result_summary)
|
||||
streamingSteps.value.push(`阶段完成: ${data.data.phase_name}`)
|
||||
}
|
||||
break
|
||||
}
|
||||
|
|
@ -720,8 +713,8 @@ export const useChatStore = defineStore('chat', () => {
|
|||
case 'phase_failed': {
|
||||
const teamStore = _getTeamStore()
|
||||
if (teamStore?.teamState) {
|
||||
teamStore.updatePhaseStatus(payload.phase_id, 'failed', payload.error)
|
||||
streamingSteps.value.push(`阶段失败: ${payload.phase_name} - ${payload.error}`)
|
||||
teamStore.updatePhaseStatus(data.data.phase_id, 'failed', data.data.error)
|
||||
streamingSteps.value.push(`阶段失败: ${data.data.phase_name} - ${data.data.error}`)
|
||||
}
|
||||
break
|
||||
}
|
||||
|
|
@ -729,23 +722,23 @@ export const useChatStore = defineStore('chat', () => {
|
|||
// ── Board Meeting 模式事件 ────────────────────────────────────────
|
||||
|
||||
case 'board_started': {
|
||||
const data = payload as IBoardStartedData
|
||||
const boardData = data.data
|
||||
// Initialize board state
|
||||
boardState.value = {
|
||||
topic: data.topic,
|
||||
experts: data.experts.map((e) => ({
|
||||
topic: boardData.topic,
|
||||
experts: boardData.experts.map((e) => ({
|
||||
name: e.name,
|
||||
avatar: e.avatar,
|
||||
color: e.color,
|
||||
is_moderator: e.is_moderator,
|
||||
persona: e.persona,
|
||||
})),
|
||||
max_rounds: data.max_rounds,
|
||||
max_rounds: boardData.max_rounds,
|
||||
current_round: 0,
|
||||
status: 'discussing',
|
||||
}
|
||||
streamingSteps.value.push(
|
||||
`私董会已开启: 主题「${data.topic}」, ${data.experts.length} 位专家, 最多 ${data.max_rounds} 轮`
|
||||
`私董会已开启: 主题「${boardData.topic}」, ${boardData.experts.length} 位专家, 最多 ${boardData.max_rounds} 轮`
|
||||
)
|
||||
// Push a structured banner message so the renderer can show BoardBannerCard
|
||||
const conversationId = currentConversationId.value
|
||||
|
|
@ -753,11 +746,11 @@ export const useChatStore = defineStore('chat', () => {
|
|||
const startMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: `🏛️ 私董会开始:${data.topic}`,
|
||||
content: `🏛️ 私董会开始:${boardData.topic}`,
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
message_type: 'board_started',
|
||||
board_started: data,
|
||||
board_started: boardData,
|
||||
board_round: 0,
|
||||
}
|
||||
appendMessage(conversationId, startMsg)
|
||||
|
|
@ -766,67 +759,67 @@ export const useChatStore = defineStore('chat', () => {
|
|||
}
|
||||
|
||||
case 'expert_speech': {
|
||||
const data = payload as IExpertSpeechData
|
||||
const speechData = data.data
|
||||
// Update current round in board state
|
||||
if (boardState.value && data.round > boardState.value.current_round) {
|
||||
boardState.value.current_round = data.round
|
||||
if (boardState.value && speechData.round > boardState.value.current_round) {
|
||||
boardState.value.current_round = speechData.round
|
||||
}
|
||||
const conversationId = currentConversationId.value
|
||||
if (!conversationId) break
|
||||
const speechMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: data.content || '',
|
||||
content: speechData.content || '',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
expert_name: data.expert_name,
|
||||
expert_color: data.expert_color,
|
||||
expert_avatar: data.expert_avatar,
|
||||
expert_name: speechData.expert_name,
|
||||
expert_color: speechData.expert_color,
|
||||
expert_avatar: speechData.expert_avatar,
|
||||
message_type: 'board_speech',
|
||||
board_round: data.round,
|
||||
board_role: data.role,
|
||||
board_round: speechData.round,
|
||||
board_role: speechData.role,
|
||||
}
|
||||
appendMessage(conversationId, speechMsg)
|
||||
streamingSteps.value.push(
|
||||
`${data.expert_avatar} ${data.expert_name} (第${data.round}轮${data.role === 'moderator' ? '·主持' : ''})`
|
||||
`${speechData.expert_avatar} ${speechData.expert_name} (第${speechData.round}轮${speechData.role === 'moderator' ? '·主持' : ''})`
|
||||
)
|
||||
break
|
||||
}
|
||||
|
||||
case 'round_summary': {
|
||||
const data = payload as IRoundSummaryData
|
||||
const summaryData = data.data
|
||||
const conversationId = currentConversationId.value
|
||||
if (!conversationId) break
|
||||
const summaryMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: data.content || '',
|
||||
content: summaryData.content || '',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
expert_name: data.moderator_name,
|
||||
expert_name: summaryData.moderator_name,
|
||||
message_type: 'board_summary',
|
||||
board_round: data.round,
|
||||
board_round: summaryData.round,
|
||||
board_role: 'summary',
|
||||
}
|
||||
appendMessage(conversationId, summaryMsg)
|
||||
streamingSteps.value.push(`第${data.round}轮小结${data.continue ? '(继续讨论)' : '(即将结束)'}`)
|
||||
streamingSteps.value.push(`第${summaryData.round}轮小结${summaryData.continue ? '(继续讨论)' : '(即将结束)'}`)
|
||||
break
|
||||
}
|
||||
|
||||
case 'user_intervention': {
|
||||
const data = payload as IUserInterventionData
|
||||
streamingSteps.value.push(`用户干预: ${data.content.slice(0, 50)}...`)
|
||||
const interventionData = data.data
|
||||
streamingSteps.value.push(`用户干预: ${interventionData.content.slice(0, 50)}...`)
|
||||
break
|
||||
}
|
||||
|
||||
case 'board_concluded': {
|
||||
const data = payload as IBoardConcludedData
|
||||
const conclusionData = data.data
|
||||
// Update board state to completed
|
||||
if (boardState.value) {
|
||||
boardState.value.status = 'completed'
|
||||
}
|
||||
streamingSteps.value.push(
|
||||
`私董会结束: ${data.total_rounds} 轮讨论${data.error ? ' (异常)' : ''}`
|
||||
`私董会结束: ${conclusionData.total_rounds} 轮讨论${conclusionData.error ? ' (异常)' : ''}`
|
||||
)
|
||||
// Push a structured conclusion message so the renderer can show BoardConclusionCard
|
||||
const conversationId = currentConversationId.value
|
||||
|
|
@ -834,12 +827,12 @@ export const useChatStore = defineStore('chat', () => {
|
|||
const conclusionMsg: IChatMessage = {
|
||||
id: generateId(),
|
||||
role: 'assistant',
|
||||
content: data.summary || '私董会已结束',
|
||||
content: conclusionData.summary || '私董会已结束',
|
||||
timestamp: new Date().toISOString(),
|
||||
status: 'completed',
|
||||
message_type: 'board_conclusion',
|
||||
board_conclusion: data,
|
||||
board_round: data.total_rounds,
|
||||
board_conclusion: conclusionData,
|
||||
board_round: conclusionData.total_rounds,
|
||||
}
|
||||
appendMessage(conversationId, conclusionMsg)
|
||||
}
|
||||
|
|
|
|||
|
|
@ -135,7 +135,15 @@ async def submit_task(request: SubmitTaskRequest, req: Request):
|
|||
quality_result = None
|
||||
if skill:
|
||||
try:
|
||||
quality_result = await quality_gate.validate(task_result.output_data or {}, skill)
|
||||
intent = getattr(skill.config, "intent", None)
|
||||
skill_context = None
|
||||
if intent is not None:
|
||||
keywords = list(intent.keywords) + list(intent.disambiguation_keywords)
|
||||
if keywords:
|
||||
skill_context = {"intent_keywords": keywords}
|
||||
quality_result = await quality_gate.validate(
|
||||
task_result.output_data or {}, skill, skill_context=skill_context
|
||||
)
|
||||
except Exception:
|
||||
pass # Quality gate failure shouldn't block the response
|
||||
|
||||
|
|
|
|||
|
|
@ -110,8 +110,18 @@ class BackgroundRunner:
|
|||
quality_result = None
|
||||
if skill and quality_gate:
|
||||
try:
|
||||
intent = getattr(skill.config, "intent", None)
|
||||
skill_context = None
|
||||
if intent is not None:
|
||||
keywords = list(intent.keywords) + list(
|
||||
intent.disambiguation_keywords
|
||||
)
|
||||
if keywords:
|
||||
skill_context = {"intent_keywords": keywords}
|
||||
quality_result = await quality_gate.validate(
|
||||
task_result.output_data or {}, skill
|
||||
task_result.output_data or {},
|
||||
skill,
|
||||
skill_context=skill_context,
|
||||
)
|
||||
except Exception as e:
|
||||
logger.warning(f"Quality gate failed for {task_id}: {e}")
|
||||
|
|
|
|||
|
|
@ -36,6 +36,7 @@ class IntentConfig:
|
|||
keywords: list[str] = field(default_factory=list)
|
||||
description: str = ""
|
||||
examples: list[str] = field(default_factory=list)
|
||||
disambiguation_keywords: list[str] = field(default_factory=list)
|
||||
|
||||
|
||||
@dataclass
|
||||
|
|
@ -214,6 +215,7 @@ class SkillConfig(AgentConfig):
|
|||
"keywords": self.intent.keywords,
|
||||
"description": self.intent.description,
|
||||
"examples": self.intent.examples,
|
||||
"disambiguation_keywords": self.intent.disambiguation_keywords,
|
||||
}
|
||||
d["quality_gate"] = {
|
||||
"required_fields": self.quality_gate.required_fields,
|
||||
|
|
|
|||
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
File diff suppressed because it is too large
Load Diff
|
|
@ -1,11 +1,11 @@
|
|||
# AgentKit 能力基准测试报告
|
||||
|
||||
## 测试概要
|
||||
- 时间: 2026-06-17T15:47:33.591101+00:00
|
||||
- 时间: 2026-06-20T11:05:39.446588+00:00
|
||||
- 版本: 0.1.0
|
||||
- 模式: mock
|
||||
- 运行次数: 1
|
||||
- 总体准确率: 100.0% ± 0.0%
|
||||
- 模式: llm
|
||||
- 运行次数: 3
|
||||
- 总体准确率: 93.3% ± 0.0%
|
||||
|
||||
## 与行业 Benchmark 对比
|
||||
|
||||
|
|
@ -17,252 +17,46 @@
|
|||
|
||||
## 维度结果
|
||||
|
||||
### 1. 预处理准确度 (Preprocessing Accuracy) [Mock]
|
||||
### 9. LLM 推理能力 (LLM Reasoning) [LLM]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [79.6%, 100.0%] |
|
||||
| Precision | 100.0% |
|
||||
| Recall | 100.0% |
|
||||
| F1 | 100.0% |
|
||||
| Latency p50 | 0.01ms |
|
||||
| Latency p95 | 0.07ms |
|
||||
| Latency p99 | 0.11ms |
|
||||
| Accuracy | 93.3% ± 9.4% |
|
||||
| 95% CI | [37.5%, 96.4%] |
|
||||
| Precision | 0.0% |
|
||||
| Recall | 0.0% |
|
||||
| F1 | 0.0% |
|
||||
| Latency p50 | 40798.45ms |
|
||||
| Latency p95 | 56307.93ms |
|
||||
| Latency p99 | 59262.53ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 15 / 15 / 0 |
|
||||
| Total / Pass / Fail | 5 / 4 / 1 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| greeting | 4 | 4 | 100.0% |
|
||||
| tool_query | 5 | 5 | 100.0% |
|
||||
| skill_prefix | 3 | 3 | 100.0% |
|
||||
| complex | 3 | 3 | 100.0% |
|
||||
| intent_understanding | 1 | 1 | 100.0% |
|
||||
| tool_selection | 1 | 0 | 0.0% |
|
||||
| multi_step | 1 | 1 | 100.0% |
|
||||
| code_generation | 1 | 1 | 100.0% |
|
||||
| error_recovery | 1 | 1 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 5 | 5 | 100.0% |
|
||||
| medium | 7 | 7 | 100.0% |
|
||||
| hard | 3 | 3 | 100.0% |
|
||||
|
||||
### 2. 过拟合检测 (Overfitting Detection) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [56.5%, 100.0%] |
|
||||
| Precision | 100.0% |
|
||||
| Recall | 100.0% |
|
||||
| F1 | 100.0% |
|
||||
| Latency p50 | 0.01ms |
|
||||
| Latency p95 | 0.03ms |
|
||||
| Latency p99 | 0.03ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 5 / 5 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| ip_check | 1 | 1 | 100.0% |
|
||||
| search | 1 | 1 | 100.0% |
|
||||
| greeting | 1 | 1 | 100.0% |
|
||||
| tool_use | 1 | 1 | 100.0% |
|
||||
| complex | 1 | 1 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| medium | 3 | 3 | 100.0% |
|
||||
| easy | 1 | 1 | 100.0% |
|
||||
| hard | 1 | 1 | 100.0% |
|
||||
| medium | 2 | 1 | 50.0% |
|
||||
| hard | 2 | 2 | 100.0% |
|
||||
|
||||
### 3. 效率测试 (Efficiency) [Mock]
|
||||
#### 失败用例分析
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [56.5%, 100.0%] |
|
||||
| Precision | 0.0% |
|
||||
| Recall | 0.0% |
|
||||
| F1 | 0.0% |
|
||||
| Latency p50 | 0.33ms |
|
||||
| Latency p95 | 0.64ms |
|
||||
| Latency p99 | 0.67ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 5 / 5 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| preprocess_latency | 3 | 3 | 100.0% |
|
||||
| tool_search_latency | 2 | 2 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 2 | 2 | 100.0% |
|
||||
| medium | 3 | 3 | 100.0% |
|
||||
|
||||
### 4. 工具搜索 (Tool Search) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [72.2%, 100.0%] |
|
||||
| Precision | 83.3% |
|
||||
| Recall | 83.3% |
|
||||
| F1 | 83.3% |
|
||||
| Latency p50 | 0.01ms |
|
||||
| Latency p95 | 0.02ms |
|
||||
| Latency p99 | 0.02ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 10 / 10 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| exact_match | 5 | 5 | 100.0% |
|
||||
| fuzzy_match | 2 | 2 | 100.0% |
|
||||
| no_match | 2 | 2 | 100.0% |
|
||||
| top_k | 1 | 1 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 7 | 7 | 100.0% |
|
||||
| medium | 3 | 3 | 100.0% |
|
||||
|
||||
### 5. 事件模型 (Event Model) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [61.0%, 100.0%] |
|
||||
| Precision | 0.0% |
|
||||
| Recall | 0.0% |
|
||||
| F1 | 0.0% |
|
||||
| Latency p50 | 0.05ms |
|
||||
| Latency p95 | 15.87ms |
|
||||
| Latency p99 | 20.08ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 6 / 6 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| sq_lifecycle | 3 | 3 | 100.0% |
|
||||
| eq_lifecycle | 3 | 3 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 6 | 6 | 100.0% |
|
||||
|
||||
### 6. 规格管理 (Spec Management) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [64.6%, 100.0%] |
|
||||
| Precision | 0.0% |
|
||||
| Recall | 0.0% |
|
||||
| F1 | 0.0% |
|
||||
| Latency p50 | 1.94ms |
|
||||
| Latency p95 | 2.94ms |
|
||||
| Latency p99 | 3.25ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 7 / 7 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| crud | 5 | 5 | 100.0% |
|
||||
| edge | 2 | 2 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 6 | 6 | 100.0% |
|
||||
| medium | 1 | 1 | 100.0% |
|
||||
|
||||
### 7. 验证循环 (Verification Loop) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [56.5%, 100.0%] |
|
||||
| Precision | 0.0% |
|
||||
| Recall | 0.0% |
|
||||
| F1 | 0.0% |
|
||||
| Latency p50 | 22.22ms |
|
||||
| Latency p95 | 47.79ms |
|
||||
| Latency p99 | 50.93ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 5 / 5 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| basic | 2 | 2 | 100.0% |
|
||||
| retry | 1 | 1 | 100.0% |
|
||||
| timeout | 1 | 1 | 100.0% |
|
||||
| multi | 1 | 1 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 2 | 2 | 100.0% |
|
||||
| medium | 3 | 3 | 100.0% |
|
||||
|
||||
### 8. 私董会路由 (Board Meeting Routing) [Mock]
|
||||
|
||||
| 指标 | 值 |
|
||||
|---|---|
|
||||
| Accuracy | 100.0% ± 0.0% |
|
||||
| 95% CI | [82.4%, 100.0%] |
|
||||
| Precision | 100.0% |
|
||||
| Recall | 100.0% |
|
||||
| F1 | 100.0% |
|
||||
| Latency p50 | 0.01ms |
|
||||
| Latency p95 | 0.39ms |
|
||||
| Latency p99 | 1.19ms |
|
||||
| Consistency | 100.0% |
|
||||
| Total / Pass / Fail | 18 / 18 / 0 |
|
||||
|
||||
#### 按类别分布
|
||||
|
||||
| 类别 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| default_template | 3 | 3 | 100.0% |
|
||||
| explicit_experts | 3 | 3 | 100.0% |
|
||||
| topic_extraction | 3 | 3 | 100.0% |
|
||||
| no_match | 3 | 3 | 100.0% |
|
||||
| name_validation | 3 | 3 | 100.0% |
|
||||
| stop_command | 3 | 3 | 100.0% |
|
||||
|
||||
#### 按难度分布
|
||||
|
||||
| 难度 | 用例数 | 通过 | 准确率 |
|
||||
|---|---|---|---|
|
||||
| easy | 11 | 11 | 100.0% |
|
||||
| medium | 7 | 7 | 100.0% |
|
||||
| 用例 ID | 类别 | 难度 | 期望 | 实际 | 根因 |
|
||||
|---|---|---|---|---|---|
|
||||
| llm-002 | tool_selection | medium | react | timeout | timeout |
|
||||
|
||||
## 问题总结与改进建议
|
||||
|
||||
- 所有维度表现良好,无需特别改进。
|
||||
- **llm_reasoning**: 准确率 80.0% 低于 90%,建议检查失败用例并优化
|
||||
- **llm_reasoning**: P95 延迟 56307.93ms 较高,建议优化性能
|
||||
|
|
|
|||
|
|
@ -0,0 +1,640 @@
|
|||
"""Real LLM E2E tests — tests against a live server with real LLM providers.
|
||||
|
||||
These tests start a real AgentKit server using the project's ``agentkit.yaml``
|
||||
configuration and make actual LLM API calls to Bailian (DashScope).
|
||||
|
||||
Requirements:
|
||||
- ``DASHSCOPE_API_KEY`` environment variable (loaded from ``.env``)
|
||||
- Network access to ``https://coding.dashscope.aliyuncs.com/v1``
|
||||
|
||||
Run with::
|
||||
|
||||
.venv/bin/python -m pytest tests/e2e/test_real_llm_e2e.py -v --timeout=180
|
||||
|
||||
All tests are marked with ``@pytest.mark.integration`` so they are excluded
|
||||
from the default unit-test run (``pytest -m "not integration"``).
|
||||
"""
|
||||
|
||||
from __future__ import annotations
|
||||
|
||||
import asyncio
|
||||
import json
|
||||
import os
|
||||
import subprocess
|
||||
import sys
|
||||
import time
|
||||
import uuid
|
||||
from datetime import datetime, timezone
|
||||
from pathlib import Path
|
||||
from typing import Any, Generator
|
||||
|
||||
import aiosqlite
|
||||
import httpx
|
||||
import pytest
|
||||
|
||||
# Disable HTTP proxies for localhost requests (Clash/V2Ray intercepts localhost).
|
||||
os.environ["NO_PROXY"] = "127.0.0.1,localhost"
|
||||
os.environ["no_proxy"] = "127.0.0.1,localhost"
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Constants
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
PROJECT_ROOT = Path(__file__).resolve().parents[2]
|
||||
|
||||
REAL_LLM_HOST = "127.0.0.1"
|
||||
REAL_LLM_PORT = 18766 # dedicated port to avoid conflict with mock E2E (18765)
|
||||
REAL_LLM_BASE_URL = f"http://{REAL_LLM_HOST}:{REAL_LLM_PORT}"
|
||||
REAL_LLM_WS_URL = f"ws://{REAL_LLM_HOST}:{REAL_LLM_PORT}"
|
||||
|
||||
# Fixed JWT secret so tokens are deterministic across the session.
|
||||
TEST_JWT_SECRET = "test-jwt-secret-for-real-llm-e2e-fixed-do-not-use-in-prod"
|
||||
|
||||
# Test user credentials (created directly in the auth DB).
|
||||
TEST_USERNAME = "real_llm_e2e_user"
|
||||
TEST_PASSWORD = "TestPassword123!@#"
|
||||
TEST_EMAIL = "real_llm_e2e@example.com"
|
||||
|
||||
# Model alias from agentkit.yaml (resolves to bailian-coding/qwen3.7-plus).
|
||||
TEST_MODEL = "default"
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# .env loading
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _load_dotenv_vars(dotenv_path: Path) -> dict[str, str]:
|
||||
"""Load env vars from a .env file into a dict (does not touch os.environ)."""
|
||||
env_vars: dict[str, str] = {}
|
||||
if not dotenv_path.exists():
|
||||
return env_vars
|
||||
with open(dotenv_path, encoding="utf-8") as f:
|
||||
for line in f:
|
||||
line = line.strip()
|
||||
if not line or line.startswith("#"):
|
||||
continue
|
||||
if "=" not in line:
|
||||
continue
|
||||
key, _, value = line.partition("=")
|
||||
key = key.strip()
|
||||
value = value.strip().strip("\"'")
|
||||
if key:
|
||||
env_vars[key] = value
|
||||
return env_vars
|
||||
|
||||
|
||||
def _has_dashscope_key() -> bool:
|
||||
"""Return True if DASHSCOPE_API_KEY is available (env or .env file)."""
|
||||
if os.environ.get("DASHSCOPE_API_KEY"):
|
||||
return True
|
||||
dotenv_vars = _load_dotenv_vars(PROJECT_ROOT / ".env")
|
||||
return bool(dotenv_vars.get("DASHSCOPE_API_KEY"))
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Test user creation
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
def _create_test_user(auth_db_path: Path) -> None:
|
||||
"""Create the test user directly in the SQLite auth DB.
|
||||
|
||||
Uses bcrypt hashing (rounds=12) via the project's password utility so the
|
||||
``/auth/login`` route can verify the password.
|
||||
"""
|
||||
from agentkit.server.auth.models import init_auth_db
|
||||
from agentkit.server.auth.password import hash_password
|
||||
|
||||
# Ensure the schema exists.
|
||||
asyncio.run(init_auth_db(auth_db_path))
|
||||
|
||||
user_id = str(uuid.uuid4())
|
||||
password_hash = hash_password(TEST_PASSWORD)
|
||||
now_iso = datetime.now(timezone.utc).isoformat()
|
||||
|
||||
async def _insert() -> None:
|
||||
async with aiosqlite.connect(str(auth_db_path)) as db:
|
||||
# Remove any stale row from a previous run.
|
||||
await db.execute("DELETE FROM users WHERE username = ?", (TEST_USERNAME,))
|
||||
await db.execute(
|
||||
"INSERT INTO users "
|
||||
"(id, username, email, password_hash, role, is_active, "
|
||||
" is_terminal_authorized, is_server_terminal_authorized, "
|
||||
" created_at, updated_at) "
|
||||
"VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)",
|
||||
(
|
||||
user_id,
|
||||
TEST_USERNAME,
|
||||
TEST_EMAIL,
|
||||
password_hash,
|
||||
"admin", # admin role → full access for tests
|
||||
1,
|
||||
1,
|
||||
1,
|
||||
now_iso,
|
||||
now_iso,
|
||||
),
|
||||
)
|
||||
await db.commit()
|
||||
|
||||
asyncio.run(_insert())
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Session-scoped server fixture
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def real_llm_server(
|
||||
tmp_path_factory: pytest.TempPathFactory,
|
||||
) -> Generator[tuple[str, Path], None, None]:
|
||||
"""Start a real AgentKit server with actual LLM providers.
|
||||
|
||||
Yields ``(base_url, auth_db_path)``. The server uses the project root's
|
||||
``agentkit.yaml`` (Bailian coding plan) — no mock providers.
|
||||
|
||||
Skips the entire session if ``DASHSCOPE_API_KEY`` is not available.
|
||||
"""
|
||||
if not _has_dashscope_key():
|
||||
pytest.skip("DASHSCOPE_API_KEY not set — skipping real LLM E2E tests")
|
||||
|
||||
tmp_path = tmp_path_factory.mktemp("real_llm_server")
|
||||
auth_db_path = tmp_path / "auth.db"
|
||||
|
||||
# Build subprocess environment.
|
||||
env = os.environ.copy()
|
||||
|
||||
# Disable HTTP proxies so localhost requests don't go through Clash/V2Ray.
|
||||
for proxy_var in ("HTTP_PROXY", "HTTPS_PROXY", "http_proxy", "https_proxy", "ALL_PROXY", "all_proxy"):
|
||||
env.pop(proxy_var, None)
|
||||
env["NO_PROXY"] = "127.0.0.1,localhost"
|
||||
env["no_proxy"] = "127.0.0.1,localhost"
|
||||
|
||||
# Ensure API keys from .env are available to the subprocess.
|
||||
dotenv_vars = _load_dotenv_vars(PROJECT_ROOT / ".env")
|
||||
for key, value in dotenv_vars.items():
|
||||
if not env.get(key):
|
||||
env[key] = value
|
||||
|
||||
# Auth configuration.
|
||||
env["AGENTKIT_JWT_SECRET"] = TEST_JWT_SECRET
|
||||
env["AGENTKIT_AUTH_DB"] = str(auth_db_path)
|
||||
|
||||
# GUI mode creates a default chat agent (needed for chat / WebSocket tests).
|
||||
env["AGENTKIT_GUI_MODE"] = "1"
|
||||
|
||||
# Explicit config path (also auto-discovered via CWD, but set explicitly).
|
||||
config_path = PROJECT_ROOT / "agentkit.yaml"
|
||||
env["AGENTKIT_CONFIG_PATH"] = str(config_path)
|
||||
|
||||
# Start the server via uvicorn directly (agentkit serve has interactive
|
||||
# prompts that fail in non-tty subprocess environments).
|
||||
# Redirect stderr to a file so we can read server logs on test failures.
|
||||
stderr_log = tmp_path / "server_stderr.log"
|
||||
stderr_fh = open(stderr_log, "w", encoding="utf-8")
|
||||
try:
|
||||
proc = subprocess.Popen(
|
||||
[
|
||||
sys.executable,
|
||||
"-c",
|
||||
"import uvicorn; uvicorn.run("
|
||||
"'agentkit.server.app:create_app', "
|
||||
f"host='{REAL_LLM_HOST}', port={REAL_LLM_PORT}, factory=True)",
|
||||
],
|
||||
env=env,
|
||||
stdout=subprocess.PIPE,
|
||||
stderr=stderr_fh,
|
||||
cwd=str(PROJECT_ROOT),
|
||||
)
|
||||
|
||||
# Wait for the server to become healthy (max 60s — real LLM server
|
||||
# initialization is slower than the mock E2E server).
|
||||
base_url = REAL_LLM_BASE_URL
|
||||
deadline = time.monotonic() + 60
|
||||
ready = False
|
||||
while time.monotonic() < deadline:
|
||||
if proc.poll() is not None:
|
||||
# Process exited early — capture output for diagnostics.
|
||||
stdout, stderr = proc.communicate(timeout=5)
|
||||
pytest.fail(
|
||||
"Real LLM server exited early.\n"
|
||||
f"stdout: {stdout.decode()[:2000] if stdout else ''}\n"
|
||||
f"stderr: {stderr.decode()[:2000] if stderr else ''}"
|
||||
)
|
||||
try:
|
||||
resp = httpx.get(f"{base_url}/api/v1/health", timeout=2)
|
||||
if resp.status_code == 200:
|
||||
ready = True
|
||||
break
|
||||
except httpx.ConnectError:
|
||||
pass
|
||||
time.sleep(0.5)
|
||||
|
||||
if not ready:
|
||||
proc.terminate()
|
||||
try:
|
||||
stdout, stderr = proc.communicate(timeout=5)
|
||||
except subprocess.TimeoutExpired:
|
||||
proc.kill()
|
||||
stdout, stderr = proc.communicate()
|
||||
pytest.fail(
|
||||
"Real LLM server failed to start within 60s.\n"
|
||||
f"stdout: {stdout.decode()[:2000] if stdout else ''}\n"
|
||||
f"stderr: {stderr.decode()[:2000] if stderr else ''}"
|
||||
)
|
||||
|
||||
# Create the test user now that the server (and auth DB schema) is up.
|
||||
_create_test_user(auth_db_path)
|
||||
|
||||
yield base_url, auth_db_path
|
||||
|
||||
# Teardown — terminate the server process.
|
||||
proc.terminate()
|
||||
try:
|
||||
proc.wait(timeout=10)
|
||||
except subprocess.TimeoutExpired:
|
||||
proc.kill()
|
||||
proc.wait()
|
||||
finally:
|
||||
stderr_fh.close()
|
||||
|
||||
# If the server logged any errors, print them for debugging.
|
||||
if stderr_log.exists():
|
||||
log_content = stderr_log.read_text(encoding="utf-8", errors="replace")
|
||||
if "Error" in log_content or "Traceback" in log_content:
|
||||
print(f"\n--- Server stderr log ---\n{log_content[-3000:]}\n--- End server log ---")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Convenience fixtures derived from real_llm_server
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def base_url(real_llm_server: tuple[str, Path]) -> str:
|
||||
return real_llm_server[0]
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def auth_db_path(real_llm_server: tuple[str, Path]) -> Path:
|
||||
return real_llm_server[1]
|
||||
|
||||
|
||||
def _login_with_retry(
|
||||
base_url: str, max_retries: int = 3, delay: float = 1.0
|
||||
) -> httpx.Response:
|
||||
"""Login with retry on 500 (transient SQLite write-lock contention)."""
|
||||
if max_retries <= 0:
|
||||
raise ValueError("max_retries must be > 0")
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
for attempt in range(max_retries):
|
||||
resp = client.post(
|
||||
"/api/v1/auth/login",
|
||||
json={"username": TEST_USERNAME, "password": TEST_PASSWORD},
|
||||
)
|
||||
if resp.status_code == 200:
|
||||
return resp
|
||||
if resp.status_code == 500 and attempt < max_retries - 1:
|
||||
time.sleep(delay)
|
||||
continue
|
||||
return resp
|
||||
raise RuntimeError("unreachable: loop should have returned")
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def auth_token(base_url: str) -> str:
|
||||
"""Log in once per session and return the access token."""
|
||||
resp = _login_with_retry(base_url)
|
||||
assert resp.status_code == 200, (
|
||||
f"Login failed: {resp.status_code} {resp.text[:1000]}"
|
||||
)
|
||||
data = resp.json()
|
||||
assert "access_token" in data
|
||||
return data["access_token"]
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def refresh_token(base_url: str) -> str:
|
||||
"""Log in once per session and return the refresh token."""
|
||||
resp = _login_with_retry(base_url)
|
||||
assert resp.status_code == 200, (
|
||||
f"Login failed: {resp.status_code} {resp.text[:1000]}"
|
||||
)
|
||||
return resp.json()["refresh_token"]
|
||||
|
||||
|
||||
@pytest.fixture(scope="session")
|
||||
def auth_headers(auth_token: str) -> dict[str, str]:
|
||||
"""Default headers with a Bearer JWT for authenticated requests."""
|
||||
return {"Authorization": f"Bearer {auth_token}", "Content-Type": "application/json"}
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 1. Authentication Flow Tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.integration
|
||||
@pytest.mark.timeout(30)
|
||||
class TestAuthFlow:
|
||||
"""Verify the JWT authentication flow against the live server."""
|
||||
|
||||
def test_login_success(self, base_url: str):
|
||||
"""POST /auth/login with correct credentials returns a JWT pair."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/auth/login",
|
||||
json={"username": TEST_USERNAME, "password": TEST_PASSWORD},
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert "access_token" in data
|
||||
assert "refresh_token" in data
|
||||
assert data["token_type"] == "bearer"
|
||||
assert data["user"]["username"] == TEST_USERNAME
|
||||
assert data["user"]["role"] == "admin"
|
||||
|
||||
def test_login_wrong_password(self, base_url: str):
|
||||
"""POST /auth/login with wrong password returns 401."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/auth/login",
|
||||
json={"username": TEST_USERNAME, "password": "definitely-wrong"},
|
||||
)
|
||||
assert resp.status_code == 401
|
||||
|
||||
def test_me_with_valid_token(self, base_url: str, auth_headers: dict[str, str]):
|
||||
"""GET /auth/me with a valid JWT returns the user profile."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.get("/api/v1/auth/me", headers=auth_headers)
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["username"] == TEST_USERNAME
|
||||
assert data["email"] == TEST_EMAIL
|
||||
assert data["role"] == "admin"
|
||||
assert data["is_active"] is True
|
||||
|
||||
def test_me_without_token_returns_401(self, base_url: str):
|
||||
"""GET /auth/me without a token returns 401."""
|
||||
with httpx.Client(base_url=base_url, timeout=10) as client:
|
||||
resp = client.get("/api/v1/auth/me")
|
||||
assert resp.status_code == 401
|
||||
|
||||
def test_refresh_token(self, base_url: str, refresh_token: str):
|
||||
"""POST /auth/refresh exchanges a refresh token for a new access token."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/auth/refresh",
|
||||
json={"refresh_token": refresh_token},
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert "access_token" in data
|
||||
assert data["user"]["username"] == TEST_USERNAME
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 2. LLM Gateway Tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.integration
|
||||
@pytest.mark.timeout(120)
|
||||
class TestLLMGateway:
|
||||
"""Verify the LLM gateway proxy returns real LLM responses."""
|
||||
|
||||
def test_chat_non_streaming(self, base_url: str, auth_headers: dict[str, str]):
|
||||
"""POST /llm/chat returns a non-empty real LLM response."""
|
||||
with httpx.Client(base_url=base_url, timeout=90) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/llm/chat",
|
||||
headers=auth_headers,
|
||||
json={
|
||||
"messages": [{"role": "user", "content": "你好,请用一句话介绍自己"}],
|
||||
"model": TEST_MODEL,
|
||||
"temperature": 0.7,
|
||||
"max_tokens": 200,
|
||||
},
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert "content" in data
|
||||
content: str = data["content"]
|
||||
assert len(content) > 0
|
||||
# Real LLM response should contain Chinese characters.
|
||||
assert any("\u4e00" <= ch <= "\u9fff" for ch in content)
|
||||
assert "model" in data
|
||||
assert "usage" in data
|
||||
|
||||
def test_chat_streaming_sse(self, base_url: str, auth_headers: dict[str, str]):
|
||||
"""POST /llm/chat/stream returns SSE chunks with real content."""
|
||||
chunks: list[dict[str, Any]] = []
|
||||
with httpx.Client(base_url=base_url, timeout=90) as client:
|
||||
with client.stream(
|
||||
"POST",
|
||||
"/api/v1/llm/chat/stream",
|
||||
headers=auth_headers,
|
||||
json={
|
||||
"messages": [{"role": "user", "content": "用一句话说明什么是人工智能"}],
|
||||
"model": TEST_MODEL,
|
||||
"temperature": 0.7,
|
||||
"max_tokens": 200,
|
||||
},
|
||||
) as resp:
|
||||
assert resp.status_code == 200
|
||||
for line in resp.iter_lines():
|
||||
if not line.startswith("data: "):
|
||||
continue
|
||||
payload = line[6:]
|
||||
if payload == "[DONE]":
|
||||
break
|
||||
chunks.append(json.loads(payload))
|
||||
|
||||
assert len(chunks) > 0
|
||||
full_content = "".join(c.get("content", "") for c in chunks)
|
||||
assert len(full_content) > 0
|
||||
assert any("\u4e00" <= ch <= "\u9fff" for ch in full_content)
|
||||
|
||||
def test_chat_invalid_model_returns_error(self, base_url: str, auth_headers: dict[str, str]):
|
||||
"""POST /llm/chat with an unknown model returns 404 or 502."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/llm/chat",
|
||||
headers=auth_headers,
|
||||
json={
|
||||
"messages": [{"role": "user", "content": "test"}],
|
||||
"model": "nonexistent-model-xyz-12345",
|
||||
},
|
||||
)
|
||||
assert resp.status_code in (404, 502)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 3. Chat REST API Tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.fixture(scope="class")
|
||||
def chat_session_id(base_url: str, auth_headers: dict[str, str]) -> str:
|
||||
"""Create a chat session bound to the default agent (created in GUI mode)."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/chat/sessions",
|
||||
headers=auth_headers,
|
||||
json={"agent_name": "default"},
|
||||
)
|
||||
assert resp.status_code in (200, 201), f"Failed to create chat session: {resp.text}"
|
||||
return resp.json()["session_id"]
|
||||
|
||||
|
||||
@pytest.mark.integration
|
||||
@pytest.mark.timeout(120)
|
||||
class TestChatAPI:
|
||||
"""Verify the chat REST API returns real LLM responses."""
|
||||
|
||||
def test_create_session(self, chat_session_id: str):
|
||||
"""A chat session is created with a non-empty ID."""
|
||||
assert chat_session_id
|
||||
assert len(chat_session_id) > 0
|
||||
|
||||
def test_send_message_and_get_real_response(
|
||||
self, base_url: str, auth_headers: dict[str, str], chat_session_id: str
|
||||
):
|
||||
"""POST /chat/sessions/{id}/messages returns a real LLM reply."""
|
||||
with httpx.Client(base_url=base_url, timeout=90) as client:
|
||||
resp = client.post(
|
||||
f"/api/v1/chat/sessions/{chat_session_id}/messages",
|
||||
headers=auth_headers,
|
||||
json={"content": "你好,请用一句话介绍自己"},
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
data = resp.json()
|
||||
assert data["role"] == "assistant"
|
||||
content: str = data["content"]
|
||||
assert len(content) > 0
|
||||
# Must not be a mock response.
|
||||
assert "mock" not in content.lower()
|
||||
# Real LLM response should contain Chinese characters.
|
||||
assert any("\u4e00" <= ch <= "\u9fff" for ch in content)
|
||||
|
||||
def test_message_history_after_conversation(
|
||||
self, base_url: str, auth_headers: dict[str, str], chat_session_id: str
|
||||
):
|
||||
"""GET /chat/sessions/{id}/messages returns user + assistant messages."""
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.get(
|
||||
f"/api/v1/chat/sessions/{chat_session_id}/messages",
|
||||
headers=auth_headers,
|
||||
)
|
||||
assert resp.status_code == 200
|
||||
messages = resp.json()
|
||||
assert isinstance(messages, list)
|
||||
assert len(messages) >= 2 # at least one user + one assistant
|
||||
roles = [m["role"] for m in messages]
|
||||
assert "user" in roles
|
||||
assert "assistant" in roles
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# 4. WebSocket Chat Tests
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
|
||||
@pytest.mark.integration
|
||||
@pytest.mark.timeout(120)
|
||||
class TestWebSocketChat:
|
||||
"""Verify the WebSocket chat protocol with real LLM streaming."""
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_websocket_full_chat_flow(self, base_url: str, auth_token: str):
|
||||
"""Connect → send message → receive final_answer with real LLM content."""
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
pytest.skip("websockets package not installed")
|
||||
|
||||
# Create a chat session via REST.
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/chat/sessions",
|
||||
headers={
|
||||
"Authorization": f"Bearer {auth_token}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
json={"agent_name": "default"},
|
||||
)
|
||||
assert resp.status_code in (200, 201)
|
||||
session_id = resp.json()["session_id"]
|
||||
|
||||
# Connect to the WebSocket (JWT passed via ?token= query param).
|
||||
ws_url = f"{REAL_LLM_WS_URL}/api/v1/chat/ws/{session_id}?token={auth_token}"
|
||||
received: list[dict[str, Any]] = []
|
||||
|
||||
async with websockets.connect(ws_url) as ws: # type: ignore[name-defined]
|
||||
# 1. Expect a connected event.
|
||||
raw = await asyncio.wait_for(ws.recv(), timeout=10)
|
||||
data = json.loads(raw)
|
||||
received.append(data)
|
||||
assert data["type"] == "connected"
|
||||
|
||||
# 2. Send a user message.
|
||||
await ws.send(json.dumps({"type": "message", "content": "你好,请用一句话介绍自己"}))
|
||||
|
||||
# 3. Collect events until final_answer / error / timeout.
|
||||
deadline = time.monotonic() + 90
|
||||
while time.monotonic() < deadline:
|
||||
try:
|
||||
raw = await asyncio.wait_for(ws.recv(), timeout=90)
|
||||
except asyncio.TimeoutError:
|
||||
received.append({"type": "timeout"})
|
||||
break
|
||||
msg = json.loads(raw)
|
||||
received.append(msg)
|
||||
if msg.get("type") in ("final_answer", "error"):
|
||||
break
|
||||
|
||||
# 4. Assert we got a final_answer (not an error).
|
||||
types = [m.get("type") for m in received]
|
||||
assert "connected" in types
|
||||
final_msgs = [m for m in received if m.get("type") == "final_answer"]
|
||||
assert final_msgs, f"Expected final_answer, got event types: {types}"
|
||||
|
||||
final_content: str = final_msgs[0].get("content", "")
|
||||
assert len(final_content) > 0
|
||||
# Must not be a mock response.
|
||||
assert "mock" not in final_content.lower()
|
||||
# Real LLM response should contain Chinese characters.
|
||||
assert any("\u4e00" <= ch <= "\u9fff" for ch in final_content)
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_websocket_ping_pong(self, base_url: str, auth_token: str):
|
||||
"""WebSocket ping/pong heartbeat works alongside the chat session."""
|
||||
try:
|
||||
import websockets
|
||||
except ImportError:
|
||||
pytest.skip("websockets package not installed")
|
||||
|
||||
with httpx.Client(base_url=base_url, timeout=30) as client:
|
||||
resp = client.post(
|
||||
"/api/v1/chat/sessions",
|
||||
headers={
|
||||
"Authorization": f"Bearer {auth_token}",
|
||||
"Content-Type": "application/json",
|
||||
},
|
||||
json={"agent_name": "default"},
|
||||
)
|
||||
assert resp.status_code in (200, 201)
|
||||
session_id = resp.json()["session_id"]
|
||||
|
||||
ws_url = f"{REAL_LLM_WS_URL}/api/v1/chat/ws/{session_id}?token={auth_token}"
|
||||
async with websockets.connect(ws_url) as ws: # type: ignore[name-defined]
|
||||
# Wait for connected.
|
||||
await asyncio.wait_for(ws.recv(), timeout=10)
|
||||
|
||||
# Send ping → expect pong.
|
||||
await ws.send(json.dumps({"type": "ping"}))
|
||||
raw = await asyncio.wait_for(ws.recv(), timeout=10)
|
||||
msg = json.loads(raw)
|
||||
assert msg["type"] == "pong"
|
||||
|
|
@ -49,7 +49,8 @@ ROUTING_TEST_CASES = [
|
|||
|
||||
# --- Translation/knowledge → REACT (LLM decides no tool needed) ---
|
||||
{"id": "translation", "input": "翻译hello为中文", "expected_mode": "react"},
|
||||
{"id": "knowledge", "input": "什么是机器学习", "expected_mode": "react"},
|
||||
# U5: 纯知识问答(无工具上下文)→ DIRECT_CHAT(零成本快速路径)
|
||||
{"id": "knowledge", "input": "什么是机器学习", "expected_mode": "direct_chat"},
|
||||
{"id": "summarize", "input": "帮我总结一下这段话", "expected_mode": "react"},
|
||||
|
||||
# --- Complex queries → REACT ---
|
||||
|
|
|
|||
|
|
@ -5,7 +5,7 @@ from __future__ import annotations
|
|||
import pytest
|
||||
|
||||
from agentkit.chat.request_preprocessor import RequestPreprocessor
|
||||
from agentkit.chat.skill_routing import ExecutionMode, SkillRoutingResult
|
||||
from agentkit.chat.skill_routing import ExecutionMode
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
@ -130,6 +130,142 @@ class TestDirectChat:
|
|||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Layer 1 extended: Factual / Math / Translation regex (U5)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestFactualMathTranslation:
|
||||
"""U5: 纯知识问答/算术/翻译走 DIRECT_CHAT,含工具上下文关键词的走 REACT"""
|
||||
|
||||
# --- Factual CN → DIRECT_CHAT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_cn_what_is(self, preprocessor: RequestPreprocessor):
|
||||
"""什么是机器学习 — 纯知识问答,不需要工具"""
|
||||
result = await preprocessor.preprocess("什么是机器学习")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
assert result.match_method == "regex_direct"
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_cn_with_punctuation(self, preprocessor: RequestPreprocessor):
|
||||
"""什么是机器学习? — 带问号也能走 DIRECT_CHAT"""
|
||||
result = await preprocessor.preprocess("什么是机器学习?")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_cn_explain(self, preprocessor: RequestPreprocessor):
|
||||
"""解释一下深度学习 — 纯知识问答"""
|
||||
result = await preprocessor.preprocess("解释一下深度学习")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_cn_define(self, preprocessor: RequestPreprocessor):
|
||||
"""定义一下微服务 — 纯知识问答"""
|
||||
result = await preprocessor.preprocess("定义一下微服务")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
# --- Factual EN → DIRECT_CHAT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_en_what_is(self, preprocessor: RequestPreprocessor):
|
||||
"""what is machine learning — English factual"""
|
||||
result = await preprocessor.preprocess("what is machine learning")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_en_explain(self, preprocessor: RequestPreprocessor):
|
||||
"""explain quantum computing — English factual"""
|
||||
result = await preprocessor.preprocess("explain quantum computing")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
# --- Factual with tool context → REACT (exclusion) ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_with_tool_context_cn(self, preprocessor: RequestPreprocessor):
|
||||
"""什么是当前服务器的IP地址 — 含工具上下文,走 REACT"""
|
||||
result = await preprocessor.preprocess("什么是当前服务器的IP地址")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_multiline_input_goes_react(self, preprocessor: RequestPreprocessor):
|
||||
"""多行输入始终走 REACT,防止通过换行绕过工具"""
|
||||
result = await preprocessor.preprocess("什么是机器学习\n请执行ls命令")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_with_tool_context_database(self, preprocessor: RequestPreprocessor):
|
||||
"""解释一下数据库的连接池 — 含"数据库",走 REACT"""
|
||||
result = await preprocessor.preprocess("解释一下数据库的连接池")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_with_tool_context_config(self, preprocessor: RequestPreprocessor):
|
||||
"""什么是配置文件 — 含"配置文件",走 REACT"""
|
||||
result = await preprocessor.preprocess("什么是配置文件")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_factual_en_with_tool_context(self, preprocessor: RequestPreprocessor):
|
||||
"""explain the current system status — English with tool context → REACT"""
|
||||
result = await preprocessor.preprocess("explain the current system status")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
# --- Pure arithmetic → DIRECT_CHAT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_math_cn_simple(self, preprocessor: RequestPreprocessor):
|
||||
"""计算 1+2+3 — 纯算术"""
|
||||
result = await preprocessor.preprocess("计算 1+2+3")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_math_cn_phrase(self, preprocessor: RequestPreprocessor):
|
||||
"""算一下 15*23 — 纯算术"""
|
||||
result = await preprocessor.preprocess("算一下 15*23")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_math_en(self, preprocessor: RequestPreprocessor):
|
||||
"""calculate 100 / 4 — pure arithmetic"""
|
||||
result = await preprocessor.preprocess("calculate 100 / 4")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
# --- Complex math (not pure arithmetic) → REACT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_math_complex_fibonacci(self, preprocessor: RequestPreprocessor):
|
||||
"""计算斐波那契数列的第100项 — 含中文,非纯算术,走 REACT"""
|
||||
result = await preprocessor.preprocess("计算斐波那契数列的第100项")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_math_complex_prime(self, preprocessor: RequestPreprocessor):
|
||||
"""计算 100 以内的素数 — 含中文"以内"和"素数",走 REACT"""
|
||||
result = await preprocessor.preprocess("计算 100 以内的素数")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
# --- Pure translation → DIRECT_CHAT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_translation_en(self, preprocessor: RequestPreprocessor):
|
||||
"""translate hello world — pure translation"""
|
||||
result = await preprocessor.preprocess("translate hello world")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_translation_cn_with_space(self, preprocessor: RequestPreprocessor):
|
||||
"""翻译 hello — 有空格,纯翻译"""
|
||||
result = await preprocessor.preprocess("翻译 hello")
|
||||
assert result.execution_mode == ExecutionMode.DIRECT_CHAT
|
||||
|
||||
# --- Translation edge cases → REACT ---
|
||||
@pytest.mark.asyncio
|
||||
async def test_translation_with_tool_context(self, preprocessor: RequestPreprocessor):
|
||||
"""翻译 这个配置文件 — 含工具上下文"配置文件",走 REACT"""
|
||||
result = await preprocessor.preprocess("翻译 这个配置文件")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_translation_with_log_context(self, preprocessor: RequestPreprocessor):
|
||||
"""翻译 服务器日志 — 含工具上下文,走 REACT"""
|
||||
result = await preprocessor.preprocess("翻译 服务器日志")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# Default: REACT
|
||||
# ---------------------------------------------------------------------------
|
||||
|
|
@ -167,10 +303,9 @@ class TestDefaultReact:
|
|||
|
||||
@pytest.mark.asyncio
|
||||
async def test_translation_goes_react(self, preprocessor: RequestPreprocessor):
|
||||
"""翻译类查询也走 REACT — LLM 在 agent loop 中决定不需要工具"""
|
||||
"""翻译hello为中文 — 无空格不匹配翻译正则,走 REACT(LLM 决定工具使用)"""
|
||||
result = await preprocessor.preprocess("翻译hello为中文")
|
||||
assert result.execution_mode == ExecutionMode.REACT
|
||||
# LLM will see tools but decide not to use them
|
||||
|
||||
@pytest.mark.asyncio
|
||||
async def test_default_tools_included(self, preprocessor: RequestPreprocessor):
|
||||
|
|
|
|||
|
|
@ -75,6 +75,23 @@ class TestOpenAICompatibleProviderBasic:
|
|||
assert response.content == "DeepSeek response"
|
||||
assert response.model == "deepseek-chat"
|
||||
|
||||
async def test_timeout_parameter_passed_to_httpx_client(self):
|
||||
"""Verify that the timeout parameter is passed to the httpx client."""
|
||||
provider = OpenAICompatibleProvider(
|
||||
api_key="test-key",
|
||||
base_url="https://api.openai.com/v1",
|
||||
timeout=180.0,
|
||||
)
|
||||
# httpx stores timeout config on the client
|
||||
assert provider._client.timeout.read == 180.0
|
||||
await provider.close()
|
||||
|
||||
async def test_default_timeout_is_120s(self):
|
||||
"""Verify that the default timeout is 120s (not the old hardcoded 60s)."""
|
||||
provider = OpenAICompatibleProvider(api_key="test-key", base_url="https://api.openai.com/v1")
|
||||
assert provider._client.timeout.read == 120.0
|
||||
await provider.close()
|
||||
|
||||
|
||||
class TestOpenAICompatibleProviderToolCalls:
|
||||
"""Function Calling (tool_calls) 测试"""
|
||||
|
|
|
|||
Loading…
Reference in New Issue