Commit Graph

7 Commits

Author SHA1 Message Date
chiguyong 2e404cf1a0 test: 全面回测 + 真实 LLM E2E + 能力 benchmark + 问题修复
## 测试结果

### 后端 E2E(真实 LLM,真实服务器)— 13/13 通过
- tests/e2e/test_real_llm_e2e.py: 认证流程、LLM 网关、Chat API、WebSocket
- 使用百炼 coding plan(qwen3.7-plus)真实 LLM,无 mock
- 修复 SQLite 写锁竞争导致的间歇性 500(_login_with_retry 重试机制)

### 前端 E2E(Playwright + 真实 LLM)— 11/11 通过
- login.spec.ts (4): 登录流程、表单验证、token 存储
- chat.spec.ts (3): 真实 LLM 对话、消息渲染
- terminal.spec.ts (4): 终端面板、白名单管理
- 使用系统 Chrome(channel: 'chrome')避免浏览器下载

### Benchmark 能力评估(真实 LLM)
- full 模式: 60% 准确率(5 用例 3 通过 2 超时)
- fast 模式: 100% 准确率
- 失败用例: llm-001 (intent_understanding) / llm-004 (code_generation) 均为超时

### 单元测试
- 174 个新测试通过
- 28 个预存失败(非本次架构变更引入)

## 代码修复

### chat.ts: 消除 any 类型 TODO(line 406)
- handleWsMessage 参数从 Record<string, any> 改为 WsServerMessage 联合类型
- 使用判别联合窄化,每个 case 分支直接访问类型化字段
- 移除通用 payload 变量,移除未使用的类型导入
- vue-tsc --noEmit 零错误

### 基础设施修复
- playwright.config.ts: 修复 PROJECT_ROOT 路径(4 级而非 2 级)
- playwright.config.ts: 用 uvicorn.run() 替代 agentkit serve(避免非 tty 交互提示)
- helpers.ts: API_BASE 改为绝对 URL(Node.js fetch 不支持相对 URL)
- helpers.ts: clearAuth 修复 page.evaluate 上下文问题(Node 常量传入浏览器)
- helpers.ts: loginViaApi 添加 429 限流重试 + token 缓存
- login.spec.ts / terminal.spec.ts: 修复 Ant Design Vue autoInsertSpace 导致的选择器不匹配
- chat.spec.ts: .first() 改 .last() 避免拾取历史消息
- setup-test-user.py: .local 邮箱改为 .com(EmailStr 拒绝 .local TLD)
- .gitignore: Playwright 产物路径限定到 frontend 目录

### 依赖
- pyproject.toml: 补充 pyjwt, bcrypt, aiosqlite 依赖
- package.json: 添加 @playwright/test 依赖

## 未完成计划清单(核对结果)

### 计划 001(聊天主区 VI 重梳)— active
- U7: SkillsTab/SystemTab/KnowledgeTab 三子组件未实现
- U8: Preview 样例场景精修未完成
- U9: BoardMeetingModal VI 适配收尾未完成
- U10: 质量门与后端回归测试未完成

### 计划 002(企业级 C/S 架构)— 方案评审中
- 8 个待决策问题未明确(卖给谁/部署位置/终端形态等)
- P2/P3/P4 模块延后

### 计划 003(企业级 C/S 演进)— completed
- 7 项 Deferred(Web 管理台/技能市场/SSO/代码索引/多租户等)

### 代码 stub
- DockerComputerUseSession: start/stop/screenshot/execute_action 4 个方法为 stub
  (需真实 Docker + VNC + Anthropic Computer Use API,属未来功能)
2026-06-20 18:22:10 +08:00
chiguyong 64d62a2b60 feat: autonomous task execution - connect PlanExecEngine + TeamOrchestrator
U1: TeamOrchestrator._execute_phase real execution (Expert.agent.execute)
U2: LLM-based merge strategies (BEST/VOTE/FUSION) with fallback
U3: ReActStepExecutor replacing _LLMStepAgent for tool-enabled steps
U4: SharedWorkspace integration for cross-phase/cross-execution state
U5: GoalPlanner prompt tuning with few-shot and verb pattern matching
U6: Replan-before-fallback in TeamOrchestrator
U7: End-to-end validation tests for multi-step research tasks
U8: WebSocket progress events (step_event_callback + new event types)

Code review fixes: P0 response.strip fix, P1 competitor status check,
milestone real impl, VOTE self-bias fix, confirmation_handler wiring,
ExpertTeam public API, DRY _build_result_summaries, replan tests

Also: geo_server.py refactor (ServerConfig.from_yaml), delete llm_config.yaml
2026-06-15 12:41:32 +08:00
chiguyong b2709da08b feat(cli): AgentKit CLI with serve/version/health/task/skill/init/usage
U1: CLI framework (Typer) + serve/version/health commands + __main__.py + pyproject scripts
U2: task command group (submit/status/list/cancel) with remote mode
U3: skill command group (list/load/info) with local and remote modes
U4: init command (generates agentkit.yaml/.env.example/docker-compose/skills) + usage command

31 tests passing, TDD workflow.
2026-06-06 12:45:51 +08:00
chiguyong 2844eeb548 feat(streaming): Phase C - LLM streaming + ReAct events + SSE endpoint
U8: StreamChunk protocol + OpenAI chat_stream + Gateway streaming with usage tracking
U9: ReActEvent dataclass + execute_stream() yielding thinking/tool_call/tool_result/final_answer
U10: POST /tasks/stream SSE endpoint + Client SDK stream_task()

15 new tests passing, no regression.
2026-06-06 11:54:17 +08:00
chiguyong f87b790c0f feat(agentkit): v2 Phase 1 - ReAct/LLM Gateway/Skill/Server + review fixes
535 unit + 52 integration tests passing. README added.
2026-06-05 23:32:16 +08:00
chiguyong cc3dfd44e3 fix: switch to setuptools for Python 3.14 compatibility 2026-06-04 22:27:06 +08:00
chiguyong 9a6d6fee4e feat: initial fischer-agentkit package with unified agent architecture
- BaseAgent with handle_task() pattern (execute template moved up)
- Protocol: TaskMessage, TaskResult, HandoffMessage, EvolutionEvent
- Tool system: FunctionTool, AgentTool, ToolRegistry with versioning
- Memory system: WorkingMemory (Redis), EpisodicMemory (pgvector), SemanticMemory (RAG adapter), MemoryRetriever (hybrid)
- Evolution engine: Reflector, PromptOptimizer (DSPy-style), StrategyTuner, ABTester, EvolutionStore
- Orchestrator: PipelineEngine (parallel DAG), PipelineLoader (YAML), HandoffManager, DynamicPipeline
- MCP: Server (FastAPI), Client (httpx), MCPTool
- Prompts: PromptTemplate, PromptSection
- Exceptions: full hierarchy including Tool, Schema, Handoff, Evolution errors
- Tests: unit tests for core, tools, protocol, evolution, pipeline
2026-06-04 22:24:06 +08:00