Commit Graph

12 Commits

Author SHA1 Message Date
chiguyong abf758fa9c feat(U2): G6 PhaseState + PhasePolicy + ServerConfig.plan_exec
- PhaseState enum (PLANNING/BUILDING/VERIFICATION/DELIVERY) with next_of/from_string
- PhasePolicy dataclass with whitelist + bash_command_filter + auto_advance_after_steps
- default_policy() factory — KTD5 whitelist matching R24 (Planning: search/read_file;
  Building: write_file; Delivery: wildcard)
- bash_command_filter blocks rm/mv/cp/>/>> in PLANNING/VERIFICATION phases
- policy_from_config() parses plan_exec YAML section (R26) with override merge
- ServerConfig.plan_exec field + from_dict parsing (extends Wave 1/2 pattern)
- agentkit.yaml gains commented plan_exec section (opt-in)
- 37 unit tests covering PhaseState, default_policy, is_tool_allowed,
  bash filter, config parsing, and ServerConfig integration
2026-06-29 23:58:56 +08:00
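The phase model described in this commit could look roughly like the sketch below. Only the names PhaseState, next_of, from_string and the blocked tokens (rm/mv/cp/>/>>) come from the commit message; the helper is_bash_allowed, the enum values, and the token-matching rule are illustrative assumptions, not the project's actual code.

```python
from enum import Enum
from typing import Optional


class PhaseState(Enum):
    PLANNING = "planning"
    BUILDING = "building"
    VERIFICATION = "verification"
    DELIVERY = "delivery"

    @classmethod
    def from_string(cls, value: str) -> "PhaseState":
        # Accepts "planning", "PLANNING", " Planning " and so on.
        return cls[value.strip().upper()]

    @classmethod
    def next_of(cls, state: "PhaseState") -> Optional["PhaseState"]:
        # Phases advance in declaration order; DELIVERY has no successor.
        order = list(cls)
        idx = order.index(state)
        return order[idx + 1] if idx + 1 < len(order) else None


# Assumption: these are the phases in which mutating shell commands are blocked.
READ_ONLY_PHASES = {PhaseState.PLANNING, PhaseState.VERIFICATION}
BLOCKED_COMMANDS = {"rm", "mv", "cp"}


def is_bash_allowed(phase: PhaseState, command: str) -> bool:
    """Hypothetical filter: reject rm/mv/cp and output redirection in read-only phases."""
    if phase not in READ_ONLY_PHASES:
        return True
    if ">" in command:  # covers both ">" and ">>"
        return False
    return not any(token in BLOCKED_COMMANDS for token in command.split())
```

A plain token match like this is easy to bypass (for example via `sh -c`), so a real filter would likely need stricter parsing.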
chiguyong 80b02f58a6 feat(U3): G7 wire the three-tier fallback chain into chat REST
- Add agentkit/server/_fallback_chain.py: execute_with_fallback_chain
  Main (ReActEngine) → Recovery (ReflexionEngine) → Emergency (EmergencyRules)
- chat.py send_message wraps react_engine.execute with the chain (KTD5)
- ReAct calls made inside ReflexionEngine bypass the chain (avoids recursion)
- TaskCancelledError propagates directly and does not enter Emergency (KTD3)
- Soft failures (empty_fallback/verify_failed) also trigger Recovery
- Recovery failure/exception → Emergency classifies via EmergencyRules.classify
- ServerConfig.from_dict reads fallback_chain.{recovery,emergency}
- 17 tests covering the Main/Recovery/Emergency tiers plus configuration
2026-06-29 23:07:38 +08:00
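The Main → Recovery → Emergency flow from this commit can be sketched as follows. The function name execute_with_fallback_chain, TaskCancelledError, and the soft-failure labels come from the commit message; the dict-shaped results with a "status" key, the callable parameters, and the local TaskCancelledError class are assumptions made only so the sketch is self-contained.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class TaskCancelledError(Exception):
    """Stand-in for the project's cancellation error (assumed shape)."""


SOFT_FAILURES = {"empty_fallback", "verify_failed"}
Result = Dict[str, Any]


async def execute_with_fallback_chain(
    main: Callable[[], Awaitable[Result]],
    recovery: Callable[[], Awaitable[Result]],
    emergency: Callable[[Exception], Result],
) -> Result:
    # Tier 1: main engine. Cancellation always propagates untouched.
    try:
        result = await main()
        if result.get("status") not in SOFT_FAILURES:
            return {"tier": "main", **result}
    except TaskCancelledError:
        raise
    except Exception:
        pass  # hard failure: fall through to Recovery

    # Tier 2: recovery engine, also triggered by soft failures above.
    try:
        result = await recovery()
        if result.get("status") not in SOFT_FAILURES:
            return {"tier": "recovery", **result}
        error: Exception = RuntimeError(f"recovery soft failure: {result.get('status')}")
    except TaskCancelledError:
        raise
    except Exception as exc:
        error = exc

    # Tier 3: emergency rules classify the last error into a canned response.
    return {"tier": "emergency", **emergency(error)}
```

Passing the engines in as callables keeps the chain independent of any one engine class, which also makes it straightforward to keep inner ReAct calls outside the chain.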
chiguyong b1841ce21b feat(U4): G9 PlanPhase rollback + RollbackExecutor
- PlanPhase gains optional validation_command / rollback_command fields (KTD6 opt-in)
- to_dict emits the new keys only when the fields are not None, preserving the existing dict shape (KTD6 contract)
- Add RollbackExecutor (orchestrator/rollback.py), reusing the VerificationLoop
  subprocess pattern and bypassing ShellTool to avoid confirm_callback interception (KTD7)
- TeamOrchestrator._run_phase_rollback implements the R21 order:
  validation → rollback → checkpoint.save (called only when the former passes)
- ServerConfig.from_dict reads rollback.default_timeout
- 20 tests covering characterization / happy path / timeout / git integration / configuration
2026-06-29 22:55:08 +08:00
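The "subprocess pattern with a timeout" that RollbackExecutor is said to reuse might look like the sketch below. The function name run_shell_command, the result dict shape, and the 30-second default are hypothetical; the commit only states that commands run via subprocess (not ShellTool) and that a rollback.default_timeout setting exists.

```python
import subprocess
from typing import Any, Dict


def run_shell_command(command: str, timeout: float = 30.0) -> Dict[str, Any]:
    """Run a validation or rollback command directly, with a hard timeout."""
    try:
        proc = subprocess.run(
            command,
            shell=True,           # commands are configured as shell strings
            capture_output=True,  # collect stdout/stderr for the report
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A timed-out command is reported as a failure, not raised.
        return {"ok": False, "timed_out": True, "returncode": None, "output": ""}
    return {
        "ok": proc.returncode == 0,
        "timed_out": False,
        "returncode": proc.returncode,
        "output": proc.stdout + proc.stderr,
    }
```

Returning a result object instead of raising lets the caller decide what to do after a failed validation or rollback step.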
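chiguyong 8d5ccca604 feat(U1): G4 ContextCompressor auxiliary LLM routing
_summarize first tries auxiliary_model (a cheap, cost-optimized model such as qwen-turbo).
If that call fails or returns empty content (the Finding 4 anti-pattern), it falls back to the
main model; if the main model also fails, it falls through to _simple_summary.
With auxiliary_model=None the existing single-model behavior is preserved.

- ContextCompressor gains an auxiliary_model parameter
- LLMConfig gains an auxiliary_model field, passed through by ServerConfig._build_llm_config
- agentkit.yaml documents llm.auxiliary_model: fast (commented out, default behavior preserved)
- Tests: 9 scenarios covering success / empty content / exception / both models failing / skip when aux=main / audit fields / config wiring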
2026-06-29 22:37:14 +08:00
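The auxiliary-then-main-then-simple ordering described above can be sketched as a small synchronous function. The names summarize, call_llm, and simple_summary are hypothetical stand-ins (the real _summarize is presumably async and calls an LLM gateway), and treating an empty reply from the main model as a failure is an assumption.

```python
from typing import Callable, List, Optional, Tuple


def simple_summary(text: str, limit: int = 200) -> str:
    # Last-resort fallback that needs no model: plain truncation (assumed behavior).
    return text[:limit]


def summarize(
    text: str,
    call_llm: Callable[[str, str], str],
    main_model: str,
    auxiliary_model: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (summary, source), where source names the model that produced it."""
    candidates: List[str] = []
    # Skip the auxiliary attempt when it is unset or identical to the main model.
    if auxiliary_model and auxiliary_model != main_model:
        candidates.append(auxiliary_model)
    candidates.append(main_model)

    for model in candidates:
        try:
            summary = call_llm(model, text)
        except Exception:
            continue  # model call failed: try the next candidate
        if summary and summary.strip():
            return summary, model
        # Empty content is treated like a failure and falls through.

    return simple_summary(text), "simple"
```

Returning the source alongside the summary is one way to feed the audit fields mentioned in the test list.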
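chiguyong dddcbd24e3 feat: board meeting discussion mode + backtest integration + WS persistence fix
Board Meeting Mode:
- BoardRouter: @board prefix routing, expert name validation, template fallback
- BoardTeam: discussion container with a state machine (FORMING->DISCUSSING->CONCLUDING->COMPLETED)
- BoardOrchestrator: multi-round autonomous discussion loop engine, moderator summaries, stop-command detection
- 9 preset celebrity expert YAMLs (Musk, Bezos, Zhang Xiaolong, Munger, etc.)
- Frontend BoardStatusView group-chat-style UI + WebSocket event handling
- Backend chat.py integrates @board routing into the main chat flow

Backtest integration:
- benchmark.py: add board_meeting dimension (18 tasks, 6 categories)
- benchmark_dataset.py: add BOARD_BENCHMARKS (11 E2E cases)
- test_board_backtest.py: 66 backtest tests (9 test classes)

Bug fixes:
- resolve_expert_configs: deep-copy so that is_lead changes do not pollute shared templates
- Fall back to the default template when all expert names are invalid
- board_router: topic was not stripped on the non-matching path
- benchmark_dataset: correct the board-name-invalid-001 input

WebSocket persistence fixes:
- chat.py: three-layer defense to ensure task results are not lost
- chat store: reconnect recovery logic

Deployment configuration:
- Gitea Actions CI/CD workflow
- docker-compose.deploy.yaml deployment orchestration
- scripts/deploy.sh automated deployment script

Test results: 120 unit tests pass, 71 benchmark tests pass (100%), all ruff checks pass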
2026-06-17 23:52:53 +08:00
chiguyong a27eed3714 fix(config): unify config loading chain and protect ${VAR} references
- Settings API: reverse-resolve env vars to preserve ${VAR} refs in yaml,
  write new API keys to .env instead of agentkit.yaml, extract env_key
  from existing ${VAR} reference when updating providers
- Onboarding: merge-update instead of overwrite when config exists,
  use config_arg to determine output path, .env merge instead of overwrite
- Unified templates: bailian-coding provider name, full model_aliases,
  docker-compose with postgres, expanded .env.example
- Optional ruamel.yaml for comment/format preservation in Settings API
- clients.yaml: add _deep_resolve for ${VAR} env var references
- All CLI commands use load_config_with_dotenv() consistently
- Tests: mock find_config_path and CWD auto-discovery to avoid env leaks
2026-06-16 00:26:54 +08:00
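A recursive resolver for ${VAR} references, in the spirit of the _deep_resolve helper mentioned for clients.yaml, might look like this sketch. The public name deep_resolve, the env parameter, and the choice to leave unset variables untouched are assumptions; the commit does not describe the real helper's behavior in that detail.

```python
import os
import re
from typing import Any, Mapping, Optional

_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_resolve(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Return a copy of value with ${VAR} references expanded from env."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Unknown variables are left as the literal "${VAR}" (assumed policy).
        return _VAR_PATTERN.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {key: deep_resolve(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [deep_resolve(item, env) for item in value]
    return value  # numbers, booleans, None pass through unchanged
```

Because the function builds new containers rather than mutating its input, the loaded YAML keeps its original ${VAR} strings, which is what allows a settings writer to preserve the references on save.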
chiguyong 11e2009cb8 feat(router): improve colloquial/mixed-lang routing, fix low-complexity IntentRouter bypass
Key improvements:
- Low-complexity queries (<0.3) now try IntentRouter keyword match
  before falling back to DIRECT_CHAT, fixing 0% F1 on keyword_match
- SemanticRouter similarity_low lowered from 0.6 to 0.4
- Short text (<20 chars) uses effective_low = max(0.25, low - 0.15)
- Short text with no semantic match forces LLM classify fallback
- Added colloquial keywords to 7 skill YAMLs
- Fixed code_reviewer.yaml output_schema placement
- Fixed SemanticRouter build in e2e tests
- Fixed base_url detection for bailian-coding API keys

Results: keyword_match F1 0->60.87%, colloquial F1 0->100%, mixed_lang F1 0->100%
2026-06-15 23:54:57 +08:00
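The short-text threshold rule from this commit, effective_low = max(0.25, low - 0.15) for inputs under 20 characters, can be written out as a small function. The function name and parameter names are illustrative; the constants are the ones quoted in the commit message.

```python
def effective_low(
    text: str,
    similarity_low: float = 0.4,
    short_len: int = 20,
    floor: float = 0.25,
    relax: float = 0.15,
) -> float:
    """Lower similarity bound used when deciding whether a semantic match counts."""
    if len(text) < short_len:
        # Short queries carry less signal, so the bound is relaxed but never below the floor.
        return max(floor, similarity_low - relax)
    return similarity_low
```

With the new default of 0.4, a short query gets a bound of about 0.25; with the previous 0.6 it would have been about 0.45, which helps explain why short colloquial inputs rarely matched before.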
chiguyong fa2a6dece2 feat(router): enable SemanticRouter + upgrade benchmark to L3/L5
- Enable SemanticRouter in agentkit.yaml (router.semantic.enabled: true)
- Integrate SemanticRouter into e2e backtest (_build_real_components)
- Add 8 new semantic test cases: 5 colloquial + 3 mixed-lang expressions
- Add L3 output quality evaluation framework (LLM-as-Judge, 1-5 score)
- Add L5 adaptive capability metrics (consistency rate from overfitting data)
- Add OutputQualityObservation model and evaluate_output_quality() method
- Report now includes L3 and L5 sections

Results: 52 tests pass, description_match F1=66.67%, L5 adaptive rate=100%
2026-06-15 23:02:47 +08:00
chiguyong 99fe4c99f7 fix: comprehensive code review fixes + WS test stability
2026-06-15 08:17:34 +08:00
chiguyong 0ccef7be5c feat: P0 production hardening — LLM cache, semantic routing, state persistence
U1: LLM Cache Core (exact + semantic match, InMemory + Redis backends)
U2: Cache integration into LLMGateway with CacheConfig
U3: Semantic Router as Layer 1.5 in CostAwareRouter
U4: UsageStore persistence (Redis Hash + InMemory fallback)
U5: CascadeStateStore persistence (Redis INCR + InMemory TTL)
U6: EvolutionStore interface unification (Protocol + PostgreSQL backend)
U7: Configuration integration + E2E tests

Code review fixes:
- P0: date iteration bug (day>=28), semantic router index never built,
      Redis connection leak (per-call → persistent pool)
- P1: cache degradation recovery, semantic_search degradation,
      double miss counting, asyncio.Lock for PG init, LIMIT on queries,
      __import__ anti-pattern → _utcnow()
- P2: InMemory TTL cleanup, embedding preservation on put(),
      data TTL = max(exact_ttl, semantic_ttl)
2026-06-14 15:16:00 +08:00
chiguyong 5ef08a3b30 fix(review): comprehensive P0-P2 code review fixes
2026-06-12 22:18:25 +08:00
chiguyong a36bc3d1c1 feat: optimize chat response speed for sub-1s first token latency
- Add HeuristicClassifier to replace LLM quick_classify with zero-cost
  local heuristic (keyword/length/code-pattern scoring), gated by
  router.classifier config (default: heuristic)
- Add parallel tool execution in ReActEngine via asyncio.gather for
  multiple independent tool_calls, gated by parallel_tools param
- Add AsyncWriteQueue for non-blocking session persistence with WAL
  buffer, gated by async_writes param on SessionManager
- Add httpx.Limits connection pool config to all LLM providers
- Add router config section to ServerConfig and agentkit.yaml
- All optimizations have config switches for safe rollback
2026-06-12 13:15:06 +08:00
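Parallel tool execution gated by a parallel_tools switch, as described for ReActEngine, can be sketched with asyncio.gather. The function run_tool_calls and the {"name", "args"} call shape are hypothetical; only the use of asyncio.gather and the parallel_tools switch come from the commit message.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

ToolFn = Callable[..., Awaitable[Any]]


async def run_tool_calls(
    tool_calls: List[Dict[str, Any]],
    tools: Dict[str, ToolFn],
    parallel_tools: bool = True,
) -> List[Any]:
    """Execute tool calls and return results in the same order as tool_calls."""

    async def run_one(call: Dict[str, Any]) -> Any:
        return await tools[call["name"]](**call["args"])

    if parallel_tools and len(tool_calls) > 1:
        # gather runs the coroutines concurrently and preserves input order.
        return list(await asyncio.gather(*(run_one(call) for call in tool_calls)))

    # Sequential path: the safe rollback when the switch is off.
    return [await run_one(call) for call in tool_calls]
```

This only pays off when the calls are independent of each other; calls that depend on an earlier result still need the sequential path.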