chiguyong
fef7ecea39
feat(skills): SkillHarness activation preconditions + risk-guard learning
...
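Based on the SkillHarness paper (arXiv:2606.20636) and the Agent Skills survey
(arXiv:2602.12430), introduce activation preconditions and provenance
markers, and add the ability to learn risk-guard suggestions from failure trajectories.
Changes:
- U1: SkillConfig gains v7 preconditions/provenance fields (base.py)
- U2: build_skill_system_prompt injects a preconditions soft-check section
- U3: SkillLoader records provenance on all three loading paths + warns on dangerous capabilities in entry_points
- U4: 10 business Skill YAMLs gain preconditions (2-4 short Chinese sentences each)
- U5: RiskGuardLearner learns risk-guard suggestions from failure trajectories (human review, never auto-applied)
- U6: CLI command agentkit skill learn-risk-guards
Key decisions:
- KTD1: preconditions are injected via system_prompt (soft check); no hard LLM call
- KTD2: RiskGuardLearner does not auto-apply; human review is required (the paper reports 75% of automatic learning to be unsafe)
- KTD3: provenance is a lightweight string with no hash/signature (no compliance requirement)
Tests: all 39 new unit tests pass; ruff check passes.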
2026-06-24 13:56:37 +08:00
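A minimal sketch of the soft-check idea from KTD1 above, assuming preconditions are plain strings appended to the skill's system prompt. `SkillConfigSketch` and `build_prompt_with_preconditions` are hypothetical stand-ins, not the actual `SkillConfig` or `build_skill_system_prompt`, and the wording of the injected section is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class SkillConfigSketch:
    """Hypothetical stand-in for SkillConfig; only the fields used here."""
    name: str
    system_prompt: str
    preconditions: list[str] = field(default_factory=list)
    provenance: str = "unknown"  # lightweight string, no hash or signature


def build_prompt_with_preconditions(skill: SkillConfigSketch) -> str:
    """Append a soft-check section to the prompt; no extra LLM call is made."""
    if not skill.preconditions:
        return skill.system_prompt
    bullets = "\n".join(f"- {p}" for p in skill.preconditions)
    section = (
        "Before acting, check that these preconditions hold. "
        "If one does not, say so instead of proceeding:\n" + bullets
    )
    return f"{skill.system_prompt}\n\n{section}"
```

Because the check lives in the prompt, the model is only asked to respect the preconditions; nothing here enforces them.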
chiguyong
cac9c73dd5
fix(routing): U1-U6 routing optimization + fix plan + code-review fixes
...
Implement 6 fix units (U1-U6) and apply 5 security fixes found by ce-code-review.
## U1: benchmark timeout thresholds
- Timeouts tiered by difficulty: easy=45s, medium=60s, hard=90s
- Replaces the former single hardcoded 60s
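The tiered timeouts in U1 could be expressed as a lookup table, sketched below. The names `TIMEOUT_BY_DIFFICULTY` and `timeout_for` are hypothetical, and falling back to 60s for an unknown difficulty is an assumption, not something the commit states.

```python
# Values taken from the commit message; names are illustrative only.
TIMEOUT_BY_DIFFICULTY = {"easy": 45.0, "medium": 60.0, "hard": 90.0}

# Assumption: unknown difficulties fall back to the former single value.
DEFAULT_TIMEOUT = 60.0


def timeout_for(difficulty: str) -> float:
    """Return the benchmark timeout in seconds for a difficulty label."""
    return TIMEOUT_BY_DIFFICULTY.get(difficulty.lower(), DEFAULT_TIMEOUT)
```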
## U2: OpenAICompatibleProvider httpx timeout
- Add a timeout parameter (default 120s), replacing the hardcoded 60s
- ProviderConfig.timeout is passed through to the Provider
- Add 2 unit tests
## U3: activate QualityGate skill_match validation
- BaseAgent._build_skill_context() builds skill_context
- Passed to QualityGate.validate() in three places: base.py / tasks.py / runner.py
## U4: add disambiguation_keywords field
- IntentConfig gains a disambiguation_keywords field
- 8 skill YAMLs gain this field
## U5: optimize RequestPreprocessor routing regexes
- Split _FACTUAL_RE into separate CN/EN regexes (Chinese has no spaces)
- Add pure-pattern _MATH_RE / _TRANSLATION_RE
- _TOOL_CONTEXT_RE excludes real-time queries that need tools
- Multi-line input guard + trailing punctuation support
- Add 21 unit tests (all 40 pass)
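The U5 ideas (separate CN/EN patterns, a multi-line guard, optional trailing punctuation) might look roughly like the sketch below. The patterns are hypothetical; the real `_FACTUAL_RE` variants are not shown in this log. It also avoids `\s`, which matches `\n`, in favor of `[ \t]`.

```python
import re

# Hypothetical English pattern: words are separated by spaces or tabs only.
# [ \t] is used instead of \s because \s also matches newlines.
FACTUAL_EN_RE = re.compile(
    r"(?:what|who|when|where)[ \t]+(?:is|are|was|were)[ \t]+.+?[?.!]?",
    re.IGNORECASE,
)

# Hypothetical Chinese pattern: no spaces between words, so no separator class.
FACTUAL_CN_RE = re.compile(r".{1,40}?(?:是什么|是谁|在哪里)[??。]?")


def is_simple_factual(text: str) -> bool:
    """Return True for a single-line question matching either pattern."""
    text = text.strip()
    # Multi-line input guard: refuse anything with an embedded line break.
    if "\n" in text or "\r" in text:
        return False
    return bool(FACTUAL_EN_RE.fullmatch(text) or FACTUAL_CN_RE.fullmatch(text))
```

Using `fullmatch` keeps a trailing payload from slipping past a pattern that only anchors the start of the string.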
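## U6: re-run benchmark
- Real-LLM benchmark: accuracy 60% -> 93.3%
- 4/5 passed, p50=40.8s, consistency=100%
- Old baseline backed up to baseline_2026-06-17_old_arch.json
## ce-code-review fixes (5 items)
- Fix the security risk of the \s character class matching newlines
- Add trailing punctuation support to the factual/math regexes
- Fix duplicate keywords in geo_optimizer.yaml
- Fix unreachable return in _login_with_retry
- Fix stderr_fh resource leak in the real_llm_server fixture
Tests: all 63 tests in tests/unit/chat/ pass; ruff check passes.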
2026-06-20 19:31:49 +08:00
chiguyong
ee6d16345c
feat(experts): U7 add 5 programming expert templates + dev_team team template + ExpertTeamRouter template expansion
2026-06-18 01:50:43 +08:00
chiguyong
dddcbd24e3
feat: board meeting discussion mode + backtest integration + WS persistence fix
...
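Board Meeting Mode:
- BoardRouter: @board prefix routing, expert name validation, template fallback
- BoardTeam: discussion container, state machine (FORMING->DISCUSSING->CONCLUDING->COMPLETED)
- BoardOrchestrator: multi-round autonomous discussion loop engine, moderator summaries, stop-command detection
- 9 preset celebrity expert YAMLs (Musk/Bezos/Zhang Xiaolong/Munger, etc.)
- Frontend BoardStatusView group-chat-style UI + WebSocket event handling
- Backend chat.py integrates @board routing into the main chat flow
Backtest integration:
- benchmark.py: add board_meeting dimension (18 tasks, 6 categories)
- benchmark_dataset.py: add BOARD_BENCHMARKS (11 E2E cases)
- test_board_backtest.py: 66 backtest tests (9 test classes)
Bug fixes:
- resolve_expert_configs: deep-copy so is_lead changes do not pollute shared templates
- Fall back to the default template when all expert names are invalid
- board_router: topic was not stripped on the non-matching path
- benchmark_dataset: corrected the input for board-name-invalid-001
WebSocket persistence fix:
- chat.py: three-layer defense to ensure task results are not lost
- chat store: reconnect recovery logic
Deployment config:
- Gitea Actions CI/CD workflow
- docker-compose.deploy.yaml deployment orchestration
- scripts/deploy.sh automated deployment script
Test results: 120 unit tests pass, 71 benchmark tests pass 100%, ruff all pass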
2026-06-17 23:52:53 +08:00
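The BoardTeam state machine named in this commit can be sketched as a strictly linear progression. This is an assumption: the commit lists only the order FORMING->DISCUSSING->CONCLUDING->COMPLETED, so any other transitions the real class allows are unknown. `BoardState` and `BoardTeamSketch` are hypothetical names.

```python
from enum import Enum


class BoardState(Enum):
    FORMING = "forming"
    DISCUSSING = "discussing"
    CONCLUDING = "concluding"
    COMPLETED = "completed"


# Assumed linear transitions; COMPLETED is terminal.
_NEXT_STATE = {
    BoardState.FORMING: BoardState.DISCUSSING,
    BoardState.DISCUSSING: BoardState.CONCLUDING,
    BoardState.CONCLUDING: BoardState.COMPLETED,
}


class BoardTeamSketch:
    """Hypothetical discussion container tracking only its lifecycle state."""

    def __init__(self) -> None:
        self.state = BoardState.FORMING

    def advance(self) -> BoardState:
        """Move to the next state, or raise if the meeting is already over."""
        nxt = _NEXT_STATE.get(self.state)
        if nxt is None:
            raise RuntimeError(f"no transition out of {self.state.name}")
        self.state = nxt
        return nxt
```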
chiguyong
a1318df420
feat: add LLM and GUI benchmark modes with real agent testing
2026-06-17 12:55:19 +08:00
chiguyong
1fbfd9d132
refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline)
2026-06-17 12:01:34 +08:00
chiguyong
89a9534678
feat: add benchmark_runner skill for capability testing and report generation
2026-06-17 11:31:15 +08:00
chiguyong
a27eed3714
fix(config): unify config loading chain and protect ${VAR} references
...
- Settings API: reverse-resolve env vars to preserve ${VAR} refs in yaml,
write new API keys to .env instead of agentkit.yaml, extract env_key
from existing ${VAR} reference when updating providers
- Onboarding: merge-update instead of overwrite when config exists,
use config_arg to determine output path, .env merge instead of overwrite
- Unified templates: bailian-coding provider name, full model_aliases,
docker-compose with postgres, expanded .env.example
- Optional ruamel.yaml for comment/format preservation in Settings API
- clients.yaml: add _deep_resolve for ${VAR} env var references
- All CLI commands use load_config_with_dotenv() consistently
- Tests: mock find_config_path and CWD auto-discovery to avoid env leaks
2026-06-16 00:26:54 +08:00
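For illustration, a recursive resolver for ${VAR} references in nested config data might look like the sketch below. It is not the `_deep_resolve` from clients.yaml handling; `deep_resolve` is a hypothetical name, and leaving unset variables untouched is an assumption.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_resolve(value, env=None):
    """Recursively substitute ${VAR} references in strings, dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Assumption: an unset variable keeps its ${VAR} text unchanged.
        return _VAR_RE.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: deep_resolve(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [deep_resolve(v, env) for v in value]
    return value
```

Resolution like this belongs on the read path only; writing the resolved values back to YAML would destroy the ${VAR} references this commit sets out to protect.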
chiguyong
11e2009cb8
feat(router): improve colloquial/mixed-lang routing, fix low-complexity IntentRouter bypass
...
Key improvements:
- Low-complexity queries (<0.3) now try IntentRouter keyword match
before falling back to DIRECT_CHAT, fixing 0% F1 on keyword_match
- SemanticRouter similarity_low lowered from 0.6 to 0.4
- Short text (<20 chars) uses effective_low = max(0.25, low - 0.15)
- Short text with no semantic match forces LLM classify fallback
- Added colloquial keywords to 7 skill YAMLs
- Fixed code_reviewer.yaml output_schema placement
- Fixed SemanticRouter build in e2e tests
- Fixed base_url detection for bailian-coding API keys
Results: keyword_match F1 0->60.87%, colloquial F1 0->100%, mixed_lang F1 0->100%
2026-06-15 23:54:57 +08:00
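The short-text threshold rule above is small enough to state directly. In this sketch the function name `effective_low` is hypothetical, and measuring length with `len(text)` is an assumption about how "chars" are counted.

```python
def effective_low(text: str, similarity_low: float = 0.4) -> float:
    """Lower the similarity floor for short text, but never below 0.25."""
    if len(text) < 20:
        return max(0.25, similarity_low - 0.15)
    return similarity_low
```

With the new default of 0.4, short inputs get a floor of about 0.25; with the old 0.6 they would have had 0.45.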
chiguyong
64d62a2b60
feat: autonomous task execution - connect PlanExecEngine + TeamOrchestrator
...
U1: TeamOrchestrator._execute_phase real execution (Expert.agent.execute)
U2: LLM-based merge strategies (BEST/VOTE/FUSION) with fallback
U3: ReActStepExecutor replacing _LLMStepAgent for tool-enabled steps
U4: SharedWorkspace integration for cross-phase/cross-execution state
U5: GoalPlanner prompt tuning with few-shot and verb pattern matching
U6: Replan-before-fallback in TeamOrchestrator
U7: End-to-end validation tests for multi-step research tasks
U8: WebSocket progress events (step_event_callback + new event types)
Code review fixes: P0 response.strip fix, P1 competitor status check,
milestone real impl, VOTE self-bias fix, confirmation_handler wiring,
ExpertTeam public API, DRY _build_result_summaries, replan tests
Also: geo_server.py refactor (ServerConfig.from_yaml), delete llm_config.yaml
2026-06-15 12:41:32 +08:00
chiguyong
99fe4c99f7
fix: comprehensive code review fixes + WS test stability
2026-06-15 08:17:34 +08:00
chiguyong
5ef08a3b30
fix(review): comprehensive P0-P2 code review fixes
2026-06-12 22:18:25 +08:00
chiguyong
8c365486e2
fix(pipeline): address code review findings for adversarial loop
...
Critical:
- C1: Add verifier_timeout_seconds for independent Verifier timeout
- C2: Verifier parse failure raises RuntimeError instead of dead-loop
Major:
- M1: Inject previous_output into Worker retry context
- M2: Add Pydantic ge/le constraint on ReviewFeedback.score
- M3: Use Literal type for feedback_mode enum validation
- M4: Use Literal types for ReviewIssue severity and category
- M5: Merge error messages when escalation agent also fails
Tests: 8 new test cases added (19 total), all passing
2026-06-12 10:02:37 +08:00
chiguyong
6731d96c65
feat(configs): add code_reviewer skill and coding_harness pipeline
...
- code_reviewer.yaml: Verifier Agent skill config for adversarial review
with structured output schema for ReviewFeedback format
- coding_harness.yaml: Example pipeline with adversarial loop
develop → test → review (Worker↔Verifier) → archive
2026-06-12 09:38:37 +08:00
chiguyong
2110c84fb6
fix: switch default model to qwen3-coder-plus for better function calling
...
DeepSeek-chat has limited/partial function calling support. Qwen3-coder-plus
(DashScope) has robust OpenAI-compatible function calling.
Also added tool usage instructions to system prompt and enhanced logging
to trace tool propagation through the pipeline.
2026-06-12 09:27:52 +08:00
chiguyong
cc4c6fe346
fix: direct-mode agent falls through to default when task needs tools
...
When IntentRouter matches a direct-mode agent (no tools), but the task
content suggests tool needs (shell, search, file ops, etc.), the routing
now falls through to the default agent which has full tool access.
This fixes the issue where "帮我执行个命令" ("run a command for me") would be
routed to direct_agent and fail because direct mode doesn't support tool calling.
Also restored "你好" ("hello") in direct_agent keywords since it's correctly
handled now: greetings don't need tools, so direct mode is fine.
2026-06-11 15:26:19 +08:00
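A rough sketch of the fall-through described in this commit: when the matched agent runs in direct mode but the task text hints at tool use, routing goes to the default agent instead. `pick_agent` and the hint pattern are hypothetical; the real heuristic for detecting tool needs is not shown in the log.

```python
import re

# Hypothetical tool-need hints (shell, search, file operations).
_TOOL_HINT_RE = re.compile(
    r"执行|命令|搜索|文件|run|execute|shell|search|file", re.IGNORECASE
)


def pick_agent(
    matched_agent: str,
    matched_mode: str,
    task: str,
    default_agent: str = "default",
) -> str:
    """Return the agent that should handle the task."""
    # Direct mode has no tool calling, so tool-like tasks fall through.
    if matched_mode == "direct" and _TOOL_HINT_RE.search(task):
        return default_agent
    return matched_agent
```

A plain greeting carries no tool hint, so it stays with the direct-mode agent, which matches the keyword restoration noted in the commit.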
chiguyong
52b7d6007d
fix: remove '你好' from direct_agent keywords so greetings route to default agent with tools
2026-06-11 14:49:59 +08:00
chiguyong
93bc7c4e3e
fix: change all agent YAML model from hardcoded provider to 'default'
...
Hardcoded model names like 'openai/gpt-4o-mini' or 'anthropic/claude-sonnet'
cause 'No provider available' errors when the specific provider isn't configured.
Using 'default' lets the system pick the available provider automatically.
2026-06-11 14:19:26 +08:00
chiguyong
5b42487d8a
feat(core): add ReWOO, Plan-and-Execute, Reflexion execution engines
...
Phase A of Multi-Agent Marketplace architecture:
- ReWOOEngine: plan-all-then-execute pattern for parallel data fetch
- PlanExecEngine: adapter wrapping GoalPlanner+PlanExecutor+PipelineReplanner
- ReflexionEngine: ReAct + Evaluate + Reflect + Retry for high-precision tasks
- SkillConfig: extend VALID_EXECUTION_MODES with rewoo/plan_exec/reflexion
- ConfigDrivenAgent: add _handle_rewoo/_handle_plan_exec/_handle_reflexion routes
- 5 professional agent YAML configs with layered model defaults
- 107 unit tests passing
2026-06-10 17:08:48 +08:00
chiguyong
7874e875af
merge: integrate feat/agentkit-phase8-chat-adaptive (chat/gui commands + GUI mode)
...
Restores agentkit chat, agentkit gui CLI commands, onboarding wizard,
and GUI mode (AGENTKIT_GUI_MODE) with static file serving.
Resolves merge conflicts in orchestrator.py, app.py, tools/__init__.py, shell.py.
2026-06-10 07:44:06 +08:00
chiguyong
b34f74f598
feat(phase6): implement end-to-end enterprise scenario validation (U15)
...
- Add goal-driven agent skill config and pipeline config
- Add 9 E2E integration tests covering all 7 capabilities:
- SC1: Goal-driven SEO analysis (GoalPlanner→PlanExecutor→PlanChecker→ExperienceStore)
- SC2: Knowledge Q&A with system operation (MultiSourceRAG)
- SC3: Workflow with approval (WorkflowStore + approval node)
- SC4: Self-evolution experience accumulation (ExperienceStore→PitfallDetector→PathOptimizer)
- SC5: Parallel execution efficiency verification
- SC6: Skill registry integration (capabilities, versions, health)
- Cross-capability: Plan+Experience+Pitfall, Review+Experience, RAG+Workflow
- All 2472 tests passing (9 integration + 2463 unit)
2026-06-10 01:38:28 +08:00
chiguyong
31bd3b126c
feat(phase8): chat adaptive enhancements, pipeline reflection, search tools upgrade
...
- Enhanced chat CLI with adaptive mode and session management
- Added pipeline reflection and schema extensions
- Upgraded BaiduSearch and WebSearch tools with advanced capabilities
- Expanded server routes for skills and chat
- Added session store enhancements
- New chat module and pipeline reflection support
2026-06-09 23:18:06 +08:00
chiguyong
bad66445ff
feat(compression): U6 GEO Pipeline compression integration tests and config
...
Add GEO Pipeline end-to-end compression integration tests with
MockHeadroomCompressor. Add compression configuration section to
llm_config.yaml with headroom and summary mode examples.
2026-06-07 18:20:41 +08:00
chiguyong
2e547e345a
feat(geo): U4 GEO skill tool binding with BaiduSearch and E2E tests
...
Add BaiduSearchTool (API mode + scraping fallback), bind tools to
GEO skill YAML configs (baidu_search, web_crawl, schema_extract,
schema_generate), extend geo_full_pipeline with generate_content
and deai steps, add 36 E2E integration tests.
2026-06-07 17:25:37 +08:00
chiguyong
1390bd8d6e
feat(skills): U5 GEO Pipeline orchestration with DAG execution
...
- GEOPipeline: YAML-driven DAG pipeline with parallel/sequential execution
- PipelineStep with input_mapping ($.input.xxx, $.steps.name.output.xxx)
- Topological sort for execution groups, SharedWorkspace integration
- geo_full_pipeline.yaml: detect→analyze→optimize→track workflow
- 10 tests passing
2026-06-06 22:34:24 +08:00
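The two mechanisms named in this commit, grouping steps by topological order and resolving `$.input.xxx` / `$.steps.name.output.xxx` references, can be sketched with the standard library. This is an illustration under assumptions, not GEOPipeline's code: `execution_groups` and `resolve_ref` are hypothetical names, and the context layout (a nested dict with `input` and `steps` keys) is assumed.

```python
from graphlib import TopologicalSorter


def execution_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps so that each group depends only on earlier groups.

    `deps` maps a step name to the set of steps it depends on.
    Steps within one group have no mutual dependencies and may run in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        groups.append(ready)
        sorter.done(*ready)
    return groups


def resolve_ref(ref, context: dict):
    """Resolve a '$.a.b.c' reference against a nested dict; pass literals through."""
    if not isinstance(ref, str) or not ref.startswith("$."):
        return ref
    node = context
    for key in ref[2:].split("."):
        node = node[key]
    return node
```

For a hypothetical graph where `optimize` and `track` both depend on `analyze`, the groups come out as `detect`, then `analyze`, then `optimize` and `track` together.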
chiguyong
f858d279f3
feat(agentkit): Phase 3 upgrade - persistence, memory, evolution, observability
...
10 Implementation Units across 3 phases:
Phase A - Infrastructure:
- U1: RedisTaskStore with Redis/memory backend + factory function
- U2: TraceRecorder for execution trace recording
- U3: PersistentEvolutionStore with SQLite backend
Phase B - Core Capabilities:
- U4: MemoryRetriever integration into ReAct engine
- U5: Embedder abstraction + EpisodicMemory vector search
- U6: LLMReflector for LLM-in-the-loop reflection
- U7: SkillPipeline for multi-skill orchestration
Phase C - Enhancement:
- U8: SKILL.md format + progressive disclosure levels
- U9: ContextCompressor + prompt cache rendering
- U10: Structured logging + metrics endpoint + enhanced health check
Tests: 924 passed, 18 skipped, 0 failed
2026-06-06 17:17:45 +08:00
chiguyong
2844eeb548
feat(streaming): Phase C - LLM streaming + ReAct events + SSE endpoint
...
U8: StreamChunk protocol + OpenAI chat_stream + Gateway streaming with usage tracking
U9: ReActEvent dataclass + execute_stream() yielding thinking/tool_call/tool_result/final_answer
U10: POST /tasks/stream SSE endpoint + Client SDK stream_task()
15 new tests passing, no regression.
2026-06-06 11:54:17 +08:00
chiguyong
669ca604e5
feat(configs): add GEO AgentKit Server configuration
...
- llm_config.yaml: DeepSeek + OpenAI-compatible providers with env var substitution
- skills/ (8 YAML): citation_detector, content_generator, deai_agent, geo_optimizer,
monitor, schema_advisor, competitor_analyzer, trend_agent
- Added intent fields for content_generator, competitor_analyzer, trend_agent
- Added quality_gate fields for content_generator, deai_agent, geo_optimizer
- Updated custom_handler paths to configs.geo_handlers
- geo_tools.py: 14 FunctionTools calling GEO Backend via HTTP
- geo_handlers.py: 3 custom handlers (citation/monitor/schema) calling /internal/ API
- geo_server.py: FastAPI factory with LLM Gateway, Tool Registry, Skill Registry
2026-06-05 23:25:14 +08:00