fischer-agentkit

Commit Graph

Author	SHA1	Message	Date
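chiguyong	fef7ecea39	feat(skills): SkillHarness activation preconditions + risk-guard learning. Based on the SkillHarness paper (arXiv:2606.20636) and the Agent Skills survey (arXiv:2602.12430), introduces activation preconditions and provenance tags, and adds the ability to learn risk-guard suggestions from failed trajectories. Changes: - U1: SkillConfig gains v7 preconditions/provenance fields (base.py) - U2: build_skill_system_prompt injects a preconditions soft-check section - U3: SkillLoader records provenance on all three loading paths + warns about dangerous entry_points capabilities - U4: 10 business Skill YAMLs gain preconditions (2-4 short Chinese sentences each) - U5: RiskGuardLearner learns risk-guard suggestions from failed trajectories (human review required, never applied automatically) - U6: CLI command agentkit skill learn-risk-guards. Key decisions: - KTD1: preconditions are injected via system_prompt (soft check), with no hard LLM call - KTD2: RiskGuardLearner output is not applied automatically and requires human review (the paper reports that 75% of automatic learning is unsafe) - KTD3: provenance is a lightweight string with no hash/signature (no compliance requirement). Tests: all 39 new unit tests pass; ruff check passes.	2026-06-24 13:56:37 +08:00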
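chiguyong	cac9c73dd5	fix(routing): U1-U6 routing optimization + fix plan + code-review fixes. Implements 6 fix units (U1-U6) and applies 5 safety fixes found by ce-code-review. ## U1: benchmark timeout thresholds - Timeouts tiered by difficulty: easy=45s, medium=60s, hard=90s - Replaces the single hardcoded 60s ## U2: OpenAICompatibleProvider httpx timeout - New timeout parameter (default 120s) replaces the hardcoded 60s - ProviderConfig.timeout is passed through to the Provider - 2 new unit tests ## U3: Enable QualityGate skill_match validation - BaseAgent._build_skill_context() builds skill_context - skill_context is passed to QualityGate.validate() in three places: base.py / tasks.py / runner.py ## U4: Add disambiguation_keywords field - IntentConfig gains a disambiguation_keywords field - 8 skill YAMLs populate the field ## U5: Optimize RequestPreprocessor routing regexes - Split _FACTUAL_RE into separate CN/EN regexes (Chinese has no spaces between words) - New pure-pattern _MATH_RE / _TRANSLATION_RE - _TOOL_CONTEXT_RE excludes real-time queries that need tools - Multi-line input guard + trailing punctuation support - 21 new unit tests (40 total, all passing) ## U6: Re-run benchmark - Real-LLM benchmark: accuracy 60% -> 93.3% - 4/5 passed, p50=40.8s, consistency=100% - Old baseline backed up to baseline_2026-06-17_old_arch.json ## ce-code-review fixes (5) - Fix the hazard of the \s character class matching newlines - Add trailing-punctuation support to the factual/math regexes - Fix duplicate keywords in geo_optimizer.yaml - Fix unreachable return in _login_with_retry - Fix stderr_fh resource leak in the real_llm_server fixture. Tests: all 63 tests in tests/unit/chat/ pass; ruff check passes.	2026-06-20 19:31:49 +08:00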
chiguyong	ee6d16345c	feat(experts): U7 add 5 programming-expert templates + dev_team team template + ExpertTeamRouter template expansion	2026-06-18 01:50:43 +08:00
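chiguyong	dddcbd24e3	feat: board meeting discussion mode + backtest integration + WS persistence fix. Board Meeting Mode: - BoardRouter: @board prefix routing, expert-name validation, template fallback - BoardTeam: discussion container, state machine (FORMING->DISCUSSING->CONCLUDING->COMPLETED) - BoardOrchestrator: multi-round autonomous discussion loop engine, moderator summaries, stop-command detection - 9 preset celebrity-expert YAMLs (Musk/Bezos/Zhang Xiaolong/Munger, etc.) - Frontend BoardStatusView group-chat-style UI + WebSocket event handling - Backend chat.py integrates @board routing into the main chat flow. Backtest integration: - benchmark.py: new board_meeting dimension (18 tasks, 6 categories) - benchmark_dataset.py: new BOARD_BENCHMARKS (11 E2E cases) - test_board_backtest.py: 66 backtest tests (9 test classes) Bug fixes: - resolve_expert_configs: deep-copy prevents is_lead changes from polluting the shared template - Fall back to the default template when all expert names are invalid - board_router: topic was not stripped on the non-matching path - benchmark_dataset: corrected input for board-name-invalid-001 WebSocket persistence fix: - chat.py: three-layer defense ensures task results are not lost - chat store: reconnect recovery logic. Deployment config: - Gitea Actions CI/CD workflow - docker-compose.deploy.yaml deployment orchestration - scripts/deploy.sh automated deployment script. Test results: 120 unit tests pass, 71 benchmark tests pass (100%), ruff passes.	2026-06-17 23:52:53 +08:00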
chiguyong	a1318df420	feat: add LLM and GUI benchmark modes with real agent testing	2026-06-17 12:55:19 +08:00
chiguyong	1fbfd9d132	refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline)	2026-06-17 12:01:34 +08:00
chiguyong	89a9534678	feat: add benchmark_runner skill for capability testing and report generation	2026-06-17 11:31:15 +08:00
chiguyong	a27eed3714	fix(config): unify config loading chain and protect ${VAR} references - Settings API: reverse-resolve env vars to preserve ${VAR} refs in yaml, write new API keys to .env instead of agentkit.yaml, extract env_key from existing ${VAR} reference when updating providers - Onboarding: merge-update instead of overwrite when config exists, use config_arg to determine output path, .env merge instead of overwrite - Unified templates: bailian-coding provider name, full model_aliases, docker-compose with postgres, expanded .env.example - Optional ruamel.yaml for comment/format preservation in Settings API - clients.yaml: add _deep_resolve for ${VAR} env var references - All CLI commands use load_config_with_dotenv() consistently - Tests: mock find_config_path and CWD auto-discovery to avoid env leaks	2026-06-16 00:26:54 +08:00
chiguyong	11e2009cb8	feat(router): improve colloquial/mixed-lang routing, fix low-complexity IntentRouter bypass. Key improvements: - Low-complexity queries (<0.3) now try IntentRouter keyword match before falling back to DIRECT_CHAT, fixing 0% F1 on keyword_match - SemanticRouter similarity_low lowered from 0.6 to 0.4 - Short text (<20 chars) uses effective_low = max(0.25, low - 0.15) - Short text with no semantic match forces LLM classify fallback - Added colloquial keywords to 7 skill YAMLs - Fixed code_reviewer.yaml output_schema placement - Fixed SemanticRouter build in e2e tests - Fixed base_url detection for bailian-coding API keys. Results: keyword_match F1 0->60.87%, colloquial F1 0->100%, mixed_lang F1 0->100%	2026-06-15 23:54:57 +08:00
chiguyong	64d62a2b60	feat: autonomous task execution - connect PlanExecEngine + TeamOrchestrator. U1: TeamOrchestrator._execute_phase real execution (Expert.agent.execute). U2: LLM-based merge strategies (BEST/VOTE/FUSION) with fallback. U3: ReActStepExecutor replacing _LLMStepAgent for tool-enabled steps. U4: SharedWorkspace integration for cross-phase/cross-execution state. U5: GoalPlanner prompt tuning with few-shot and verb pattern matching. U6: Replan-before-fallback in TeamOrchestrator. U7: End-to-end validation tests for multi-step research tasks. U8: WebSocket progress events (step_event_callback + new event types). Code review fixes: P0 response.strip fix, P1 competitor status check, real milestone implementation, VOTE self-bias fix, confirmation_handler wiring, ExpertTeam public API, DRY _build_result_summaries, replan tests. Also: geo_server.py refactor (ServerConfig.from_yaml), delete llm_config.yaml	2026-06-15 12:41:32 +08:00
chiguyong	99fe4c99f7	fix: comprehensive code review fixes + WS test stability	2026-06-15 08:17:34 +08:00
chiguyong	5ef08a3b30	fix(review): comprehensive P0-P2 code review fixes	2026-06-12 22:18:25 +08:00
chiguyong	8c365486e2	fix(pipeline): address code review findings for adversarial loop. Critical: - C1: Add verifier_timeout_seconds for independent Verifier timeout - C2: Verifier parse failure raises RuntimeError instead of looping indefinitely. Major: - M1: Inject previous_output into Worker retry context - M2: Add Pydantic ge/le constraint on ReviewFeedback.score - M3: Use Literal type for feedback_mode enum validation - M4: Use Literal types for ReviewIssue severity and category - M5: Merge error messages when escalation agent also fails. Tests: 8 new test cases added (19 total), all passing	2026-06-12 10:02:37 +08:00
chiguyong	6731d96c65	feat(configs): add code_reviewer skill and coding_harness pipeline - code_reviewer.yaml: Verifier Agent skill config for adversarial review with structured output schema for ReviewFeedback format - coding_harness.yaml: Example pipeline with adversarial loop develop → test → review (Worker↔Verifier) → archive	2026-06-12 09:38:37 +08:00
chiguyong	2110c84fb6	fix: switch default model to qwen3-coder-plus for better function calling. DeepSeek-chat has limited/partial function calling support. Qwen3-coder-plus (DashScope) has robust OpenAI-compatible function calling. Also added tool usage instructions to system prompt and enhanced logging to trace tool propagation through the pipeline.	2026-06-12 09:27:52 +08:00
chiguyong	cc4c6fe346	fix: direct-mode agent falls through to default when task needs tools. When IntentRouter matches a direct-mode agent (no tools) but the task content suggests it needs tools (shell, search, file ops, etc.), routing now falls through to the default agent, which has full tool access. This fixes the issue where "帮我执行个命令" ("run a command for me") would be routed to direct_agent and fail because direct mode doesn't support tool calling. Also restored "你好" ("hello") in direct_agent keywords since it's now handled correctly: greetings don't need tools, so direct mode is fine.	2026-06-11 15:26:19 +08:00
chiguyong	52b7d6007d	fix: remove '你好' ('hello') from direct_agent keywords so greetings route to default agent with tools	2026-06-11 14:49:59 +08:00
chiguyong	93bc7c4e3e	fix: change all agent YAML model from hardcoded provider to 'default'. Hardcoded model names like 'openai/gpt-4o-mini' or 'anthropic/claude-sonnet' cause 'No provider available' errors when the specific provider isn't configured. Using 'default' lets the system pick the available provider automatically.	2026-06-11 14:19:26 +08:00
chiguyong	5b42487d8a	feat(core): add ReWOO, Plan-and-Execute, Reflexion execution engines. Phase A of Multi-Agent Marketplace architecture: - ReWOOEngine: plan-all-then-execute pattern for parallel data fetch - PlanExecEngine: adapter wrapping GoalPlanner+PlanExecutor+PipelineReplanner - ReflexionEngine: ReAct + Evaluate + Reflect + Retry for high-precision tasks - SkillConfig: extend VALID_EXECUTION_MODES with rewoo/plan_exec/reflexion - ConfigDrivenAgent: add _handle_rewoo/_handle_plan_exec/_handle_reflexion routes - 5 professional agent YAML configs with layered model defaults - 107 unit tests passing	2026-06-10 17:08:48 +08:00
chiguyong	7874e875af	merge: integrate feat/agentkit-phase8-chat-adaptive (chat/gui commands + GUI mode). Restores agentkit chat, agentkit gui CLI commands, onboarding wizard, and GUI mode (AGENTKIT_GUI_MODE) with static file serving. Resolves merge conflicts in orchestrator.py, app.py, tools/__init__.py, shell.py.	2026-06-10 07:44:06 +08:00
chiguyong	b34f74f598	feat(phase6): implement end-to-end enterprise scenario validation (U15) - Add goal-driven agent skill config and pipeline config - Add 9 E2E integration tests covering all 7 capabilities: - SC1: Goal-driven SEO analysis (GoalPlanner→PlanExecutor→PlanChecker→ExperienceStore) - SC2: Knowledge Q&A with system operation (MultiSourceRAG) - SC3: Workflow with approval (WorkflowStore + approval node) - SC4: Self-evolution experience accumulation (ExperienceStore→PitfallDetector→PathOptimizer) - SC5: Parallel execution efficiency verification - SC6: Skill registry integration (capabilities, versions, health) - Cross-capability: Plan+Experience+Pitfall, Review+Experience, RAG+Workflow - All 2472 tests passing (9 integration + 2463 unit)	2026-06-10 01:38:28 +08:00
chiguyong	31bd3b126c	feat(phase8): chat adaptive enhancements, pipeline reflection, search tools upgrade - Enhanced chat CLI with adaptive mode and session management - Added pipeline reflection and schema extensions - Upgraded BaiduSearch and WebSearch tools with advanced capabilities - Expanded server routes for skills and chat - Added session store enhancements - New chat module and pipeline reflection support	2026-06-09 23:18:06 +08:00
chiguyong	bad66445ff	feat(compression): U6 GEO Pipeline compression integration tests and config. Add GEO Pipeline end-to-end compression integration tests with MockHeadroomCompressor. Add compression configuration section to llm_config.yaml with headroom and summary mode examples.	2026-06-07 18:20:41 +08:00
chiguyong	2e547e345a	feat(geo): U4 GEO skill tool binding with BaiduSearch and E2E tests. Add BaiduSearchTool (API mode + scraping fallback), bind tools to GEO skill YAML configs (baidu_search, web_crawl, schema_extract, schema_generate), extend geo_full_pipeline with generate_content and deai steps, add 36 E2E integration tests.	2026-06-07 17:25:37 +08:00
chiguyong	1390bd8d6e	feat(skills): U5 GEO Pipeline orchestration with DAG execution - GEOPipeline: YAML-driven DAG pipeline with parallel/sequential execution - PipelineStep with input_mapping ($.input.xxx, $.steps.name.output.xxx) - Topological sort for execution groups, SharedWorkspace integration - geo_full_pipeline.yaml: detect→analyze→optimize→track workflow - 10 tests passing	2026-06-06 22:34:24 +08:00
chiguyong	f858d279f3	feat(agentkit): Phase 3 upgrade - persistence, memory, evolution, observability. 10 Implementation Units across 3 phases: Phase A - Infrastructure: - U1: RedisTaskStore with Redis/memory backend + factory function - U2: TraceRecorder for execution trace recording - U3: PersistentEvolutionStore with SQLite backend. Phase B - Core Capabilities: - U4: MemoryRetriever integration into ReAct engine - U5: Embedder abstraction + EpisodicMemory vector search - U6: LLMReflector for LLM-in-the-loop reflection - U7: SkillPipeline for multi-skill orchestration. Phase C - Enhancement: - U8: SKILL.md format + progressive disclosure levels - U9: ContextCompressor + prompt cache rendering - U10: Structured logging + metrics endpoint + enhanced health check. Tests: 924 passed, 18 skipped, 0 failed	2026-06-06 17:17:45 +08:00
chiguyong	2844eeb548	feat(streaming): Phase C - LLM streaming + ReAct events + SSE endpoint. U8: StreamChunk protocol + OpenAI chat_stream + Gateway streaming with usage tracking. U9: ReActEvent dataclass + execute_stream() yielding thinking/tool_call/tool_result/final_answer. U10: POST /tasks/stream SSE endpoint + Client SDK stream_task(). 15 new tests passing, no regressions.	2026-06-06 11:54:17 +08:00
chiguyong	669ca604e5	feat(configs): add GEO AgentKit Server configuration - llm_config.yaml: DeepSeek + OpenAI-compatible providers with env var substitution - skills/ (8 YAML): citation_detector, content_generator, deai_agent, geo_optimizer, monitor, schema_advisor, competitor_analyzer, trend_agent - Added intent fields for content_generator, competitor_analyzer, trend_agent - Added quality_gate fields for content_generator, deai_agent, geo_optimizer - Updated custom_handler paths to configs.geo_handlers - geo_tools.py: 14 FunctionTools calling GEO Backend via HTTP - geo_handlers.py: 3 custom handlers (citation/monitor/schema) calling /internal/ API - geo_server.py: FastAPI factory with LLM Gateway, Tool Registry, Skill Registry	2026-06-05 23:25:14 +08:00

28 Commits
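Commit 11e2009cb8 relaxes the semantic-match floor for short text using effective_low = max(0.25, low - 0.15). The minimal Python sketch below shows that rule together with the routing order the commit describes. It assumes that the complexity score, the keyword-matched skill and the semantic match are computed upstream. The function names, the route labels and the "DEFAULT" fallback for long unmatched text are hypothetical and are not taken from the repository.

```python
from typing import Optional

# Thresholds as described in commit 11e2009cb8; the names are hypothetical.
SIMILARITY_LOW = 0.4      # SemanticRouter similarity_low, lowered from 0.6
SHORT_TEXT_CHARS = 20     # texts shorter than this count as "short"
COMPLEXITY_LOW = 0.3      # below this, a query counts as low-complexity


def effective_low(text: str, low: float = SIMILARITY_LOW) -> float:
    """Relax the similarity floor for short text: max(0.25, low - 0.15)."""
    if len(text) < SHORT_TEXT_CHARS:
        return max(0.25, low - 0.15)
    return low


def route(
    text: str,
    complexity: float,
    keyword_skill: Optional[str],
    semantic_skill: Optional[str],
    similarity: float,
) -> str:
    # Low-complexity queries try a keyword match before plain direct chat.
    if complexity < COMPLEXITY_LOW:
        return f"skill:{keyword_skill}" if keyword_skill else "DIRECT_CHAT"
    # Accept a semantic match only if it clears the (possibly relaxed) floor.
    if semantic_skill is not None and similarity >= effective_low(text):
        return f"skill:{semantic_skill}"
    # Short text with no usable semantic match falls back to LLM classification.
    if len(text) < SHORT_TEXT_CHARS:
        return "LLM_CLASSIFY"
    # Assumption: long unmatched text goes to the default agent.
    return "DEFAULT"
```

With the default floor of 0.4, short text ends up with max(0.25, 0.25), so a 0.3-similarity match on "weather today" is accepted while the same score on a 40-character query is not.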
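Commit 1390bd8d6e uses a topological sort to group pipeline steps into execution groups, so steps in the same group can run in parallel. It also resolves input_mapping references such as $.input.xxx and $.steps.name.output.xxx. The minimal Python sketch below covers both ideas. It assumes that each step declares its parent steps and that each step's output is a dict. execution_groups and resolve_ref are hypothetical names, not GEOPipeline's actual API.

```python
from typing import Any


def execution_groups(deps: dict[str, list[str]]) -> list[list[str]]:
    """Level-by-level Kahn's algorithm: each group depends only on earlier groups."""
    indegree = {step: len(parents) for step, parents in deps.items()}
    children: dict[str, list[str]] = {step: [] for step in deps}
    for step, parents in deps.items():
        for parent in parents:
            children[parent].append(step)  # KeyError if a parent is undeclared

    groups: list[list[str]] = []
    ready = sorted(step for step, d in indegree.items() if d == 0)
    while ready:
        groups.append(ready)
        next_ready = []
        for step in ready:
            for child in children[step]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    next_ready.append(child)
        ready = sorted(next_ready)

    # Steps left unscheduled can only be part of (or behind) a cycle.
    if sum(len(g) for g in groups) != len(deps):
        raise ValueError("cycle detected among pipeline steps")
    return groups


def resolve_ref(ref: str, pipeline_input: dict[str, Any],
                outputs: dict[str, dict[str, Any]]) -> Any:
    """Resolve '$.input.key' or '$.steps.<name>.output.key' (nested keys allowed)."""
    parts = ref.split(".")
    if parts[:2] == ["$", "input"] and len(parts) > 2:
        value: Any = pipeline_input
        path = parts[2:]
    elif parts[:2] == ["$", "steps"] and len(parts) > 4 and parts[3] == "output":
        value = outputs[parts[2]]
        path = parts[4:]
    else:
        raise ValueError(f"unsupported reference: {ref}")
    for key in path:
        value = value[key]
    return value
```

For a strictly sequential chain such as detect, analyze, optimize, track, this yields one step per group. Steps that share no dependency land in the same group and could be dispatched concurrently.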
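Commit a27eed3714 adds _deep_resolve for ${VAR} environment-variable references in clients.yaml. The minimal Python sketch below performs recursive ${VAR} substitution over parsed YAML data (dicts, lists and strings). Three details are assumptions, not the repository's documented behavior: the name deep_resolve, the variable-name pattern, and the choice to leave unset variables as literal ${VAR} text.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches ${NAME}, where NAME is a conventional environment-variable identifier.
_VAR_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_resolve(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} in strings nested inside dicts and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Assumption: unset variables are left as-is so the reference stays visible.
        return _VAR_RE.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: deep_resolve(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [deep_resolve(v, env) for v in value]
    return value  # numbers, booleans and None pass through unchanged
```

The sketch returns a new structure instead of mutating the parsed YAML, so the original ${VAR} text stays intact. That fits the commit's goal of preserving ${VAR} references when settings are written back.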
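Commit dddcbd24e3 describes the BoardTeam state machine as FORMING->DISCUSSING->CONCLUDING->COMPLETED. One way to model it is a strictly linear enum with guarded transitions. This is a hypothetical sketch: BoardState, Board and advance are illustrative names, and the repository's BoardTeam may allow transitions the commit message does not describe (for example, jumping ahead on a detected stop command).

```python
from enum import Enum


class BoardState(Enum):
    FORMING = "forming"
    DISCUSSING = "discussing"
    CONCLUDING = "concluding"
    COMPLETED = "completed"


# Only forward, single-step transitions are allowed in this sketch.
_NEXT = {
    BoardState.FORMING: BoardState.DISCUSSING,
    BoardState.DISCUSSING: BoardState.CONCLUDING,
    BoardState.CONCLUDING: BoardState.COMPLETED,
}


class Board:
    def __init__(self) -> None:
        self.state = BoardState.FORMING

    def advance(self) -> BoardState:
        """Move to the next state; raise once the discussion is COMPLETED."""
        if self.state not in _NEXT:
            raise RuntimeError(f"cannot advance from {self.state.name}")
        self.state = _NEXT[self.state]
        return self.state
```

Keeping the transition table in a single dict makes illegal jumps (such as FORMING straight to COMPLETED) impossible unless the table is extended deliberately.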