Commit Graph

301 Commits

Author SHA1 Message Date
chiguyong 6731d96c65 feat(configs): add code_reviewer skill and coding_harness pipeline
- code_reviewer.yaml: Verifier Agent skill config for adversarial review
  with structured output schema for ReviewFeedback format
- coding_harness.yaml: Example pipeline with adversarial loop
  develop → test → review (Worker↔Verifier) → archive
2026-06-12 09:38:37 +08:00
chiguyong dc07c7c60a feat(pipeline): implement adversarial loop execution logic
Add Worker-Verifier adversarial loop to PipelineEngine:
- _execute_stage_with_adversarial: main loop for Worker→Verifier→retry
- _execute_agent_stage: extracted agent execution logic
- _execute_verifier: execute verifier and parse ReviewFeedback
- _build_feedback_context: build feedback context for worker retry
- _escalate: handle round exhaustion (escalate or fail)
- Route to adversarial mode when stage.verifier is configured

Support three feedback modes: structured+natural, structured, natural
2026-06-12 09:37:30 +08:00
chiguyong b733b3a732 feat(pipeline): add adversarial loop schema models
Add ReviewIssue, ReviewFeedback, AdversarialState models and extend
PipelineStage with verifier, max_adversarial_rounds, feedback_mode,
and escalate_on_exhaust fields for Worker-Verifier adversarial loop.
2026-06-12 09:35:01 +08:00
chiguyong 2110c84fb6 fix: switch default model to qwen3-coder-plus for better function calling
DeepSeek-chat has limited/partial function calling support. Qwen3-coder-plus
(DashScope) has robust OpenAI-compatible function calling.

Also added tool usage instructions to system prompt and enhanced logging
to trace tool propagation through the pipeline.
2026-06-12 09:27:52 +08:00
chiguyong 44f19fcf14 feat: loading animation + tool descriptions in system prompt
1. Loading indicator: three-dot bouncing animation appears after
   sending a message and disappears when server starts responding.

2. Tool descriptions: resolve_skill_routing now appends available
   tools (name + description + parameters) to the system prompt so
   the LLM knows what tools it can call.
2026-06-11 22:25:21 +08:00
chiguyong 55421dd126 fix: get_tools() and get_system_prompt() now read from tool_registry too
Root cause: app.py registers tools via agent._tool_registry.register()
which adds to the ToolRegistry but NOT to agent._tools (which is only
populated by use_tool() from config). Both get_tools() and
get_system_prompt() were reading only _tools, missing all post-init
registered tools. Now both methods merge _tools with
_tool_registry.list_tools().
2026-06-11 22:17:14 +08:00
chiguyong f7225bc91a fix: include available tools in system prompt so LLM knows what it can call
Previously get_system_prompt() only returned identity/instructions but
did not tell the LLM what tools are available. The LLM would therefore
refuse to call tools even when they were registered, saying it had no
tools. Now the system prompt includes a '## 可用工具' section listing
all registered tools with their descriptions and parameters.
2026-06-11 22:00:30 +08:00
chiguyong b6ec13cbca debug: log tools count and names in portal before execute_stream 2026-06-11 21:44:20 +08:00
chiguyong 32c800d1e4 fix: portal routing + response speed + IME input
1. Portal unified routing: ws_chat now uses CostAwareRouter uniformly
   (handles Layer 0/1/2), replacing direct IntentRouter calls.
   Greeting/chat_mode requests skip IntentRouter LLM call entirely.

2. Response speed: greeting & simple chat now use direct LLM call
   (no ReAct loop), zero-cost Layer 0 detection.

3. IME input fix: use e.isComposing (native browser property)
   instead of compositionstart/end for Enter key detection.

4. Test: fix InMemoryMessageBus.request() parameter name
   timeout -> timeout_seconds.
2026-06-11 21:30:25 +08:00
chiguyong ae95b56465 fix: use e.isComposing for IME detection instead of manual flag
e.isComposing is a standard KeyboardEvent property that's true during
IME composition. More reliable than compositionstart/compositionend
which can fire at unpredictable timing relative to keydown.
2026-06-11 20:43:38 +08:00
chiguyong 66d0901938 fix: prevent Enter from submitting during IME composition
Added compositionstart/compositionend event listeners to track IME
composing state. Enter key now only submits when not composing,
so Chinese/Japanese/Korean input methods work correctly.
2026-06-11 15:37:06 +08:00
chiguyong cc4c6fe346 fix: direct-mode agent falls through to default when task needs tools
When IntentRouter matches a direct-mode agent (no tools), but the task
content suggests tool needs (shell, search, file ops, etc.), the routing
now falls through to the default agent which has full tool access.

This fixes the issue where "帮我执行个命令" would be routed to
direct_agent and fail because direct mode doesn't support tool calling.

Also restored "你好" in direct_agent keywords since it's correctly
handled now — greetings don't need tools, direct mode is fine.
2026-06-11 15:26:19 +08:00
chiguyong 52b7d6007d fix: remove '你好' from direct_agent keywords so greetings route to default agent with tools 2026-06-11 14:49:59 +08:00
chiguyong 93bc7c4e3e fix: change all agent YAML model from hardcoded provider to 'default'
Hardcoded model names like 'openai/gpt-4o-mini' or 'anthropic/claude-sonnet'
cause 'No provider available' errors when the specific provider isn't configured.
Using 'default' lets the system pick the available provider automatically.
2026-06-11 14:19:26 +08:00
chiguyong d47f279887 fix: resolve code review issues from deferred improvements
1. InMemoryMessageBus.request(): fix param name (timeout→timeout_seconds) to match ABC
2. InMemoryMessageBus: track consumer tasks, cancel on unsubscribe
3. InMemoryMessageBus: _try_resolve_pending() in queue consumer path
4. evolve_soul(): use "default" category when patterns is empty
5. quick_classify(): use delimiter-based prompt to mitigate injection risk
6. Use asyncio.get_running_loop() instead of deprecated get_event_loop()
2026-06-11 13:49:02 +08:00
chiguyong ec51dbb259 feat: optimize劣势项 — 拍卖开关/审计采样/线程安全/评分锚定
1. 拍卖机制: 已有配置开关(marketplace.auction_enabled), 默认关闭
2. LLM审计采样: 新增 audit_sample_rate (0.0-1.0), 默认1.0, 可降低审计频率
3. AlignmentConfig.from_dict: 忽略未知键, 防止YAML额外字段崩溃
4. 配置热重载线程安全: 用 threading.Event 替代布尔标志, 消除数据竞态
5. Reflexion评分锚定: 添加评分维度(Completeness/Correctness/Clarity)和锚定点
2026-06-11 13:04:36 +08:00
chiguyong cc2cd414c9 fix: resolve all code review issues from cross-validation
1. Critical: Add missing TaskResult import in plan_exec_engine.py
2. Critical: Fix ReWOOEngine param name (max_steps → max_plan_steps)
3. Major: Remove duplicate token counting in reflexion.py
4. Major: LLM audit failure now passes (trusts rule check) instead of failing
5. Major: Fix dict iteration with del using list() copy in lifecycle.py
6. Major: Fix Chinese content tokenization using regex split instead of space split
7. Minor: _is_positive_mention now checks all occurrences, not just the first
2026-06-11 06:22:35 +08:00
chiguyong 79eb8469f9 fix: address remaining code review issues
- AlignmentGuard: direction-aware constraint checking (negation/affirmation detection)
  instead of simple substring matching to reduce false positives
- Reflexion: extract actual token usage from LLM response instead of hardcoded 1
- MemoryTool: protect version/history sections from update_soul modification
- Fix AsyncMock warnings for sync find_best_agent method
2026-06-11 00:14:11 +08:00
chiguyong 5171e942d6 feat: multi-agent marketplace architecture evolution
Phase A: ReWOO, PlanExec, Reflexion engines + SkillConfig extension
Phase B: CostAwareRouter, OrganizationContext, AlignmentGuard
Phase C: Soul evolution, Auction mechanism, Server integration

250 tests passing across all units.
2026-06-10 23:58:06 +08:00
chiguyong bba394be38 fix(marketplace): address code review findings
- Fix str.format() crash when user input contains curly braces
- Fix Layer 2 passing str to find_best_agent (expects list[str])
- Fix AlignmentGuard fail-open on LLM audit failure (now fail-closed)
- Fix _config_reload_lock not initialized in create_app()
- Fix evolve_soul redundant reflector.reflect() call (reuse existing reflection)
- Fix test mocks using AsyncMock for sync find_best_agent method
- Remove unused _COMPLEXITY_CLASSIFY_PROMPT constant
2026-06-10 19:21:40 +08:00
chiguyong 8713636d50 feat(marketplace): add Phase B/C - CostAwareRouter, OrganizationContext, AlignmentGuard, Soul Evolution, Auction, Server Integration
Phase B:
- U1: CostAwareRouter with 3-layer routing (rule/LLM/capability matching)
- U6: OrganizationContext with agent profiles and capability-based discovery
- U7: AlignmentGuard with constraint injection and cascade detection

Phase C:
- U8: Soul dynamic evolution with version tracking and reflection-triggered updates
- U9: Auction mechanism as optional advanced routing mode
- U10: Server integration + end-to-end integration tests

250 new tests passing across all units.
2026-06-10 19:09:02 +08:00
chiguyong 5b42487d8a feat(core): add ReWOO, Plan-and-Execute, Reflexion execution engines
Phase A of Multi-Agent Marketplace architecture:
- ReWOOEngine: plan-all-then-execute pattern for parallel data fetch
- PlanExecEngine: adapter wrapping GoalPlanner+PlanExecutor+PipelineReplanner
- ReflexionEngine: ReAct + Evaluate + Reflect + Retry for high-precision tasks
- SkillConfig: extend VALID_EXECUTION_MODES with rewoo/plan_exec/reflexion
- ConfigDrivenAgent: add _handle_rewoo/_handle_plan_exec/_handle_reflexion routes
- 5 professional agent YAML configs with layered model defaults
- 107 unit tests passing
2026-06-10 17:08:48 +08:00
chiguyong 6852dfe892 fix(security,reliability): resolve all P2 findings from code review 2026-06-10 15:05:40 +08:00
chiguyong 658e188939 fix(review): resolve P0/P1 findings from final code review 2026-06-10 09:57:29 +08:00
chiguyong 1d1805753c fix: resolve key P2 findings from code review
- Shell whitelist: use exact binary match instead of startswith
- Shell audit log: use deque(maxlen=10000) to cap memory
- Terminal history: use deque(maxlen) for O(1) eviction
- Path optimizer: cap _pending_paths at 50 entries per task_type
- Pitfall detector: only add tips to matching steps, not all
- Experience store: handle non-numeric _parse_time_window input
- Extract shared is_safe_url() to utils/security.py (DRY)
- Workflow condition evaluator: handle float() ValueError
2026-06-10 09:01:23 +08:00
chiguyong b46a10973f fix(tests): clean up test_shell_tool.py lint issues 2026-06-10 08:46:35 +08:00
chiguyong 9646b0f0dd fix(tests): update test_shell_tool.py to match new ShellTool API 2026-06-10 08:22:15 +08:00
chiguyong 7874e875af merge: integrate feat/agentkit-phase8-chat-adaptive (chat/gui commands + GUI mode)
Restores agentkit chat, agentkit gui CLI commands, onboarding wizard,
and GUI mode (AGENTKIT_GUI_MODE) with static file serving.
Resolves merge conflicts in orchestrator.py, app.py, tools/__init__.py, shell.py.
2026-06-10 07:44:06 +08:00
chiguyong 9e9f1314f6 fix(security): resolve all P0/P1 findings from code review 2026-06-10 07:12:41 +08:00
chiguyong b34f74f598 feat(phase6): implement end-to-end enterprise scenario validation (U15)
- Add goal-driven agent skill config and pipeline config
- Add 9 E2E integration tests covering all 7 capabilities:
  - SC1: Goal-driven SEO analysis (GoalPlanner→PlanExecutor→PlanChecker→ExperienceStore)
  - SC2: Knowledge Q&A with system operation (MultiSourceRAG)
  - SC3: Workflow with approval (WorkflowStore + approval node)
  - SC4: Self-evolution experience accumulation (ExperienceStore→PitfallDetector→PathOptimizer)
  - SC5: Parallel execution efficiency verification
  - SC6: Skill registry integration (capabilities, versions, health)
  - Cross-capability: Plan+Experience+Pitfall, Review+Experience, RAG+Workflow
- All 2472 tests passing (9 integration + 2463 unit)
2026-06-10 01:38:28 +08:00
chiguyong c606ffa64a feat(phase5): implement management pages, evolution dashboard, and workflow editor (U13b/U13c/U14) 2026-06-10 01:29:01 +08:00
chiguyong a1deeecede feat(phase5): implement Vue3 portal foundation with chat interface and routing (U13a)
- Add Portal API routes: chat, stream, capabilities, conversations, WebSocket
- Add ConversationStore for in-memory conversation management
- Add CAPABILITY_CATEGORIES mapping for 8 capability types
- Create Vue3 SPA with TypeScript, Pinia, Vue Router, Ant Design Vue
- Implement ChatView with message bubbles, input, sidebar, WebSocket support
- Add side navigation skeleton for all 8 capability sections
- Add placeholder views for workflow, knowledge, skills, terminal, etc.
- 31 backend tests passing
2026-06-10 01:06:48 +08:00
chiguyong 901e4d9d0a feat(phase4): implement Computer Use integration (U12)
- ComputerUseTool: Anthropic API + fallback chain (API→Session→ShellTool→AskHuman)
- ComputerUseSession: Docker sandbox + InMemory test session
- ComputerUseRecorder: action recording, replay, and persistence

89 new tests passing. Degradation chain verified.
2026-06-10 00:54:31 +08:00
chiguyong c99aee1423 feat(phase3): implement knowledge base and RAG enhancement (U9-U11)
- U9: LocalDocumentIngestion - multi-format doc parsing and chunking
- U10: ExternalKBAdapters - Feishu/Confluence/GenericHTTP adapters
- U11: MultiSourceRAG - multi-source retrieval with source tracing

KnowledgeBase protocol defined (KTD-7). 145 new tests passing.
2026-06-10 00:45:17 +08:00
chiguyong e3d4f811dd feat(phase2): implement self-evolution and smart terminal (U6-U8)
- U6: PitfallDetector - detect historical failure patterns and warn
- U7: PathOptimizer - discover and update optimal execution paths
- U8: TerminalSession - session state, PTY interactive, output parsing

160 new tests passing. ShellTool enhanced with session_id support.
2026-06-10 00:22:36 +08:00
chiguyong fd4a811929 feat(phase1): implement core kernel and experience foundation (U1-U5)
- U1: GoalPlanner - structured goal decomposition wrapping _decompose_task()
- U2: PlanExecutor - parallel execution with retry/skip/replace strategies
- U3: PlanChecker - quality gate + review + experience writing
- U4: Skill spec upgrade - dependencies, capabilities, version management
- U5: ExperienceStore - PostgreSQL+pgvector task experience storage

208 new tests passing, fully backward compatible.
2026-06-09 23:57:03 +08:00
chiguyong 31bd3b126c feat(phase8): chat adaptive enhancements, pipeline reflection, search tools upgrade
- Enhanced chat CLI with adaptive mode and session management
- Added pipeline reflection and schema extensions
- Upgraded BaiduSearch and WebSearch tools with advanced capabilities
- Expanded server routes for skills and chat
- Added session store enhancements
- New chat module and pipeline reflection support
2026-06-09 23:18:06 +08:00
chiguyong 045fecd4ce feat(tools): add ShellTool + WebSearchTool, memory system, onboarding wizard, chat mode
- ShellTool: safe command execution with allowlist, blocked patterns (regex), timeout, output truncation
- WebSearchTool: multi-backend search with Tavily → Serper → DuckDuckGo Lite fallback
- MemoryTool: agent-callable tool with add/replace/remove/read actions
- MemoryStore/MemoryFile: file-based memory (SOUL.md, USER.md, MEMORY.md, DAILY.md)
- Onboarding wizard: provider selection, API key, model selection, agent personality
- Chat mode: interactive CLI with streaming, memory injection, tool integration
- Add 百炼 Coding Plan provider with 10 models
- 102 unit tests (34 new for ShellTool + WebSearchTool)
2026-06-09 01:06:45 +08:00
chiguyong 9874a4aac0 test: add Phase 8 integration tests for Chat + Adaptive + Multi-Agent (U8)
End-to-end integration tests covering session lifecycle, adaptive pipeline,
multi-agent communication via MessageBus, and config serialization.
2026-06-08 01:17:04 +08:00
chiguyong 45283d31e8 feat(core): integrate MessageBus into Orchestrator and AgentPool (U7)
- Orchestrator accepts optional message_bus parameter; workers publish
  task.progress messages via MessageBus after each subtask execution
- AgentPool accepts optional message_bus; auto-registers agents on
  create and auto-unregisters on remove
- app.py initializes MessageBus from config and injects into AgentPool
- ServerConfig adds bus configuration field
- 5 new tests, all passing
2026-06-08 00:03:40 +08:00
chiguyong 13d6e74099 feat(bus): add MessageBus abstraction layer with InMemory + Redis Streams (U6)
- AgentMessage: message model with sender/recipient/topic/payload/correlation_id
- MessageBus Protocol: publish/subscribe/unsubscribe/request/broadcast/health_check
- InMemoryMessageBus: asyncio.Queue-based implementation for testing
- RedisMessageBus: Redis Streams (XADD/XREADGROUP) implementation with
  consumer groups, message acknowledgment, and dead letter queue
- create_message_bus() factory with graceful Redis→InMemory fallback
- Request-response pattern via correlation_id + asyncio.Future
- 13 new tests, all passing
2026-06-07 23:58:16 +08:00
chiguyong 88d8298871 feat(core): add Orchestrator adaptive task decomposition (U5)
- execute_adaptive(): iterative execute→evaluate→re-decompose loop
- OrchestratorConfig: adaptive, max_iterations, quality_threshold
- _evaluate_quality(): LLM-based or rule-based quality scoring (0-1)
- _reexecute_failed(): preserves completed subtask results, retries
  failed ones with improvement feedback injected into input_data
- OrchestrationResult.metadata field for tracking iteration history
- 10 new tests, all passing
2026-06-07 23:50:54 +08:00
chiguyong 7054ac02b6 feat(tools): add AskHumanTool + token streaming in ReAct execute_stream
- AskHumanTool: Human-in-the-Loop tool for Chat mode, pushes questions
  via WebSocket callback and waits for user reply via asyncio.Future
- Token streaming: execute_stream() now uses chat_stream() instead of
  chat(), yielding token-type ReActEvents for each StreamChunk
- _build_response_from_stream() static method constructs LLMResponse
  from accumulated stream data
- Export AskHumanTool from tools/__init__.py
- 12 new tests (7 AskHumanTool + 5 token streaming), all passing
2026-06-07 23:40:43 +08:00
chiguyong 6013d5189b feat(chat): add Chat API routes with REST + WebSocket bidirectional communication 2026-06-07 22:49:26 +08:00
chiguyong 493187782c feat(session): add Session/Message models and SessionManager with InMemory/Redis stores 2026-06-07 22:43:14 +08:00
chiguyong e4d6efb4bf Merge feat/agentkit-phase7-headroom: Phase 6-7 + all review fixes 2026-06-07 22:05:34 +08:00
chiguyong b34b06724d fix(agentkit): resolve all P0/P1/P2/P3 issues from code review 2026-06-07 22:05:18 +08:00
chiguyong 3645c7a080 docs: mark Phase 7 Headroom integration plan as completed 2026-06-07 18:21:27 +08:00
chiguyong bad66445ff feat(compression): U6 GEO Pipeline compression integration tests and config
Add GEO Pipeline end-to-end compression integration tests with
MockHeadroomCompressor. Add compression configuration section to
llm_config.yaml with headroom and summary mode examples.
2026-06-07 18:20:41 +08:00
chiguyong 9c04362dba feat(compression): U5 HeadroomRetrieveTool for CCR cache retrieval
Add HeadroomRetrieveTool that allows LLM to retrieve original
uncompressed data from CCR cache via Function Calling. Auto-registered
when HeadroomCompressor is active and available.
2026-06-07 18:20:17 +08:00