- Settings API: reverse-resolve env vars to preserve ${VAR} refs in yaml,
write new API keys to .env instead of agentkit.yaml, extract env_key
from existing ${VAR} reference when updating providers
- Onboarding: merge-update instead of overwriting when a config exists,
use config_arg to determine the output path, merge .env instead of overwriting it
- Unified templates: bailian-coding provider name, full model_aliases,
docker-compose with postgres, expanded .env.example
- Optional ruamel.yaml for comment/format preservation in Settings API
- clients.yaml: add _deep_resolve for ${VAR} env var references
- All CLI commands use load_config_with_dotenv() consistently
- Tests: mock find_config_path and CWD auto-discovery to avoid env leaks
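
A minimal sketch of the ${VAR} handling described above, assuming plain dict/list configs as returned by a YAML loader. The names `deep_resolve` and `reverse_resolve` are hypothetical stand-ins, not the project's actual `_deep_resolve` or Settings API code, and leaving unset variables untouched is an assumption.

```python
import os
import re
from typing import Any, Mapping, Optional

# Matches references such as ${OPENAI_API_KEY}.
_VAR_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_resolve(value: Any, env: Optional[Mapping[str, str]] = None) -> Any:
    """Recursively replace ${VAR} references in strings, dicts, and lists."""
    env = os.environ if env is None else env
    if isinstance(value, str):
        # Assumption: an unset variable leaves the reference text unchanged.
        return _VAR_REF.sub(lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: deep_resolve(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [deep_resolve(v, env) for v in value]
    return value


def reverse_resolve(value: str, env: Mapping[str, str]) -> str:
    """Map a resolved value back to its ${VAR} reference when one matches exactly."""
    for name, resolved in env.items():
        if resolved and value == resolved:
            return "${" + name + "}"
    return value
```

With this shape, a settings writer could call `reverse_resolve` on each secret before dumping YAML, so the file keeps `${VAR}` and the literal key stays in .env.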
Key improvements:
- Low-complexity queries (complexity score < 0.3) now try an IntentRouter keyword match
before falling back to DIRECT_CHAT, fixing the 0% F1 on keyword_match
- SemanticRouter similarity_low lowered from 0.6 to 0.4
- Short text (<20 chars) uses effective_low = max(0.25, low - 0.15)
- Short text with no semantic match forces LLM classify fallback
- Added colloquial keywords to 7 skill YAMLs
- Fixed code_reviewer.yaml output_schema placement
- Fixed SemanticRouter build in e2e tests
- Fixed base_url detection for bailian-coding API keys
Results: keyword_match F1 0% -> 60.87%, colloquial F1 0% -> 100%, mixed_lang F1 0% -> 100%
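
The short-text threshold rule above can be sketched as follows. The function names and the use of `len(text)` as the character count are assumptions for illustration. Only the constants (20 chars, 0.25 floor, 0.15 offset, 0.4 default) come from the notes above.

```python
SHORT_TEXT_CHARS = 20  # texts shorter than this count as "short"


def effective_low(text: str, similarity_low: float = 0.4) -> float:
    """Return the lower similarity threshold to apply to this text."""
    if len(text) < SHORT_TEXT_CHARS:
        # Short inputs get a relaxed threshold, but never below 0.25.
        return max(0.25, similarity_low - 0.15)
    return similarity_low


def needs_llm_fallback(text: str, best_score: float, similarity_low: float = 0.4) -> bool:
    """True when a short text has no semantic match and should go to LLM classify."""
    is_short = len(text) < SHORT_TEXT_CHARS
    return is_short and best_score < effective_low(text, similarity_low)
```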
Critical:
- C1: Add verifier_timeout_seconds for independent Verifier timeout
- C2: Verifier parse failure now raises RuntimeError instead of looping forever
Major:
- M1: Inject previous_output into Worker retry context
- M2: Add Pydantic ge/le constraint on ReviewFeedback.score
- M3: Use Literal type for feedback_mode enum validation
- M4: Use Literal types for ReviewIssue severity and category
- M5: Merge error messages when escalation agent also fails
Tests: 8 new test cases added (19 total), all passing
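
A rough sketch of C1 and C2 together, assuming the Verifier is an async callable that returns a JSON string. `run_verifier` and `call_verifier` are hypothetical names. Only `verifier_timeout_seconds` and the RuntimeError behavior come from the notes above.

```python
import asyncio
import json
from typing import Awaitable, Callable


async def run_verifier(
    call_verifier: Callable[[], Awaitable[str]],
    verifier_timeout_seconds: float,
) -> dict:
    """Run the Verifier under its own timeout and fail fast on bad output."""
    # C1: the Verifier gets an independent timeout; wait_for raises
    # asyncio.TimeoutError when it is exceeded.
    raw = await asyncio.wait_for(call_verifier(), timeout=verifier_timeout_seconds)
    try:
        feedback = json.loads(raw)
    except json.JSONDecodeError as exc:
        # C2: surface the parse failure instead of retrying indefinitely.
        raise RuntimeError(f"Verifier returned unparseable feedback: {exc}") from exc
    if not isinstance(feedback, dict):
        raise RuntimeError("Verifier feedback must be a JSON object")
    return feedback
```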
- code_reviewer.yaml: Verifier Agent skill config for adversarial review
with structured output schema for ReviewFeedback format
- coding_harness.yaml: Example pipeline with adversarial loop
develop → test → review (Worker↔Verifier) → archive
DeepSeek-chat has only partial function calling support, whereas Qwen3-coder-plus
(DashScope) has robust OpenAI-compatible function calling.
Also added tool usage instructions to system prompt and enhanced logging
to trace tool propagation through the pipeline.
When IntentRouter matches a direct-mode agent (no tools), but the task
content suggests tool needs (shell, search, file ops, etc.), the routing
now falls through to the default agent which has full tool access.
This fixes the issue where "帮我执行个命令" ("run a command for me") was routed to
direct_agent and failed because direct mode doesn't support tool calling.
Also restored "你好" ("hello") in the direct_agent keywords, since it is now handled
correctly: greetings don't need tools, so direct mode is fine.
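
The fall-through can be illustrated with a small sketch. The hint list, `needs_tools`, and `pick_agent` are hypothetical. The real detection logic is not shown in this message and may differ.

```python
# Assumed substrings that suggest the task needs tools (shell, search, file ops).
TOOL_HINTS = ("执行", "命令", "shell", "search", "搜索", "file", "文件")


def needs_tools(task: str) -> bool:
    """Heuristic: does the task text mention anything that requires a tool?"""
    lowered = task.lower()
    return any(hint in lowered for hint in TOOL_HINTS)


def pick_agent(matched_agent: str, matched_mode: str, task: str,
               default_agent: str = "default") -> str:
    """Keep the matched agent unless it is direct-mode and the task needs tools."""
    if matched_mode == "direct" and needs_tools(task):
        # Direct mode cannot call tools, so fall through to the default agent.
        return default_agent
    return matched_agent
```

Under these assumptions, `pick_agent("direct_agent", "direct", "帮我执行个命令")` falls through to the default agent, while a plain "你好" stays on direct_agent.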
Hardcoded model names like 'openai/gpt-4o-mini' or 'anthropic/claude-sonnet'
cause 'No provider available' errors when the specific provider isn't configured.
Using 'default' lets the system pick the available provider automatically.
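
To illustrate why 'default' avoids the error, here is a sketch under the assumption that configured providers are kept as a mapping from provider name to that provider's default model. `resolve_model` and this mapping shape are hypothetical, not the project's actual resolver.

```python
from typing import Mapping


def resolve_model(model: str, configured: Mapping[str, str]) -> str:
    """Resolve a model name against the configured providers."""
    if model == "default":
        if not configured:
            raise LookupError("No provider available")
        # Assumption: the first configured provider wins.
        provider, name = next(iter(configured.items()))
        return f"{provider}/{name}"
    provider = model.split("/", 1)[0]
    if provider not in configured:
        # A hardcoded 'openai/...' name fails when only another provider is set up.
        raise LookupError(f"No provider available for {model!r}")
    return model
```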
- Enhanced chat CLI with adaptive mode and session management
- Added pipeline reflection and schema extensions
- Upgraded BaiduSearch and WebSearch tools with advanced capabilities
- Expanded server routes for skills and chat
- Added session store enhancements
- New chat module and pipeline reflection support