fischer-agentkit

Commit Graph

Author	SHA1	Message	Date
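chiguyong	fef7ecea39	feat(skills): SkillHarness activation preconditions + risk guard learning. Based on the SkillHarness paper (arXiv:2606.20636) and the Agent Skills survey (arXiv:2602.12430), this introduces activation preconditions (preconditions) and source tagging (provenance), and adds the ability to learn risk guard suggestions from failure trajectories. Changes: - U1: SkillConfig gains v7 preconditions/provenance fields (base.py) - U2: build_skill_system_prompt injects a preconditions soft-check section - U3: SkillLoader records provenance on its three paths + entry_points dangerous-capability warning - U4: 10 business Skill YAMLs gain preconditions (2-4 short Chinese sentences each) - U5: RiskGuardLearner learns risk guard suggestions from failure trajectories (human review, not auto-applied) - U6: CLI command agentkit skill learn-risk-guards Key decisions: - KTD1: preconditions are injected via system_prompt (soft check), with no hard LLM call - KTD2: RiskGuardLearner does not auto-apply; human review is required (the paper shows 75% of automatic learning is unsafe) - KTD3: provenance is a lightweight string, with no hash/signature (no compliance requirement) Tests: all 39 new unit tests pass; ruff check passes.	2026-06-24 13:56:37 +08:00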
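chiguyong	cac9c73dd5	fix(routing): U1-U6 routing optimization + fix plan + code review fixes. Implements 6 fix units (U1-U6) and applies 5 security fixes found by ce-code-review. ## U1: benchmark timeout thresholds - Timeouts tiered by difficulty: easy=45s, medium=60s, hard=90s - Replaces the previous single hard-coded 60s ## U2: OpenAICompatibleProvider httpx timeout - Adds a timeout parameter (default 120s), replacing the hard-coded 60s - ProviderConfig.timeout is passed through to the Provider - Adds 2 unit tests ## U3: activate QualityGate skill_match validation - BaseAgent._build_skill_context() builds skill_context - Passed to QualityGate.validate() in three places: base.py / tasks.py / runner.py ## U4: add disambiguation_keywords field - IntentConfig gains a disambiguation_keywords field - 8 skill YAMLs gain this field ## U5: optimize RequestPreprocessor routing regexes - Splits _FACTUAL_RE into separate CN/EN regexes (Chinese has no spaces) - Adds _MATH_RE / _TRANSLATION_RE pure patterns - _TOOL_CONTEXT_RE excludes real-time queries that require tools - Multi-line input guard + trailing punctuation support - Adds 21 unit tests (all 40 pass) ## U6: re-run benchmark - Real LLM benchmark: accuracy 60% -> 93.3% - 4/5 passed, p50=40.8s, consistency=100% - Old baseline backed up to baseline_2026-06-17_old_arch.json ## ce-code-review fixes (5 items) - Fixes the security risk of the \s character class matching newlines - Adds trailing punctuation support to the factual/math regexes - Fixes duplicate keywords in geo_optimizer.yaml - Fixes unreachable return in _login_with_retry - Fixes stderr_fh resource leak in the real_llm_server fixture Tests: all 63 tests in tests/unit/chat/ pass; ruff check passes.	2026-06-20 19:31:49 +08:00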
chiguyong	11e2009cb8	feat(router): improve colloquial/mixed-lang routing, fix low-complexity IntentRouter bypass Key improvements: - Low-complexity queries (<0.3) now try IntentRouter keyword match before falling back to DIRECT_CHAT, fixing 0% F1 on keyword_match - SemanticRouter similarity_low lowered from 0.6 to 0.4 - Short text (<20 chars) uses effective_low = max(0.25, low - 0.15) - Short text with no semantic match forces LLM classify fallback - Added colloquial keywords to 7 skill YAMLs - Fixed code_reviewer.yaml output_schema placement - Fixed SemanticRouter build in e2e tests - Fixed base_url detection for bailian-coding API keys Results: keyword_match F1 0->60.87%, colloquial F1 0->100%, mixed_lang F1 0->100%	2026-06-15 23:54:57 +08:00
chiguyong	6731d96c65	feat(configs): add code_reviewer skill and coding_harness pipeline - code_reviewer.yaml: Verifier Agent skill config for adversarial review with structured output schema for ReviewFeedback format - coding_harness.yaml: Example pipeline with adversarial loop develop → test → review (Worker↔Verifier) → archive	2026-06-12 09:38:37 +08:00

4 Commits
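
Two of the numeric rules stated in the commit messages above can be sketched in a few lines of Python: the per-difficulty benchmark timeouts from cac9c73dd5 and the short-text similarity threshold from 11e2009cb8. This is only an illustration under assumptions: the names `BENCHMARK_TIMEOUTS`, `timeout_for`, and `effective_low` are hypothetical and not taken from the repository, and the 60s fallback for an unknown difficulty is a guess rather than documented behavior.

```python
# Hypothetical sketch; all names here are illustrative, not from the repository.

# Per-difficulty benchmark timeouts in seconds (values from commit cac9c73dd5).
BENCHMARK_TIMEOUTS = {"easy": 45, "medium": 60, "hard": 90}


def timeout_for(difficulty: str) -> int:
    """Return the timeout for a difficulty tier.

    Falling back to 60s for unknown tiers is an assumption of this sketch.
    """
    return BENCHMARK_TIMEOUTS.get(difficulty, 60)


def effective_low(text: str, low: float = 0.4) -> float:
    """Lower similarity bound, relaxed for short text (commit 11e2009cb8).

    Text shorter than 20 characters uses max(0.25, low - 0.15);
    longer text keeps the configured lower bound unchanged.
    """
    if len(text) < 20:
        return max(0.25, low - 0.15)
    return low
```

With the default lower bound of 0.4, a short query such as "hi" is matched against a threshold of about 0.25, while a query of 20 or more characters keeps 0.4.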