Commit Graph

226 Commits

Author SHA1 Message Date
chiguyong 16c33be295 feat(mcp): U13 — refactor MCPServer to route factory + mount at /api/v1/mcp with auth 2026-06-25 20:58:41 +08:00
chiguyong 8998f94c42 feat(channels): U12 — DingTalk/WeCom/Slack adapters + multi-channel webhook dispatch 2026-06-25 20:45:43 +08:00
chiguyong 4b58e8f661 feat(channels): U11 — Feishu IM adapter end-to-end (webhook + signature + AES-CBC decrypt + chat integration) 2026-06-25 20:24:21 +08:00
chiguyong 5572387c01 feat(channels): U10 — message adapter ABC + AES-256-GCM secrets store + channel CRUD routes 2026-06-25 20:13:37 +08:00
chiguyong e3ae2f3a56 feat(rag_platform): U8 — TaskIQ async task integration
Add tasks.py: TaskManager with vectorize/batch_index tasks, per-user concurrency limits, degraded mode (sync execution without broker), WorkerSweeper for timeout detection, error message sanitization
Add taskiq>=0.11 and taskiq-redis>=0.5 to pyproject.toml
Task parameter schema validation (VectorizeTaskParams, BatchIndexTaskParams)

Tests: 41 new tests, 289 total passing
2026-06-25 12:58:51 +08:00
chiguyong d026a91f43 feat(rag_platform): U6 — hit processing mode + KB settings
Add hit_processing.py: HitProcessor with model_opt (LLM-generated) and direct (concatenated chunks) modes, with in-process cache
Add settings.py: KBSettings/KBSettingsUpdate models, KBSettingsStore with async CRUD
Add KB settings endpoints to kb_management.py: GET/PUT /kb-management/kbs/{kb_id}/settings with owner-only modification

Tests: 43 new tests (25 hit_processing + 18 settings), 293 total passing
2026-06-25 12:44:47 +08:00
chiguyong 5c562dbff3 feat(rag_platform): U5 — rerank + question generation + termbase
Add rerank.py: Reranker with Cohere/BGE provider support, data export risk annotation, graceful degradation
Add question_gen.py: LLM-based question generation following ContextualChunker pattern, with caching
Add termbase.py: jieba custom dictionary management, add/remove/load terms

Tests: 58 new tests (14 rerank + 19 question_gen + 25 termbase), 205 total passing
2026-06-25 12:31:43 +08:00
chiguyong fb9f16d6e5 feat(rag_platform): U4 — dual-index retrieval (pgvector semantic + PG fulltext jieba)
Add fulltext.py: jieba tokenization + tsvector write/query
Add retrieval.py: RetrievalEngine with embedding/keywords/blend modes
Update models.py: add RetrievalRequest model
Tests: 35 new tests, 147 total passing
2026-06-25 12:20:48 +08:00
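The commit does not say how blend mode combines the semantic and full-text rankings. One common option is reciprocal rank fusion (RRF), sketched here as an assumption rather than as what RetrievalEngine actually does; k=60 is the constant conventionally used with RRF, and the chunk ids are made up.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of chunk ids into one list (RRF)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k damps the weight of the top ranks.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


semantic = ["c1", "c2", "c3"]  # e.g. a pgvector similarity ranking
fulltext = ["c2", "c4", "c1"]  # e.g. a tsvector ranking
print(reciprocal_rank_fusion([semantic, fulltext]))  # ['c2', 'c1', 'c4', 'c3']
```

RRF only needs ranks, so it sidesteps the fact that cosine similarity and full-text scores live on different scales.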
chiguyong 3f9588e673 feat(rag_platform): U3+U7 — rewrite upload endpoint with sanitization + pipeline
Rewrite upload_document() to use rag_platform sanitize + DocumentProcessor:
- File type whitelist validation (8 allowed types, reject .exe/.sh)
- File size limit (50MB) + zip bomb detection for ZIP-based formats
- DocumentProcessor.parse() (with content sanitization) + segment()
- Return chunks preview, status="segmenting" (pending vectorization)

Add POST /kb-management/documents/preview endpoint:
- Pre-upload preview with adjustable chunk_size/chunk_overlap
- Same security validation as upload, no document record created

Add POST /kb-management/documents/{id}/vectorize placeholder:
- Returns 503 — full async vectorization deferred to U8 (TaskIQ)

Test: update test_upload_document assertion (status "indexed" → "segmenting")
2026-06-25 12:06:16 +08:00
chiguyong b55c896794 feat(rag_platform): U3+U7 — document processing pipeline + upload security
U3: Document processing pipeline (document_processor.py)
- DocumentProcessor class wrapping parse → segment → vectorize
- parse() uses memory/document_loader.py for multi-format extraction
- segment() uses LlamaIndex SentenceSplitter
- preview() returns chunks for read-only preview (no vectorization)
- vectorize() embeds chunks and stores in pgvector (all-or-nothing)
- process() orchestrates full pipeline with status transitions:
  pending → parsing → segmenting → vectorizing → indexed | failed

U7: Upload security & content sanitization (sanitize.py)
- ALLOWED_FILE_TYPES whitelist (pdf/docx/xlsx/pptx/txt/md/csv/html)
- MAX_FILE_SIZE 50MB limit
- validate_file_type() / validate_file_size() guards
- check_zip_bomb() for ZIP-based formats (ratio > 100:1 or > 500MB)
- check_image_bomb() for pixel count > 100MP (PNG/JPEG/GIF header parsing)
- is_safe_ip() SSRF protection (loopback/RFC1918/link-local/ULA denied)
- sanitize_markdown() removes dangerous HTML tags (script/iframe/object/embed)
- sanitize_content() main entry point for text format sanitization
- parse_xml_safe() XXE protection (forbid_dtd/forbid_entities/forbid_external)

Preview API (preview.py)
- PreviewChunk / PreviewResult Pydantic models
- generate_preview() returns read-only segmentation preview

Tests: 112 tests passing (45 new + 67 existing)
- test_sanitize.py: file type/size, markdown sanitization, SSRF, zip/image bomb
- test_document_processor.py: parse/segment, preview, vectorize, failure status
2026-06-25 11:21:42 +08:00
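The U7 guards listed above can be approximated with the standard library alone. This sketch assumes the thresholds stated in the commit (100:1 ratio, 500 MB, 100 MP); the bodies are illustrative rather than the repository's sanitize.py, and png_pixel_count is a hypothetical helper. ZIP central-directory sizes are attacker-controlled, so a streaming decompress with a byte cap is the stricter check.

```python
import io
import ipaddress
import struct
import zipfile
from typing import Optional

MAX_COMPRESSION_RATIO = 100               # > 100:1 is treated as a bomb
MAX_UNCOMPRESSED_BYTES = 500 * 1024 ** 2  # > 500 MB total is treated as a bomb
MAX_PIXELS = 100_000_000                  # > 100 megapixels
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_zip_bomb(data: bytes) -> bool:
    """Return True if a ZIP-based file (docx/xlsx/pptx) looks like a zip bomb.

    Only the central directory is read, so nothing is decompressed.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        infos = zf.infolist()
    uncompressed = sum(i.file_size for i in infos)
    compressed = sum(i.compress_size for i in infos)
    if uncompressed > MAX_UNCOMPRESSED_BYTES:
        return True
    return compressed > 0 and uncompressed / compressed > MAX_COMPRESSION_RATIO


def png_pixel_count(data: bytes) -> Optional[int]:
    """Read width x height from the PNG IHDR chunk without decoding the image."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", then width and height
    # as big-endian unsigned 32-bit integers.
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE) or data[12:16] != b"IHDR":
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width * height


def is_safe_ip(host: str) -> bool:
    """Deny loopback, private (RFC 1918 / IPv6 ULA), link-local and other non-routable addresses."""
    ip = ipaddress.ip_address(host)
    return not (
        ip.is_loopback or ip.is_private or ip.is_link_local
        or ip.is_reserved or ip.is_unspecified or ip.is_multicast
    )
```

An image would be rejected when png_pixel_count(data) exceeds MAX_PIXELS; JPEG and GIF need their own header readers.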
chiguyong c1a21f57a1 feat(rag_platform): U2 — KB persistence + per-KB ACL
Add PostgreSQL-backed KB store replacing in-memory KnowledgeSourceStore:
- models.py: ORM models (KBModel, DocumentModel, KBAclModel) using
  SQLAlchemy 2 DeclarativeBase + Mapped style
- store.py: KBStore with async CRUD for KBs and documents,
  create_kb creates owner ACL in same transaction
- acl.py: filter_kb_by_user_acl(), grant_access(), revoke_access(),
  list_acl() — follows filter_kb_sources_by_department pattern

Schema: rag_platform_kbs, rag_platform_documents, rag_platform_kb_acl
with FK CASCADE on kb_id. UniqueConstraint on (kb_id, user_id).

Tests: 23 unit tests covering KB CRUD, document operations, ACL
filtering, grant/revoke. All 37 rag_platform tests pass.
2026-06-25 11:01:04 +08:00
chiguyong 27d0184392 feat(rag_platform): U1 — RAG platform skeleton + LlamaIndex integration
Create src/agentkit/rag_platform/ module with:
- models.py: Pydantic domain models (KB, Document, Chunk, QueryResult)
- indexing.py: PGVectorStore wrapper with explicit table name
  (rag_platform_kb_chunks) for schema isolation from episodic_memory
- pipeline.py: RAGPipeline wrapping LlamaIndex IngestionPipeline
  (SentenceSplitter + embedding + vector store)

Add dependencies: llama-index-core, llama-index-vector-stores-postgres,
llama-index-embeddings-openai, pgvector, jieba.

Tests: 14 unit tests covering models, indexing (URL conversion, table
name isolation, embed_dim), and pipeline (ingest, query, chunk params).
2026-06-25 10:49:35 +08:00
chiguyong bbbf9cd40a feat(bitable): add bitable companion service with full P0-P2 fixes
Bitable is a multi-dimensional table companion service that runs alongside
the main AgentKit server. It provides structured data storage with formula
fields, views, and ingestion pipelines.

Major components:
- Domain models (Pydantic v2): Table, Field, Record, View, RecalcTask
- SQLAlchemy 2 async ORM with independent bitable PostgreSQL schema
- Formula engine: AST parser, DAG, Kahn topological sort, safe eval
- RecalcWorker: atomic task claiming (FOR UPDATE SKIP LOCKED), topo-order
  processing, stale-threshold reaper for crash recovery
- REST API (/api/v1/bitable): tables, fields, records, views, files
- BitableTool: agent-facing tool with batch chunking (500/batch)
- CLI: agentkit bitable subcommands (create, list, import-excel, etc.)
- Frontend: Vue 3 + vxe-table grid with field management, views, filters
- Ingestion: Excel (openpyxl), database reflection, API collector

Security fixes (ce-code-review P0 + ce-debug P1):
- SQL injection prevention (field_id validation, parameterized queries)
- IDOR protection (_check_table_ownership on all table-level endpoints)
- SSRF prevention (URL scheme + private IP validation in parse_excel_url)
- OOM prevention (streaming file upload, batch delete, batch insert)
- Atomic recalc task claiming (FOR UPDATE SKIP LOCKED)
- Formula engine cache invalidation on field changes
- Composite cursor pagination for non-id sort orders
- Batch upsert (eliminates N+1 queries)
- Sync I/O offloaded to thread pool in async contexts
- Internal token auth (X-Internal-Token, hmac.compare_digest)
- PK unique index enforcement

Test coverage: 88 unit tests (95 skipped without Docker)
2026-06-25 01:09:59 +08:00
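The bitable formula engine orders recalculation with Kahn's topological sort. A minimal sketch of that idea over a field-reference map follows; evaluation_order and its input shape are assumptions, and the engine's actual AST/DAG types are not shown in the commit.

```python
from collections import deque
from typing import Dict, List, Set


def evaluation_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Return formula fields in dependency order using Kahn's algorithm.

    deps maps each field to the set of fields its formula references.
    Raises ValueError if the references form a cycle.
    """
    nodes = set(deps) | {ref for refs in deps.values() for ref in refs}
    indegree = {node: 0 for node in nodes}
    dependents: Dict[str, List[str]] = {node: [] for node in nodes}
    for field_id, refs in deps.items():
        for ref in refs:
            indegree[field_id] += 1           # field_id waits on ref
            dependents[ref].append(field_id)  # ref unblocks field_id

    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: List[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for dep in sorted(dependents[node]):
            indegree[dep] -= 1
            if indegree[dep] == 0:
                queue.append(dep)

    if len(order) != len(nodes):
        # Any node still with indegree > 0 sits on (or behind) a cycle.
        raise ValueError("circular formula reference")
    return order


print(evaluation_order({"total": {"price", "qty"}, "tax": {"total"}}))
# ['price', 'qty', 'total', 'tax']
```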
chiguyong 567cbc9c9b refactor: simplify code across U1-U7 (bug fix + efficiency + reuse + quality) 2026-06-24 22:35:52 +08:00
chiguyong 0847c0e086 fix(checkpoint): add TTL expiration for memory fallback mode
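Memory fallback mode previously had no TTL expiration, so long-running processes leaked memory.
In memory mode, list_checkpoints and load_plan now filter out or evict expired data.

- list_checkpoints: the memory-fallback branch filters out expired checkpoints
- load_plan: the memory-fallback branch checks TTL expiry; if expired, it evicts the entry and returns None
- Add _is_expired method to check whether saved_at exceeds the TTL
- Change _memory_plans entries to tuple(plan_dict, timestamp) to support TTL
- Add 5 TTL-expiration tests covering memory mode and Redis-fallback scenarios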
2026-06-24 22:04:55 +08:00
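A sketch of the TTL-aware memory fallback, assuming an injectable clock for testability. _memory_plans, _is_expired and load_plan follow the commit's names; MemoryPlanStore, save_plan and ttl_seconds are hypothetical. Eviction here happens only on read, which is why expired entries also need filtering in list_checkpoints.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class MemoryPlanStore:
    """In-memory fallback that stores (plan_dict, saved_at) tuples and honors a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._memory_plans: Dict[str, Tuple[Dict[str, Any], float]] = {}

    def _is_expired(self, saved_at: float) -> bool:
        return self._clock() - saved_at > self.ttl_seconds

    def save_plan(self, plan_id: str, plan: Dict[str, Any]) -> None:
        self._memory_plans[plan_id] = (plan, self._clock())

    def load_plan(self, plan_id: str) -> Optional[Dict[str, Any]]:
        entry = self._memory_plans.get(plan_id)
        if entry is None:
            return None
        plan, saved_at = entry
        if self._is_expired(saved_at):
            # Evict on read so a long-running process does not keep stale plans forever.
            del self._memory_plans[plan_id]
            return None
        return plan
```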
chiguyong fa152e24ac feat(skills): add progressive skill loading with disclosure_level=0 (U5)
When disclosure_level=0, system prompt only injects skill name + description
(summary mode). SkillDetailTool is injected into the tool set, allowing the
LLM to load full instructions on-demand via skill_detail(query). This reduces
context window consumption when many skills are registered.
2026-06-24 21:49:00 +08:00
chiguyong dfd188b1a4 feat(orchestrator): add pipeline checkpoint and crash recovery (U7)
Add PipelineCheckpoint for stage-level crash recovery with Redis-first
+ memory fallback. TeamOrchestrator saves checkpoints after each phase
finalizes and supports resume(plan_id) to continue from the last
completed phase. New POST /api/v1/tasks/{id}/resume endpoint recreates
the team from saved plan and calls resume.
2026-06-24 21:04:18 +08:00
chiguyong 3dfda904d7 feat(core): add middleware pipeline architecture with onion model
U6: Unified middleware protocol (before/after) with MiddlewareChain
implementing onion model execution. Parallel integration (KTD1) —
middleware path controlled by presence of middleware_chain parameter,
existing ReActEngine path unchanged when None.

- New core/middleware.py: RequestContext, Middleware protocol,
  MiddlewareChain (onion model: before outer→inner, after inner→outer)
- 3 example middlewares: SummarizationMiddleware (U3 headroom compression),
  TokenUsageMiddleware, LoopDetectionMiddleware (request-level audit)
- ReActEngine.__init__ accepts middleware_chain parameter
- execute() branches: middleware path when chain present, existing path otherwise
- 22 tests covering ordering, error handling, state passing, backward compat
2026-06-24 20:52:15 +08:00
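The onion ordering of MiddlewareChain (before hooks run outer to inner, after hooks inner to outer) can be illustrated with a small sketch. The class shapes are assumptions based on the commit message, not core/middleware.py, and unlike the tested implementation this version skips after hooks when the handler raises.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Dict, List, Protocol


@dataclass
class RequestContext:
    state: Dict[str, Any] = field(default_factory=dict)
    trace: List[str] = field(default_factory=list)


class Middleware(Protocol):
    async def before(self, ctx: RequestContext) -> None: ...
    async def after(self, ctx: RequestContext) -> None: ...


class MiddlewareChain:
    def __init__(self, middlewares: List[Middleware]) -> None:
        self.middlewares = list(middlewares)

    async def run(self, ctx: RequestContext,
                  handler: Callable[[RequestContext], Awaitable[Any]]) -> Any:
        for mw in self.middlewares:            # before hooks: outer -> inner
            await mw.before(ctx)
        result = await handler(ctx)            # the wrapped engine call
        for mw in reversed(self.middlewares):  # after hooks: inner -> outer
            await mw.after(ctx)
        return result


class Tag:
    """Toy middleware that records when its hooks fire."""

    def __init__(self, name: str) -> None:
        self.name = name

    async def before(self, ctx: RequestContext) -> None:
        ctx.trace.append(f"{self.name}.before")

    async def after(self, ctx: RequestContext) -> None:
        ctx.trace.append(f"{self.name}.after")


async def core(ctx: RequestContext) -> str:
    ctx.trace.append("core")
    return "ok"


ctx = RequestContext()
asyncio.run(MiddlewareChain([Tag("outer"), Tag("inner")]).run(ctx, core))
print(ctx.trace)  # ['outer.before', 'inner.before', 'core', 'inner.after', 'outer.after']
```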
chiguyong ef84e3fd53 feat(experts): add SharedWorkspace state offloading for long-horizon runs
U4: ExpertTeam accepts redis_client, passes to SharedWorkspace. After phase
completion, full result is written to workspace and in-memory phase.result
is replaced with a 500-char summary + _ref_key. Dependency output reading
resolves offloaded content from workspace on demand, with graceful fallback
to summary on read failure.

Tests: 8 scenarios (offload creation, short content, dependency resolution,
workspace failure fallback, non-offloaded passthrough, redis_client wiring,
memory dict fallback, pipeline integration) — all pass.
2026-06-24 20:32:10 +08:00
chiguyong 122173ec2c feat(core): add headroom-based compression trigger
U3: ContextCompressor now accepts model_context_limit, headroom_threshold,
and min_tokens. should_compress() triggers when token ratio exceeds 0.8 of
model limit OR exceeds min_tokens (8000 fallback). ReActEngine._should_compress
delegates to compressor when available, checks is_available() first.

Tests: 6 scenarios (headroom trigger, min_tokens guard, small model,
unavailable compressor, delegation, fallback) — all pass.
2026-06-24 20:28:14 +08:00
chiguyong 717aad1303 feat(experts): add concurrency limit to TeamOrchestrator parallel phases
U2: Add asyncio.Semaphore to bound concurrent phase execution and debate
argument generation. Default limit=3, configurable via max_concurrent_phases.
Prevents LLM rate-limit spikes when many phases run in the same layer.

Tests: 5 scenarios (happy path, 5-phase edge case, serial mode, failure
release, debate integration) — all pass.
2026-06-24 20:23:30 +08:00
chiguyong 018b342d96 feat(react): add loop detection to prevent repeated identical tool calls
U1: Sliding window hash detection in ReAct loop. When the same tool is
called with identical arguments >= threshold times (default 2), injects
a correction message first, then raises LoopDetectedError if the LLM
doesn't change strategy. Covers both _execute_loop and execute_stream.
2026-06-24 20:12:35 +08:00
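One way to implement the sliding-window hash check from U1 is sketched below, assuming each call is hashed as canonical JSON of the tool name plus arguments. LoopDetector, the window size of 10 and the message text are hypothetical. With the default threshold of 2, the second identical call in the window gets a correction message and the third raises.

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict, Optional, Set


class LoopDetectedError(RuntimeError):
    """The model kept repeating the same tool call after being warned."""


class LoopDetector:
    def __init__(self, threshold: int = 2, window: int = 10) -> None:
        self.threshold = threshold
        self._recent: Deque[str] = deque(maxlen=window)  # sliding window of call hashes
        self._warned: Set[str] = set()

    @staticmethod
    def _call_hash(tool: str, args: Dict[str, Any]) -> str:
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} hash identically.
        payload = json.dumps({"tool": tool, "args": args}, sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def check(self, tool: str, args: Dict[str, Any]) -> Optional[str]:
        """Return None if fine, a correction message on the first repeat, or raise after that."""
        key = self._call_hash(tool, args)
        self._recent.append(key)
        if self._recent.count(key) < self.threshold:
            return None
        if key not in self._warned:
            self._warned.add(key)
            return (f"You already called {tool} with identical arguments. "
                    "Change your approach instead of repeating the call.")
        raise LoopDetectedError(tool)
```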
chiguyong a312e584ae Merge branch 'feat/expert-team-pm-collaboration': PM collaboration mode + full code-review fixes
# Conflicts:
#	src/agentkit/server/frontend/components.d.ts
2026-06-24 18:57:37 +08:00
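chiguyong 20a4c55d5b feat(skills): SkillHarness preconditions + risk-guard learning enhancements
- cli/skill.py: enhance the skill learn subcommand
- evolution/risk_guard_learner.py: risk-guard learning improvements
- memory/models.py: extend memory models
- skills/base.py + loader.py: SkillHarness precondition support
- Update corresponding tests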
2026-06-24 18:56:51 +08:00
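chiguyong 574db8458f fix(experts): full fixes from the PM collaboration code review
P0: Cross-phase contract state sync: _notify_collaborators updates the recipient's contract status to received
P0: Add the 4 PM events to the _VALID_TEAM_EVENT_TYPES allowlist

P1: Review fail-open now records the degradation reason
P1: Rework failure raises RuntimeError instead of returning a dict
P1: Review prompt-injection protection: expert output is wrapped in XML tags
P1: Validate contract fields with _EXPERT_NAME_RE
P1: Fix bool("false"): compare explicitly to avoid the string-truthiness trap
P1: Guard against _parse_risk_flags(None)

P2: Move _notify_collaborators to after the review passes
P2: Move SharedWorkspace writes to after the review passes
P2: Fix greedy regex in review parsing
P2: Cap risk flags at MAX_RISK_FLAGS=10
P2: Truncate rework feedback
P2: Frontend session isolation: clear/restore collaborationState when switching sessions
P2: Frontend contract status update: mark as delivered on collaboration_notice
P2: Annotate dead CLI code + downgrade exceptions to debug logs
P2: Precompile _RISK_FLAG_RE at module level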
2026-06-24 18:56:27 +08:00
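The bool("false") fix addresses Python's string truthiness: any non-empty string, including "false", is truthy. A sketch of explicit parsing follows; parse_bool and the accepted spellings are hypothetical, not the orchestrator's code.

```python
from typing import Any

_TRUE_STRINGS = {"true", "1", "yes", "on"}
_FALSE_STRINGS = {"false", "0", "no", "off", ""}


def parse_bool(value: Any, default: bool = False) -> bool:
    """Interpret LLM/JSON output as a boolean without Python's string truthiness."""
    if isinstance(value, bool):
        return value
    if value is None:
        return default
    text = str(value).strip().lower()
    if text in _TRUE_STRINGS:
        return True
    if text in _FALSE_STRINGS:
        return False
    return default  # unrecognized values fall back instead of guessing


print(bool("false"), parse_bool("false"))  # True False
```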
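chiguyong 6016c087fe feat(cli): U6 Rich rendering for CLI collaboration events
- chat.py adds _render_collaboration_contracts and _render_pm_collaboration_event
- Renders 4 kinds of PM collaboration events:
  collaboration_contract_defined (cyan Panel)
  collaboration_notice (blue→magenta text)
  review_result (passed=green / failed=red Panel)
  risk_flagged (yellow Panel)
- Extract collaboration_contracts from plan_update and render them
- Update _print_help with a description of project-manager mode
- Graceful degradation: missing fields fall back to ?, empty contracts print nothing, and a top-level try/except keeps rendering errors from interrupting orchestration
- Add 11 tests (TestPMCollaborationRendering 9 + TestPrintHelpPMMode 2)
- ruff passes, pytest 23 passed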
2026-06-24 14:57:49 +08:00
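chiguyong 5487cca199 feat(experts): U4 expert risk flags + risk_flagged event
- orchestrator adds a static method _parse_risk_flags that parses [RISK: ...] markers with a regex
- _execute_execution_phase parses risk flags after collaboration notification and before review
- Risk flags are broadcast via the risk_flagged event for frontend/CLI rendering
- Behavior is unchanged when there are no risk flags (backward compatible)
- Add 7 TestRiskFlagging tests (single/multiple/none/malformed/event emitted/content/compatibility)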
2026-06-24 14:17:58 +08:00
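A minimal sketch of _parse_risk_flags-style parsing. It assumes the [RISK: ...] marker format from the commit and folds in the None guard and the MAX_RISK_FLAGS=10 cap from review-fix commit 574db8458f; the character class [^\]]+ avoids a greedy match by stopping at the first closing bracket. parse_risk_flags is a hypothetical module-level stand-in for the static method.

```python
import re
from typing import List, Optional

MAX_RISK_FLAGS = 10
# [^\]]+ stops at the first closing bracket, so two flags on one line stay separate.
_RISK_FLAG_RE = re.compile(r"\[RISK:\s*([^\]]+)\]")


def parse_risk_flags(text: Optional[str]) -> List[str]:
    if not text:  # tolerate None / empty phase output
        return []
    flags = [m.strip() for m in _RISK_FLAG_RE.findall(text)]
    return [f for f in flags if f][:MAX_RISK_FLAGS]


print(parse_risk_flags("Done. [RISK: migration is irreversible] and [RISK: no rate limit]"))
# ['migration is irreversible', 'no rate limit']
```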
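chiguyong 62fcbc0feb feat(experts): U3 Lead review step + rework mechanism
- Add rework_count and review_feedback fields to PlanPhase
- Add _review_phase_output method: the Lead uses an LLM to review phase output
- Refactor _execute_execution_phase into a rework loop (MAX_REWORKS=2)
- Three paths (pass / rework / fail), each emitting a review_result event
- Gracefully degrade to a pass when the LLM is unavailable
- 6 new tests; full suite 449 passed, no regressions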
2026-06-24 14:09:18 +08:00
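chiguyong fef7ecea39 feat(skills): SkillHarness activation preconditions + risk-guard learning
Based on the SkillHarness paper (arXiv:2606.20636) and the Agent Skills survey
(arXiv:2602.12430), introduce activation preconditions and provenance tags,
and add the ability to learn risk-guard suggestions from failed trajectories.

Changes:
- U1: SkillConfig adds v7 preconditions/provenance fields (base.py)
- U2: build_skill_system_prompt injects a soft-check preconditions section
- U3: SkillLoader records provenance on all three load paths + warns about dangerous capabilities in entry_points
- U4: Add preconditions (2-4 short Chinese phrases each) to 10 business Skill YAML files
- U5: RiskGuardLearner learns risk-guard suggestions from failed trajectories (requires human review, not applied automatically)
- U6: CLI command agentkit skill learn-risk-guards

Key decisions:
- KTD1: preconditions are injected via system_prompt (soft check), with no hard LLM call
- KTD2: RiskGuardLearner does not auto-apply suggestions and requires human review (the paper reports that 75% of automatically learned guards are unsafe)
- KTD3: provenance is a lightweight string with no hash/signature (no compliance requirement)

Tests: all 39 new unit tests pass; ruff check passes.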
2026-06-24 13:56:37 +08:00
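chiguyong c46cf06f6d feat(experts): U2 collaboration contract execution: expert visibility + proactive notification
- _execute_execution_phase reads related experts' output according to the collaboration contracts (visibility)
- Add _notify_collaborators method to notify related experts on completion (assistance)
- Emit collaboration_notice event; contract status is updated to delivered
- 7 new tests; full suite 443 passed, no regressions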
2026-06-24 13:54:38 +08:00
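chiguyong f219c5f016 feat(experts): U1 collaboration contract data model + Lead-generated contracts
- Add collaboration_contracts field to PlanPhase (CollaborationContract dataclass)
- Modify the _decompose_task prompt to require the Lead to define collaboration contracts when decomposing a task
- Modify _parse_phases to parse the collaboration contracts returned by the LLM
- plan_update events automatically include collaboration contracts (serialized via to_dict)
- 71 + 9 = 80 new tests; full suite 436 passed, no regressions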
2026-06-24 13:44:50 +08:00
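chiguyong b86100a0a1 feat(cli): U6 CLI multi-agent entry point + Rich rendering for debates
- Add _execute_team_cli() to handle the @team prefix and run the ExpertTeam pipeline
- Rich event rendering: team_formed/plan_update/phase_*/debate_*/team_synthesis
- The intervention loop polls stdin without blocking via select.select() (Unix-only, annotated as ponytail)
- Support /debate to trigger a debate manually, /stop to terminate the team, and plain text injected as context
- Extend _print_help() with Multi-Agent and Interventions sections
- Add 12 unit tests covering routing, help text, function return values, and intervention infrastructure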
2026-06-24 13:03:57 +08:00
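chiguyong c831e925b6 feat(experts): U4 user intervention channel + manual debate trigger
Establish a user intervention channel during @team execution, supporting /stop, /debate <topic>,
and plain text that appends context.

ExpertTeam (src/agentkit/experts/team.py):
- Add _interventions: asyncio.Queue (maxsize=64) as the intervention queue
- add_user_intervention(msg): broadcast + enqueue
- consume_user_interventions(): drain and return pending interventions
- broadcast_user_message now also enqueues into the intervention queue

TeamOrchestrator (src/agentkit/experts/orchestrator.py):
- Add _user_context: list[str] to accumulate plain-text interventions
- Add _process_interventions(lead, plan), called before each layer executes:
  * /stop → terminate execution, broadcast plan_update(stopped_by_user)
  * /debate <topic> → dynamically insert a DEBATE phase (subject to MAX_DEBATES)
  * plain text → accumulate into _user_context
- _synthesize_results appends _user_context to the synthesis prompt

WS route (src/agentkit/server/routes/chat.py):
- Module-level _active_teams dict tracks the active team for each session
- _execute_team_collab registers before execution and unregisters in finally
- WS message loop: if the session has an active team, messages are routed as interventions rather than new tasks
- Add team_intervention_ack acknowledgment message

Tests: tests/unit/experts/test_team_intervention.py (20 tests),
covering queue basics, /stop, /debate, plain text, mixed messages, and effect on synthesis.
Also updated the intervention-channel compatibility tests in test_orchestrator_debate.py
(U4 now implements consume_user_interventions).

All 418 experts tests + 325 server tests pass.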
2026-06-24 12:17:09 +08:00
chiguyong 5b5bd44ac4 test(calendar): 7 integration flow tests (lifecycle, recurrence, tags, types, invitations, authz, ICS) 2026-06-24 12:04:42 +08:00
chiguyong 3fdee65979 fix(calendar): code review fixes - 23 issues (2 critical, 15 major, 6 minor) 2026-06-24 11:29:23 +08:00
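chiguyong ac26d417b3 feat(experts): U3 divergence detection + automatic plan-review debate trigger
Add 4 methods to TeamOrchestrator that trigger debates automatically:

- _maybe_add_plan_review_debate: after task decomposition, optionally inserts a plan-review DEBATE
  phase (when phases > 2 and the LLM judges it necessary); all execution phases depend on it
- _detect_divergence: after each layer executes, uses the LLM to judge whether a completed phase's output
  diverges from other phases, preferring false negatives
- _insert_debate_phase: dynamically inserts a DEBATE phase and rewires dependencies
  (phases that depended on the trigger now depend on the DEBATE)
- _check_divergence_and_insert_debates: coordination entry point after each layer completes,
  guarded by the MAX_DEBATES=3 cap

The main loop changed from `for layer in layers` to `while True` + recomputing
topological_sort(), so that dependency layering stays correct after DEBATE phases are inserted dynamically.

Tests: tests/unit/experts/test_divergence_detection.py (21 tests),
covering happy path / edge cases / error paths / integration layering. Also fixed
the mock gateway in test_team_orchestrator.py to accommodate U3's extra LLM calls.

All 398 tests pass.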
2026-06-24 11:09:53 +08:00
chiguyong fbe08cb1e2 feat(experts): add debate phase executor to TeamOrchestrator (U2)
Implement _execute_debate_phase() with Lead-facilitated structured debate:
- Lead opens with divergence point + dependency context
- Experts argue in parallel per round (asyncio.gather)
- Lead summarizes each round, then adjudicates final verdict
- Verdict produces decision (adopt/compromise/shelve/inconclusive) + conclusion
- Conclusion written to SharedWorkspace for downstream phases

Escape hatches:
- debate_config.skip=true short-circuits with template text
- MAX_DEBATE_ROUNDS=4 hard cap on rounds
- User /stop intervention ends debate early (U4-compatible via getattr fallback)
- LLM unavailable falls back to template verdict, no crash

New events: debate_started, expert_argument, debate_round_summary,
debate_resolved (plus existing phase_completed for consistency).

Phase dispatcher (_execute_phase) routes by phase_type:
EXECUTION to _execute_execution_phase, DEBATE to _execute_debate_phase.

36 new tests in test_orchestrator_debate.py covering happy path (2 rounds,
2 experts), max_rounds=1 boundary, empty participants, user stop, skip
escape hatch, LLM unavailable, SharedWorkspace integration, event
broadcasting, intervention channel compatibility, and helper methods.
All 377 expert tests pass.

Also includes planning artifacts (brainstorm requirements + implementation
plan with 6 units U1-U6).
2026-06-24 10:54:51 +08:00
chiguyong e539122314 feat(experts): add PhaseType enum and debate_config to PlanPhase
U1: Data model foundation for structured debate collaboration.
- Add PhaseType enum (EXECUTION | DEBATE)
- Add phase_type and debate_config fields to PlanPhase
- Update to_dict/from_dict for serialization with backward compatibility
- Add tests for PhaseType, debate phase creation, serialization, and
  mixed EXECUTION+DEBATE topological sort
2026-06-24 10:42:11 +08:00
chiguyong 8d4145ddf9 feat(calendar): U7 Outlook sync via Microsoft Graph API
OutlookSyncProvider implementing AbstractSyncProvider for
bidirectional Outlook Calendar sync. Uses Graph API delta query
for incremental pull, auto-refreshes OAuth tokens on 401, and
converts Outlook recurrence patterns to RRULE. Same conflict
resolution as CalDAV (last-write-wins + WS notification).

- src/agentkit/calendar/sync/outlook_provider.py — OutlookSyncProvider
- tests/unit/calendar/test_sync_outlook.py — 8 tests
2026-06-23 23:49:24 +08:00
chiguyong 40d326cd3f feat(calendar): U6 CalDAV sync provider and SyncManager
AbstractSyncProvider interface with CalDAVSyncProvider implementation
for bidirectional Apple Calendar sync. SyncManager orchestrates all
providers (G8) — sync_all/sync_provider/resolve_conflict with
last-write-wins + WS notification on conflicts (G4). caldav library
calls wrapped in asyncio.to_thread for non-blocking operation.

- src/agentkit/calendar/sync/base.py — AbstractSyncProvider ABC
- src/agentkit/calendar/sync/caldav_provider.py — CalDAVSyncProvider
- src/agentkit/calendar/sync/manager.py — SyncManager (G8)
- pyproject.toml — added caldav>=1.3 dependency
- tests — 12 tests (9 CalDAV + 3 SyncManager)
2026-06-23 22:52:29 +08:00
chiguyong ffb184acc7 feat(calendar): U8 iCal/ICS import and export
ICSProvider parses .ics files (icalendar library) and creates local
CalendarEvents, skipping duplicate UIDs. Export builds an iCalendar
from events in a date range, deduplicating recurring event
occurrences back to a single VEVENT with RRULE. REST endpoints:
POST /import-ics (multipart upload), GET /export-ics (download).

- src/agentkit/calendar/sync/__init__.py — sync subpackage init
- src/agentkit/calendar/sync/ics_provider.py — ICSProvider (import/export)
- src/agentkit/calendar/db.py — added get_event_by_external_id() for dedup
- src/agentkit/server/routes/calendar.py — import-ics and export-ics endpoints
- pyproject.toml — added icalendar>=5.0 dependency
- tests/unit/calendar/test_ics_provider.py — 8 tests
2026-06-23 22:20:07 +08:00
chiguyong 26efbb51db feat(calendar): U5 reminder subsystem with scheduler and multi-channel dispatch
ReminderScheduler scans upcoming events every 60s, matches reminder
rules, and dispatches via client (WS), email (SMTP), or webhook
channels. Idempotent delivery (no duplicates on re-scan), retry with
exponential backoff (up to 3 attempts). Follows task_store.py
start/stop asyncio loop pattern (KTD-2 — conscious deviation from
APScheduler).

- src/agentkit/calendar/scheduler.py — ReminderScheduler (start/stop/scan_once)
- src/agentkit/calendar/reminders.py — ReminderDispatcher (strategy per channel)
- src/agentkit/calendar/db.py — added list_all_events_in_time_range() for scheduler
- tests/unit/calendar/test_scheduler.py — 8 tests
- tests/unit/calendar/test_reminders.py — 9 tests
2026-06-23 22:19:57 +08:00
chiguyong ddcedb57b2 feat(calendar): U4 post-processing extractor with keyword gating
Adds PostProcessingExtractor — a zero-LLM keyword gate (Chinese +
English time words) followed by LLM extraction for ambiguous cases.
Events created from extraction carry source="post_extract" so the UI
can style them distinctly (R33). LLM gateway is optional to keep the
constructor testable without a live provider.

- src/agentkit/calendar/extraction.py — PostProcessingExtractor
- tests/unit/calendar/test_extraction.py — 13 tests with MockLLMGateway
2026-06-23 21:56:20 +08:00
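The keyword gate can be sketched as a substring check plus one regex, so the LLM runs only for messages that mention something time-like. The word lists below are hypothetical (the commit does not list them), and extract_events with its llm_extract callable stands in for the extractor's LLM step.

```python
import re
from typing import Callable, List

# Hypothetical vocabularies. Chinese: today, tomorrow, day after tomorrow, next week,
# Monday, morning, afternoon, have a meeting.
_ZH_TIME_WORDS = ("今天", "明天", "后天", "下周", "周一", "上午", "下午", "开会")
_EN_TIME_RE = re.compile(
    r"\b(today|tomorrow|tonight|next (week|month)|"
    r"(mon|tues|wednes|thurs|fri|satur|sun)day|meeting|"
    r"\d{1,2}(:\d{2})?\s?(am|pm))\b",
    re.IGNORECASE,
)


def passes_keyword_gate(text: str) -> bool:
    """Cheap zero-LLM check: does the text mention anything time-like?"""
    return any(word in text for word in _ZH_TIME_WORDS) or bool(_EN_TIME_RE.search(text))


def extract_events(text: str, llm_extract: Callable[[str], List[dict]]) -> List[dict]:
    if not passes_keyword_gate(text):
        return []  # most chat turns stop here without an LLM call
    return llm_extract(text)
```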
chiguyong 42fe7bcbc9 feat(calendar): U3 agent calendar tool for ReAct integration
Adds CalendarTool implementing the Tool ABC so the ReAct engine can
create, query, update, and delete events autonomously. Resolves
event_type_name and tag_names (look up or create), sets
source="agent" to distinguish agent-created events from manual ones.

- src/agentkit/tools/calendar_tool.py — CalendarTool(Tool)
- tests/unit/tools/test_calendar_tool.py — 13 tests covering all actions
2026-06-23 21:56:08 +08:00
chiguyong d36e45bbe7 feat(calendar): U2 backend service & REST API
Add CalendarService business logic layer and 14 REST endpoints:
- service.py: event CRUD with RRULE expansion, event types, tags,
  invitations, non-admin user search (G5/A3), type-level default
  reminder rule cloning
- routes/calendar.py: JWT-authenticated endpoints for events, types,
  tags, invitations, user search — with ownership checks
- 17 new tests (12 service + 5 routes), 33 total calendar tests passing
2026-06-23 21:43:39 +08:00
chiguyong 2ea799f6c4 feat(calendar): U1 backend data model, storage & RRULE expansion
Add calendar subsystem foundation mirroring documents/ pattern:
- models.py: 8 dataclasses (CalendarEvent with is_invited, EventType,
  Tag, EventTag, ReminderRule, ReminderDelivery, ExternalCalendarConfig,
  Invitation)
- db.py: aiosqlite bare-connection CRUD for all 8 tables with WAL mode
- recurrence.py: RRULE expansion via dateutil.rrule (RFC 5545)
- 16 unit tests covering DB CRUD and RRULE edge cases (DST, UNTIL, range)
- Add python-dateutil>=2.9 to pyproject.toml
2026-06-23 21:30:39 +08:00
chiguyong 3337589395 fix(review): document-processing code review fixes — validation, tests, formatting
- SkillConfig._validate_v2: validate fallback_strategies against
  ReWOOEngine.VALID_STRATEGIES (lazy import, #20)
- test_skill_config: +4 tests for fallback_strategies validation
- test_document_loader: +8 xlsx edge case tests (empty workbook,
  malformed bytes, column mismatch, row/cell truncation, multi-sheet,
  file size limit, None cells, #16)
- test_execution_modes: fix ReWOOEngine patch path (lazy import ->
  patch at source) + FakeReWOOEngine.execute return .output attribute
- config_driven: ruff formatting (quotes, blank lines after imports)
- project_rules: remove stale "known failing test" note (now passes)
2026-06-23 20:21:19 +08:00
chiguyong a672dddc9a feat(skills): distinguish agent templates from business skills in UI
The skills tab mixed generic execution-engine templates (react/direct/
rewoo/...) with business-domain skills (monitor/geo_optimizer/...) with
no visual or data distinction. Adds a derived `category` field to the
SkillInfo/SkillDetail API models and groups the frontend display.

Backend:
- SkillInfo/SkillDetail: add category (Literal), agent_type, execution_mode,
  task_mode fields
- _skill_to_info: derive category from explicit _ENGINE_TEMPLATE_NAMES set
  (not name suffix — trend_agent/deai_agent are business skills despite
  the _agent suffix)
- Simplify repetitive hasattr pattern with getattr

Frontend:
- ISkillInfo/ISkillDetail: add category + mode fields
- skills store: agentTemplates/businessSkills computed getters
  (businessSkills is defensive: anything not explicitly engine template)
- SkillsView: group into 执行引擎 / 业务技能 sections with counts
- SkillCard: type badge (引擎/技能), category-based icon, mode display,
  dark-mode-aware accent color

Tests:
- test_category_derived_from_name_suffix: verifies field exposure
- test_category_no_orphans: invariant — every skill has a valid category
- test_trend_agent_classified_as_business_skill: regression guard for
  the _agent suffix misclassification bug

Code review (ce-code-review): 2 P1 + 5 P2 findings applied.
2026-06-23 15:55:59 +08:00
chiguyong 47f3bfecfc feat(documents): add document processing capability (U1-U9)
Implements end-to-end document generation, template filling, and reading:

- DocumentService: unified business layer for create/query/download
- Renderers: Word (Markdown->docx), Excel (Markdown/JSON->xlsx),
  PDF (Markdown->pdf with CJK font), Template (Jinja2 sandbox .docx fill)
- DocumentLoader: read PDF/Word/Excel/Markdown/HTML/text -> Document
- DocumentTool: Agent tool with action=create|read
- REST API: /api/v1/documents (create, upload-template, list, download)
- Frontend: DocumentPanel, DocumentCard, documents Pinia store,
  chat store tool_result detection
- Security: path traversal guard (Path.resolve + relative_to),
  SSTI guard (SandboxedEnvironment), API key auth, 50MB upload limit
- Bug fixes: template path traversal (400 not 500), TemplateRenderer
  lazy-load (no external registration dependency)
- Tests: 168 tests (unit + security + E2E F1/F2/F3 + bug hunt)
- Docs: README section 17, requirements + plan + test-plan docs

Requirements R1-R28 verified, F1-F3 user flows pass.
2026-06-23 15:05:01 +08:00
chiguyong 4f261523c2 fix(review): U3 atomic file writes for YAML + .env + skill config
All config file writes now use the write-temp + fsync + os.replace
pattern (KTD-4) so a crash mid-write leaves the original file intact.

- Add src/agentkit/server/utils/atomic_write.py with write_text_atomic
- settings.py: _write_yaml_config and _write_env_var use atomic write
- skill_service.py: import_skill uses atomic write
- skill_service.py: update_skill_config uses atomic write + fcntl.flock
  around the read-modify-write cycle to serialize concurrent updates
- Add 11 unit tests covering happy path, crash safety, concurrency, errors
2026-06-22 17:03:27 +08:00
chiguyong 698a8fafba fix(review): U7 refresh token hash verification on whoami
The whoami route accepted rotated/old refresh tokens for cold-start
because it only checked session revocation status, not the token hash.
Now when token_type == "refresh", the route computes hash_token(token)
and compares it with the session's stored refresh_token_hash using
hmac.compare_digest (constant-time). Mismatch returns 401.

- Add SessionService.get_stored_refresh_hash(session_id) helper
- Add hash verification in whoami route (R9)
- Add TestWhoamiTokenHash with 5 integration tests
2026-06-22 16:55:20 +08:00
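The whoami check can be sketched as hash-then-constant-time-compare. The commit does not state hash_token's algorithm, so SHA-256 is an assumption here, and refresh_token_is_current is a hypothetical helper; a False result corresponds to the route's 401.

```python
import hashlib
import hmac
from typing import Optional


def hash_token(token: str) -> str:
    # Assumption: SHA-256 hex digest. The commit names hash_token but not its algorithm.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def refresh_token_is_current(presented: str, stored_hash: Optional[str]) -> bool:
    """True only for the session's latest refresh token; rotated tokens no longer match."""
    if not stored_hash:
        return False
    # compare_digest avoids leaking, through timing, how many leading characters matched.
    return hmac.compare_digest(hash_token(presented), stored_hash)
```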
chiguyong 278d76b381 fix(review): U6 frontend field alignment + CLI top-users field fix 2026-06-22 16:28:44 +08:00
chiguyong 00c8386939 fix(review): U1 Redis quota enforcement — key construction + fail-closed + degradation recovery + async 2026-06-22 16:22:33 +08:00
chiguyong abe2a66436 fix(review): CLI field names, Pydantic validation, exception chaining 2026-06-22 15:24:31 +08:00
chiguyong 5e977539c7 test(admin): U10 — E2E + security isolation + quota enforcement tests
23 integration tests across 3 files:
- test_e2e_admin_flow: 5 end-to-end lifecycle tests (department, user,
  LLM config, skill management, usage dashboard)
- test_security_isolation: 7 department isolation tests + non-admin 403
  tests (cross-dept skill/KB access, multi-dept union, admin sees all,
  removed user loses access, disabled dept, API key client)
- test_quota_enforcement: 10 quota tests (token/cost/whitelist limits,
  multi-dept strictest-wins, real gateway integration, usage recording)

418 admin tests pass, no regressions.
2026-06-21 19:57:49 +08:00
chiguyong 2dd0091bda feat(admin): U8 — CLI admin command group
AdminHttpClient: sync HTTP client with JWT/API key auth, config file
support (~/.agentkit/admin_config.yaml), env var fallback.

35+ CLI commands across 7 groups: login, department (CRUD + bind/unbind
skill/KB + quotas), user (CRUD + reset-password + assign/remove dept),
llm (providers + api-key + fallbacks), skill (list/enable/disable/
import/reload), kb (documents CRUD + sync/rebuild), usage (summary/
timeseries/by-model/top-users/export).

All commands support --server-url, --token, --api-key, --json flags.
Rich table output by default, raw JSON with --json. Friendly error
handling for connection/auth/not-found/conflict errors.

64 new tests, 102 CLI tests pass, no regressions.
2026-06-21 18:56:14 +08:00
chiguyong 09feca3307 feat(admin): U7 — usage dashboard + quota enforcement
UsageRecord extended with user_id + department_id (backward compatible).
UsageStore Protocol extended: record() accepts user_id/department_id,
get_usage() accepts filters, new get_usage_by_user/department methods.
RedisUsageStore uses versioned keys (v2) for new records.

LLMGateway.chat()/chat_stream() accept user_id, department_ids, db_path.
Quota check before provider call: model whitelist + token limit + cost
limit (daily). Multi-department uses strictest-wins (any exceed → reject).
QuotaExceededError → 429 at route layer.

UsageService: summary, timeseries, by-model, top-users, export (CSV/JSON).
5 new admin endpoints under /admin/usage/*.

llm_gateway.py routes pass DepartmentContext + db_path to gateway,
catch QuotaExceededError → 429 (JSON for /chat, SSE error for /stream).

84 new tests. 441 admin+usage tests pass, no regressions.
2026-06-21 17:23:20 +08:00
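The strictest-wins rule from commit 09feca3307 (any one department that would be exceeded rejects the call) can be sketched as follows. DepartmentQuota, its fields and check_quotas are hypothetical names. The commit only says that QuotaService stores a JSON-serialized limit_value per quota, and this sketch also assumes that reaching a limit blocks the next call.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


class QuotaExceededError(Exception):
    """Raised before the provider call; a route layer would map it to HTTP 429."""


@dataclass(frozen=True)
class DepartmentQuota:
    department_id: str
    daily_token_limit: Optional[int] = None          # None means unlimited
    daily_cost_limit: Optional[float] = None         # None means unlimited
    allowed_models: Optional[FrozenSet[str]] = None  # None means any model


def check_quotas(
    model: str,
    tokens_used_today: int,
    cost_today: float,
    quotas: Iterable[DepartmentQuota],
) -> None:
    """Strictest-wins: a single blocking department rejects the request."""
    for q in quotas:
        if q.allowed_models is not None and model not in q.allowed_models:
            raise QuotaExceededError(f"model {model!r} not allowed for {q.department_id}")
        # Reaching the limit blocks further calls (a choice made for this sketch).
        if q.daily_token_limit is not None and tokens_used_today >= q.daily_token_limit:
            raise QuotaExceededError(f"daily token limit reached for {q.department_id}")
        if q.daily_cost_limit is not None and cost_today >= q.daily_cost_limit:
            raise QuotaExceededError(f"daily cost limit reached for {q.department_id}")
```

A user in both "eng" (10,000 tokens) and "ops" (500 tokens) is therefore capped at 500 tokens, and "ops"'s model whitelist applies to them too.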
chiguyong fd7f6816b8 feat(admin): U6 — Skill & KB management endpoints + department binding
SkillService: enable/disable (persisted in skill_states table, schema
v4), import from YAML (with path traversal + name validation), reload
from file, update config. GET /skills now filters disabled skills.

KbService: list/upload/delete documents with department_id binding.
Added department_id field to KnowledgeSource + UploadedDocument.
Department visibility: (bound to user depts) ∪ (global = None).

10 new admin endpoints: skill enable/disable/import/reload/update,
KB documents CRUD, source sync/rebuild. All guarded by _require_admin.

Implemented reload stub in skill_management.py (was no-op).

54 new tests (26 unit + 28 integration). Fixed 4 pre-existing lint
errors. 357 admin tests pass, no regressions.
2026-06-21 16:19:51 +08:00
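The "path traversal + name validation" guard on YAML skill import in commit fd7f6816b8 could look roughly like this. The naming rule, the allowed suffixes and both function names are assumptions for illustration, not the project's actual SkillService code.

```python
import re
from pathlib import Path

# Hypothetical naming rule: lowercase alphanumerics, "_" and "-", at most 64 chars.
_SKILL_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{0,63}")


def validate_skill_name(name: str) -> str:
    if not _SKILL_NAME_RE.fullmatch(name):
        raise ValueError(f"invalid skill name: {name!r}")
    return name


def resolve_import_path(skills_dir: Path, filename: str) -> Path:
    """Resolve filename inside skills_dir, rejecting anything that escapes it."""
    base = skills_dir.resolve()
    # resolve() collapses ".." and follows symlinks; an absolute filename replaces base.
    target = (base / filename).resolve()
    if not target.is_relative_to(base):  # Path.is_relative_to needs Python 3.9+
        raise ValueError(f"path escapes skills directory: {filename!r}")
    if target.suffix not in {".yaml", ".yml"}:
        raise ValueError(f"not a YAML file: {filename!r}")
    return target
```

Checking the resolved path, rather than searching the raw string for "..", also catches absolute paths and symlinks that point outside the directory.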
chiguyong 980919fc95 feat(admin): U5 — LLM config admin endpoints + department quotas
QuotaService: set/get/list/delete quotas, check_quota (hard reject),
is_model_allowed. JSON-serialized limit_value, upsert with ON CONFLICT.

LlmConfigService: provider CRUD + set_api_key + fallback management.
fcntl.flock file lock prevents concurrent YAML writes. Reuses
settings.py helpers (_read_yaml_config, _write_yaml_config,
_write_env_var, _mask_api_key).

11 new admin endpoints: provider CRUD, api-key, fallback CRUD,
department quotas CRUD. All guarded by _require_admin.

93 new tests (30 quota unit + 32 llm-config unit + 31 integration).
2026-06-21 15:03:38 +08:00
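A sketch of serializing YAML writes with fcntl.flock, as LlmConfigService does in commit 980919fc95. Both helper names are hypothetical. The write-to-temp-then-os.replace step is an extra choice made here, not something the commit states. The lock is held on a separate ".lock" file because os.replace swaps the config file's inode, which would silently drop a lock taken on the config file itself. fcntl is Unix-only.

```python
import fcntl
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def exclusive_file_lock(lock_path: Path) -> Iterator[None]:
    """Hold an advisory exclusive lock on lock_path for the duration of the block."""
    with open(lock_path, "a") as fh:
        fcntl.flock(fh.fileno(), fcntl.LOCK_EX)  # blocks until no other process holds it
        try:
            yield
        finally:
            fcntl.flock(fh.fileno(), fcntl.LOCK_UN)


def write_config_atomically(config_path: Path, text: str) -> None:
    lock_path = config_path.with_suffix(config_path.suffix + ".lock")
    with exclusive_file_lock(lock_path):
        tmp_path = config_path.with_suffix(config_path.suffix + ".tmp")
        tmp_path.write_text(text, encoding="utf-8")
        os.replace(tmp_path, config_path)  # atomic rename on POSIX filesystems
```

Readers that open the config file directly never see a half-written file, and concurrent writers queue on the lock file.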
chiguyong ad65f7a8d7 feat(admin): U1+U2+U4 — schema v3, department service, context filtering
U1: Bump _SCHEMA_VERSION to 3, add 5 department tables (departments,
user_departments, department_skill_bindings, department_kb_bindings,
department_quotas) + 5 ORM models + helpers.

U2: DepartmentService (12 async methods: CRUD + bind/unbind skill/KB +
count_users). Mount admin_router in app.py. 36 unit + 28 integration tests.

U4: DepartmentContext FastAPI dependency (per-route, admin bypasses
filtering). filter_skills_by_department / filter_kb_sources_by_department
helpers. Applied to GET /skills and GET /kb-management/* routes.
15 integration tests for department isolation.

Also includes brainstorm + plan docs. 108 new tests, all pass.
2026-06-21 15:03:27 +08:00
chiguyong 6dca9ba4f2 feat(admin): U3 — user CRUD + password reset + multi-department
Add create_user method to LocalAuthProvider (bcrypt hash + INSERT,
raises ValueError on duplicate username/email).

Add UserService with 9 async methods: create/list/get/update/delete
(soft)/reset_password/assign_department/remove_department/list_user_departments.
reset_password revokes all sessions via SessionService.
delete_user is soft (is_active=0, row preserved).

Add 9 user endpoints to routes/admin.py: POST/GET/PATCH/DELETE users,
reset-password, assign/remove department, list departments. All
guarded by _require_admin.

Tests: 40 unit + 37 integration = 77 new tests. Full admin suite
170 tests pass, no regressions.
2026-06-21 13:45:42 +08:00
chiguyong 67c0d67262 fix(auth,chat): P0 security fixes + stop-generation button + doc sync
U1: whoami cold-start security — add is_active check (disabled users
now get 401, not 200) and replace create_token_pair with create_access_token
to avoid minting a discarded refresh token (token-amplification risk).

U2: list_active_by_provider now filters expired sessions (expires_at > now)
matching its docstring promise; previously only checked revoked = 0.

U3: Fix asyncio.run() crash in test_revoke_other_user_session_returns_404
(converted to async). Add U1/U2 verification tests (disabled-user whoami,
no-refresh-leak, expired-session filtering, provider filtering) and
strengthen admin route tests (404 boundary, non-admin 403 on /admin/sessions).

U4: Update the Request Flow section of CLAUDE.md/AGENTS.md: the CostAwareRouter 3-layer
diagram is replaced with the actual RequestPreprocessor architecture (@board/@team
prefix interception → @skill: prefix → trivial-input regex → default REACT).
ExecutionMode list expanded to all 7 values.

U5: Frontend stop-generation button — ChatInput.vue shows a stop button
when isGenerating is true; chat store gains stopGeneration() that sends
{type:"cancel"} over WebSocket (backend portal.py already handles cancel).

Tests: 120 auth tests pass (unit + integration). ruff clean. vue-tsc clean.
2026-06-21 11:36:58 +08:00
chiguyong aee7362665 feat(auth): U3/U4/U9 logout-others + whoami cold-start + admin UI + integration tests 2026-06-21 09:08:34 +08:00
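chiguyong 9328451050 feat(auth): U7-U10 session management UI + admin API + test fixes
- U7: frontend ActiveSessionsPanel + ChangePasswordPanel components
- U8: user session management (view/revoke/change password) integrated into SettingsView
- U9: admin session management API + UserSessionsPanel + AdminApiClient
- U10: auth middleware supports sid session validation + legacy client compatibility
- Fix the test_auth.py fixture: inject a SessionService singleton bound to the test DB
- Fix case matching in the wrong-password assertion
- ruff: remove unused imports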
2026-06-21 08:48:25 +08:00
chiguyong b418c3dc95 feat(auth): U3 SessionService + validation cache
Adds the central business-logic layer for ``auth_sessions`` so routes,
the auth middleware, and the admin endpoints can call a single service
instead of touching the table directly.

Server
- session_service.SessionService: CRUD + lifecycle for auth_sessions.
  - create() enforces the per-user cap (default 10): the oldest
    active session is evicted with reason=session_cap_eviction.
  - rotate() swaps a refresh token, adds the old hash to the
    denylist, and raises SessionReuseDetected (revoking all sessions
    for the user) when the old token is replayed.
  - revoke() / revoke_by_refresh_token() / revoke_all_for_user()
    with explicit reasons: user_terminated, admin_revoked,
    password_changed, reuse_detected, session_cap_eviction.
  - touch() bumps last_active_at (called on /auth/whoami).
- session_cache.SessionValidationCache: bounded LRU+TTL wrapper
  (default 30s/1k entries) around SessionService.is_session_valid.
  The middleware hits this on every request carrying a V2 sid claim;
  one SQLite round-trip per 30s per session instead of per request.
- get_session_service() / get_validation_cache() module-level
  singletons overridable in tests via set_session_service() /
  set_validation_cache().

Tests
- tests/unit/auth/test_session_service.py: 15 cases covering
  create/rotate/revoke/list/cap-eviction/reuse-detection/expired
  sessions.

Refs: U3 in docs/plans/2026-06-20-002-feat-centralized-auth-token-persistence-plan.md
2026-06-21 01:58:30 +08:00
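The create()/rotate() lifecycle from commit b418c3dc95 (per-user cap with oldest-first eviction, and revoke-everything on refresh-token replay) can be sketched in memory as follows. InMemorySessionService, its fields and the SHA-256 hashing are assumptions. The real service persists to the SQLite auth_sessions table and keeps a separate denylist.

```python
import hashlib
import itertools
import secrets
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


class SessionReuseDetected(Exception):
    """A refresh token that was already rotated out has been presented again."""


def _hash(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


@dataclass
class Session:
    session_id: str
    user_id: str
    refresh_hash: str
    seq: int  # creation order, used for oldest-first eviction
    revoked_reason: Optional[str] = None


class InMemorySessionService:
    def __init__(self, cap: int = 10) -> None:
        if cap < 1:
            raise ValueError("cap must be at least 1")
        self.cap = cap
        self.sessions: Dict[str, Session] = {}
        self.rotated_out: Dict[str, str] = {}  # old refresh hash -> user_id
        self._seq = itertools.count()

    def _active(self, user_id: str) -> List[Session]:
        live = [s for s in self.sessions.values()
                if s.user_id == user_id and s.revoked_reason is None]
        return sorted(live, key=lambda s: s.seq)

    def create(self, user_id: str) -> Tuple[str, str]:
        active = self._active(user_id)
        # Enforce the per-user cap by evicting the oldest active sessions.
        while len(active) >= self.cap:
            active.pop(0).revoked_reason = "session_cap_eviction"
        session_id, token = secrets.token_hex(8), secrets.token_urlsafe(32)
        self.sessions[session_id] = Session(session_id, user_id, _hash(token), next(self._seq))
        return session_id, token

    def rotate(self, old_token: str) -> str:
        old_hash = _hash(old_token)
        if old_hash in self.rotated_out:
            # Replay of a rotated token: assume theft and revoke all of the user's sessions.
            user_id = self.rotated_out[old_hash]
            for s in self._active(user_id):
                s.revoked_reason = "reuse_detected"
            raise SessionReuseDetected(user_id)
        session = next((s for s in self.sessions.values()
                        if s.refresh_hash == old_hash and s.revoked_reason is None), None)
        if session is None:
            raise KeyError("unknown or revoked refresh token")
        new_token = secrets.token_urlsafe(32)
        self.rotated_out[old_hash] = session.user_id
        session.refresh_hash = _hash(new_token)
        return new_token
```

Only hashes are stored, so a leaked database does not expose usable refresh tokens.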
chiguyong 5ba1aceb96 feat(auth): U2 JWT sid/jti claims + refresh-token denylist
Adds V2 JWT claim schema that closes the kicked-out window and enables
refresh-token rotation with reuse detection.

Server
- jwt_utils.create_token_pair now takes ``session_id`` and ``remember_me``
  kwargs.  When ``session_id`` is provided, both tokens carry a ``sid``
  claim and the access token also carries a ``jti`` claim; the refresh
  token's jti is intentionally absent (rotation uses the token hash).
- New ``REFRESH_TOKEN_TTL_REMEMBER_ME = 30d`` (default 7d) selected by
  the ``remember_me`` flag.
- ``verify_token`` now supports an optional ``expected_type`` filter
  (e.g. ``"access"`` / ``"refresh"``); when omitted, both types pass
  (used by /auth/whoami's cold-start path).
- New ``auth.denylist`` module: ``InMemoryRecentlyRevoked`` (default for
  the Tauri sidecar / dev mode) and ``RedisRecentlyRevoked`` (multi-
  process server).  Bounded LRU with auto-expiry via ``time.monotonic()``.

Backwards-compat
- Tokens issued before U2 (no ``sid``) are still accepted by
  ``verify_token``; validation falls through to the legacy
  ``user_sessions`` table via the U10 shim (next commit).

Tests
- tests/unit/auth/test_jwt_utils.py: 12 cases — V1/V2 claim presence,
  default + remember-me TTL, expected_type filter, expiry, wrong secret.
- tests/unit/auth/test_denylist.py: 6 cases — add/contains, TTL expiry,
  LRU eviction, re-add refresh, clear, hash stability.

Refs: U2 in docs/plans/2026-06-20-002-feat-centralized-auth-token-persistence-plan.md
2026-06-21 01:53:13 +08:00
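A minimal sketch of a bounded denylist with per-entry expiry driven by time.monotonic(), in the spirit of the InMemoryRecentlyRevoked class named in commit 5ba1aceb96. RecentlyRevokedLRU and its parameters are hypothetical. Eviction here drops the oldest-added entry, and expired entries are removed lazily on lookup.

```python
import time
from collections import OrderedDict
from typing import Callable


class RecentlyRevokedLRU:
    """Bounded set of revoked token hashes with per-entry expiry."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max = max_entries
        self._ttl = ttl_seconds
        self._clock = clock  # monotonic by default, so wall-clock jumps do not matter
        self._expiry: "OrderedDict[str, float]" = OrderedDict()  # hash -> expiry time

    def add(self, token_hash: str) -> None:
        self._expiry[token_hash] = self._clock() + self._ttl
        self._expiry.move_to_end(token_hash)  # re-adding refreshes expiry and position
        while len(self._expiry) > self._max:
            self._expiry.popitem(last=False)  # evict the oldest-added entry

    def contains(self, token_hash: str) -> bool:
        expires_at = self._expiry.get(token_hash)
        if expires_at is None:
            return False
        if self._clock() >= expires_at:
            del self._expiry[token_hash]  # lazily drop the expired entry
            return False
        return True
```

For reuse detection to be reliable, the TTL would need to be at least the longest refresh-token lifetime (30 days with remember_me), so that a rotated token cannot outlive its denylist entry. That sizing rule is an assumption of this sketch.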
chiguyong 2f55fc7434 feat(auth): U11 AuthProvider abstraction layer + auth_sessions schema
Leaves extension points for future integration with corporate IdPs (OIDC / SAML / LDAP / Feishu / DingTalk / WeCom),
and lands the auth_sessions table (V2, replacing user_sessions).

Changes
- models.py: add auth_sessions + auth_meta tables, V1→V2 data backfill
- providers/base.py: AuthProvider Protocol interface contract
- providers/local.py: LocalAuthProvider default implementation (wraps SQLite + bcrypt)
- providers/oidc_stub.py: StubOIDCProvider placeholder (NotImplementedError)
- providers/__init__.py: get_auth_provider DI factory (lru_cache singleton)
- providers/exceptions.py: AuthProviderError / InvalidCredentials / ProviderNotImplemented
- providers/user.py: provider-agnostic User value object
- tests/unit/auth/: 37 tests covering Protocol / DI / Local / OIDC behavior

The auth_sessions.auth_provider column records the login source (local / oidc-stub / future
oidc-keycloak / saml / ldap), so audits remain traceable when the IdP is switched later.

Tests: 37 passed (providers) + 62 passed (full auth suite) + ruff check clean
2026-06-21 01:28:14 +08:00
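chiguyong cac9c73dd5 fix(routing): U1-U6 routing optimization + fix plan + code review fixes
Implements 6 fix units (U1-U6) and applies 5 security fixes found by ce-code-review.

## U1: benchmark timeout thresholds
- Timeouts tiered by difficulty: easy=45s, medium=60s, hard=90s
- Replaces the single hard-coded 60s timeout

## U2: OpenAICompatibleProvider httpx timeout
- New timeout parameter (default 120s) replaces the hard-coded 60s
- ProviderConfig.timeout is passed through to the Provider
- 2 new unit tests

## U3: activate QualityGate skill_match validation
- BaseAgent._build_skill_context() builds the skill_context
- Passed to QualityGate.validate() at three call sites: base.py / tasks.py / runner.py

## U4: add disambiguation_keywords field
- IntentConfig gains a disambiguation_keywords field
- The field is added to 8 skill YAMLs

## U5: optimize RequestPreprocessor routing regexes
- Split _FACTUAL_RE into separate CN/EN regexes (Chinese has no spaces between words)
- New pure-pattern regexes _MATH_RE / _TRANSLATION_RE
- _TOOL_CONTEXT_RE excludes real-time queries that need tools
- Multi-line input guard + trailing punctuation support
- 21 new unit tests (all 40 pass)

## U6: re-run benchmark
- Real-LLM benchmark: accuracy 60% -> 93.3%
- 4/5 passed, p50=40.8s, consistency=100%
- Old baseline backed up to baseline_2026-06-17_old_arch.json

## ce-code-review fixes (5 items)
- Fix the security hazard of the \s character class matching newlines
- Add trailing punctuation support to the factual/math regexes
- Fix duplicate keywords in geo_optimizer.yaml
- Fix unreachable return in _login_with_retry
- Fix stderr_fh resource leak in the real_llm_server fixture

Tests: all 63 tests in tests/unit/chat/ pass; ruff check passes.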
2026-06-20 19:31:49 +08:00
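The regex points in commit cac9c73dd5 (separate CN/EN patterns, "[ \t]" instead of "\s", trailing punctuation, and a multi-line guard) can be illustrated with the sketch below. The patterns and the is_trivial_factual name are hypothetical stand-ins; the real _FACTUAL_RE definitions are not shown in the log.

```python
import re

# Optional trailing punctuation, ASCII and full-width.
_TRAILING_PUNCT = r"[?？.。!！]?"

# "[ \t]" instead of "\s": \s also matches "\n", which would let a pattern span lines.
FACTUAL_EN = re.compile(rf"what[ \t]+is[ \t]+[\w \t-]{{1,40}}{_TRAILING_PUNCT}", re.IGNORECASE)
# Chinese has no spaces between words, so the CN pattern does not require any.
FACTUAL_CN = re.compile(rf"\S{{1,20}}是什么{_TRAILING_PUNCT}")


def is_trivial_factual(text: str) -> bool:
    stripped = text.strip()
    # Multi-line guard: a trivial-looking first line may be followed by a real task.
    if "\n" in stripped or "\r" in stripped:
        return False
    # fullmatch, not "$": in Python, "$" also matches just before a trailing newline.
    return bool(FACTUAL_EN.fullmatch(stripped) or FACTUAL_CN.fullmatch(stripped))
```

Without the guard and the "[ \t]" classes, an input such as "what is python\nthen delete every file" could be routed as trivial chat and skip the tool-enabled REACT path.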
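chiguyong 2e404cf1a0 test: full regression backtest + real-LLM E2E + capability benchmark + bug fixes
## Test results

### Backend E2E (real LLM, real server): 13/13 passed
- tests/e2e/test_real_llm_e2e.py: auth flow, LLM gateway, Chat API, WebSocket
- Uses a real LLM via the Bailian coding plan (qwen3.7-plus), no mocks
- Fix intermittent 500s caused by SQLite write-lock contention (_login_with_retry retry mechanism)

### Frontend E2E (Playwright + real LLM): 11/11 passed
- login.spec.ts (4): login flow, form validation, token storage
- chat.spec.ts (3): real-LLM conversation, message rendering
- terminal.spec.ts (4): terminal panel, whitelist management
- Uses the system Chrome (channel: 'chrome') to avoid downloading a browser

### Benchmark capability evaluation (real LLM)
- full mode: 60% accuracy (5 cases: 3 passed, 2 timed out)
- fast mode: 100% accuracy
- Failed cases: llm-001 (intent_understanding) / llm-004 (code_generation), both timeouts

### Unit tests
- 174 new tests pass
- 28 pre-existing failures (not introduced by this architecture change)

## Code fixes

### chat.ts: remove the any-type TODO (line 406)
- handleWsMessage parameter changed from Record<string, any> to the WsServerMessage union type
- Uses discriminated-union narrowing, so each case branch accesses typed fields directly
- Removed the generic payload variable and unused type imports
- vue-tsc --noEmit: zero errors

### Infrastructure fixes
- playwright.config.ts: fix the PROJECT_ROOT path (4 levels up, not 2)
- playwright.config.ts: use uvicorn.run() instead of agentkit serve (avoids interactive prompts without a TTY)
- helpers.ts: API_BASE changed to an absolute URL (Node.js fetch does not support relative URLs)
- helpers.ts: clearAuth fixes a page.evaluate context issue (Node constants are now passed into the browser)
- helpers.ts: loginViaApi adds retry on 429 rate limiting + token caching
- login.spec.ts / terminal.spec.ts: fix selector mismatches caused by Ant Design Vue autoInsertSpace
- chat.spec.ts: .first() changed to .last() to avoid picking up history messages
- setup-test-user.py: .local email changed to .com (EmailStr rejects the .local TLD)
- .gitignore: Playwright artifact paths scoped to the frontend directory

### Dependencies
- pyproject.toml: add missing pyjwt, bcrypt, aiosqlite dependencies
- package.json: add @playwright/test dependency

## Outstanding plans checklist (audit results)

### Plan 001 (chat main-area VI rework): active
- U7: the three subcomponents SkillsTab/SystemTab/KnowledgeTab are not implemented
- U8: polishing of the Preview sample scenarios is not finished
- U9: wrap-up of the BoardMeetingModal VI adaptation is not finished
- U10: quality gate and backend regression tests are not finished

### Plan 002 (enterprise C/S architecture): under design review
- 8 open questions still undecided (target customer / deployment location / client form factor, etc.)
- P2/P3/P4 modules deferred

### Plan 003 (enterprise C/S evolution): completed
- 7 deferred items (web admin console / skill marketplace / SSO / code indexing / multi-tenancy, etc.)

### Code stubs
- DockerComputerUseSession: its 4 methods start/stop/screenshot/execute_action are stubs
  (they need real Docker + VNC + the Anthropic Computer Use API; planned future work)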
2026-06-20 18:22:10 +08:00
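chiguyong 91f56ca663 feat: enterprise client-server architecture + code review fixes
## Main changes

### New features
- Enterprise client-server architecture (JWT auth + RBAC permissions + terminal security)
- Config sync between the Tauri desktop client and the server
- Remote LLM gateway (RemoteLLMProvider, with token refresh and retry on 401)
- Server-side terminal WebSocket (with an admin approval flow)
- Six-layer terminal whitelist defense (blacklist → shell operator detection → built-in safe list → global/user/session whitelists → danger detection)

### Code review fixes (P0/P1/P2)
- P0: dangerous binaries (rm/docker, etc.) are no longer added to the whitelist; compute_whitelist_entry returns None
- P1: terminal approval ownership tracking (_approval_owners dict) + session cleanup to prevent leaks
- P1: the local terminal WebSocket URL now includes the JWT token
- P1: audit log supports filtering by terminal_mode
- P1: the /system/resources endpoint enforces the SYSTEM_CONFIG permission
- P1: RemoteLLMProvider adds token refresh and retry on 401
- P1: auth/models.py uses Mapping[str, object] instead of Any
- P2: the terminal authorization dependency checks the is_active account status
- Remove the unused APIKeyAuthMiddleware import in app.py

### Documentation updates
- README.md: add Chapter 16, "Enterprise Client-Server Architecture"
- AGENTS.md / CLAUDE.md: sync the module map, route table, and frontend pages
- Plan document marked as completed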

Closes: docs/plans/2026-06-19-003-feat-enterprise-client-server-evolution-plan.md
2026-06-20 06:48:18 +08:00
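A rough sketch of the layered terminal command check in commit 91f56ca663, using the layer order the commit lists. The list contents, the return values and the check_command signature are all hypothetical, and the three whitelist scopes are folded into one step here. compute_whitelist_entry mirrors the P0 fix: dangerous binaries never yield a whitelist entry.

```python
import shlex
from typing import AbstractSet, Optional

# Hypothetical contents; the real lists are not shown in the log.
BLACKLIST = frozenset({"mkfs", "shutdown", "reboot"})
DANGEROUS_BINARIES = frozenset({"rm", "docker", "sudo"})
SHELL_OPERATORS = (";", "&&", "||", "|", "`", "$(", ">", "<")
BUILTIN_SAFE = frozenset({"ls", "pwd", "whoami", "echo"})


def compute_whitelist_entry(command: str) -> Optional[str]:
    """Binary name to remember after approval, or None if it must never be whitelisted."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return None
    if not argv or argv[0] in DANGEROUS_BINARIES:
        return None
    return argv[0]


def check_command(command: str, global_wl: AbstractSet[str],
                  user_wl: AbstractSet[str], session_wl: AbstractSet[str]) -> str:
    """Return "allow", "deny", or "needs_approval", checking the layers in order."""
    try:
        argv = shlex.split(command)
    except ValueError:  # e.g. unbalanced quotes
        return "deny"
    if not argv:
        return "deny"
    binary = argv[0]
    if binary in BLACKLIST:                                   # 1. blacklist
        return "deny"
    if any(op in command for op in SHELL_OPERATORS):          # 2. shell operators
        return "needs_approval"
    if binary in BUILTIN_SAFE:                                # 3. built-in safe list
        return "allow"
    if any(binary in wl for wl in (global_wl, user_wl, session_wl)):  # 4. whitelists
        return "allow"
    if binary in DANGEROUS_BINARIES:                          # 5. danger detection
        return "deny"
    return "needs_approval"                                   # unknown: ask an admin
```

Operator detection runs on the raw string before any allow decision, so "ls; rm -rf /" cannot ride on the built-in "ls" entry.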
chiguyong 771756814f fix(review): fix P0/P1/P2 issues found in code review
P0 (Critical):
- orchestrator: plan_update event key renamed from phases to plan_phases to match the frontend contract
- orchestrator: team_formed event payload changed from string[] to IExpertInfo[] + plan_phases:[]

P1 (High):
- orchestrator: new phase_failed event broadcast (3 sites: gather failure / _execute_phase exception / _mark_dependents_failed cascade)
- orchestrator: new team_dissolved event broadcast (3 sites: normal completion / ValueError / Exception)
- orchestrator: _mark_dependents_failed made async to support event broadcasting
- orchestrator: the gather result check now also handles asyncio.CancelledError (a BaseException on Python 3.11+)
- plan: PhaseStatus.RUNNING value changed from running to in_progress to match the frontend union type
- team.ts: updatePhaseStatus adds a defensive guard for undefined plan_phases
- chat.py: adds asyncio.CancelledError handling + moves team.dissolve() into a finally block

P2 (Medium):
- orchestrator: _get_isolated_agent return type changed from Any to ConfigDrivenAgent
- orchestrator: _get_llm_gateway return type changed from Any to LLMGateway | None
- orchestrator: dependency output is read from the in-memory dep_phase.result instead of SharedWorkspace (less redundant I/O)
- plan: PlanPhase.to_dict() serializes result as a string to match the frontend ITeamPlanPhase.result type
- types.ts: expert_step.step type changed from number to string (the backend sends a phase ID)

Tests: 377 passed (experts + chat_team + expert_team)
2026-06-18 13:00:59 +08:00
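The gather-result fix in commit 771756814f exists because asyncio.CancelledError has derived from BaseException rather than Exception since Python 3.8. With return_exceptions=True, a cancelled child shows up in the results as a CancelledError instance, so an isinstance(r, Exception) check would count it as a success. run_phase_layer is a hypothetical helper, not the orchestrator's real method.

```python
import asyncio
from typing import Any, Awaitable, Iterable, List, Tuple


async def run_phase_layer(
    coros: Iterable[Awaitable[Any]],
) -> Tuple[List[Tuple[int, Any]], List[Tuple[int, BaseException]]]:
    """Run one layer of independent phases and split the results into completed and failed."""
    results = await asyncio.gather(*coros, return_exceptions=True)
    completed: List[Tuple[int, Any]] = []
    failed: List[Tuple[int, BaseException]] = []
    for idx, result in enumerate(results):
        # BaseException, not Exception, so that a cancelled phase counts as failed.
        if isinstance(result, BaseException):
            failed.append((idx, result))
        else:
            completed.append((idx, result))
    return completed, failed
```

Each failed index would then drive a phase_failed broadcast and the cascade to dependent phases.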
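chiguyong 871e20876f test(integration): U9 rewrite integration tests to cover pipeline mode
- 33 tests covering all F1-F16 scenarios
- F1: manual team formation (@team:expert1,expert2)
- F2: default team template (@team:dev_team)
- F3: serial pipeline execution (3 phases A→B→C)
- F4: parallel phase execution (no dependencies)
- F5: phase failure and dependency failure propagation
- F6: SharedWorkspace data passing
- F7: context isolation (separate ConfigDrivenAgent instances)
- F8: event sequence verification (team_formed → plan_update → phase_started → phase_completed → team_synthesis)
- F9: TeamStatus.PLANNING state transitions
- F10: circular dependency detection
- F11: fallback on invalid expert references
- F12: fallback on LLM decomposition failure
- F13-F16: decentralized collaboration, user intervention, team dissolution, dynamic expert management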
2026-06-18 02:26:59 +08:00
chiguyong 1e818b507d feat(server): U6 add _execute_team_collab to integrate the @team pipeline into WebSocket 2026-06-18 02:08:29 +08:00
chiguyong ee6d16345c feat(experts): U7 add 5 programming expert templates + dev_team team template + ExpertTeamRouter template expansion 2026-06-18 01:50:43 +08:00
chiguyong 0f8ea6e21e feat(experts): rewrite TeamOrchestrator in pipeline mode + TeamStatus.PLANNING 2026-06-18 01:39:22 +08:00
chiguyong 1075598ebf feat(experts): restore the plan.py phase dependency graph (PlanPhase + topological_sort)
- Add PhaseStatus enum (PENDING/RUNNING/COMPLETED/FAILED)
- Add PlanPhase dataclass (id/name/assigned_expert/task_description/depends_on/status/result)
- TeamPlan gains a phases field and companion methods: get_phase/update_phase_status/topological_sort/get_ready_phases
- topological_sort uses Kahn's algorithm, returns execution layers (list[list[PlanPhase]]), and detects circular dependencies
- Keep SubTask/MergeStrategy for backward compatibility
- Add 54 unit tests covering linear/parallel/circular dependencies, invalid references, ready phases, and serialization
2026-06-18 01:28:18 +08:00
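A layered version of Kahn's algorithm, matching the topological_sort behavior described in commit 1075598ebf: it returns lists of phases that can run in parallel and raises on a cycle. Phase and CircularDependencyError are simplified stand-ins for PlanPhase and whatever error the real code raises.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Phase:
    id: str
    depends_on: List[str] = field(default_factory=list)


class CircularDependencyError(ValueError):
    pass


def topological_layers(phases: List[Phase]) -> List[List[Phase]]:
    by_id = {p.id: p for p in phases}
    indegree: Dict[str, int] = {p.id: 0 for p in phases}
    dependents: Dict[str, List[str]] = {p.id: [] for p in phases}
    for p in phases:
        for dep in p.depends_on:
            if dep not in by_id:
                raise ValueError(f"phase {p.id!r} depends on unknown phase {dep!r}")
            indegree[p.id] += 1
            dependents[dep].append(p.id)

    layers: List[List[Phase]] = []
    current = [pid for pid, deg in indegree.items() if deg == 0]  # no unmet dependencies
    scheduled = 0
    while current:
        layers.append([by_id[pid] for pid in current])
        scheduled += len(current)
        nxt: List[str] = []
        for pid in current:
            for child in dependents[pid]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all of child's dependencies are now scheduled
                    nxt.append(child)
        current = nxt
    if scheduled != len(phases):
        # Phases left with nonzero indegree can only be waiting on each other.
        raise CircularDependencyError("circular dependency between plan phases")
    return layers
```

For a diamond a → (b, c) → d, this yields [[a], [b, c], [d]], so b and c can be dispatched together.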
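chiguyong 28ca5b6001 fix(experts): fix the ExpertTeamRouter template reference bug + repair broken integration tests
U1: resolve_expert_configs uses copy.deepcopy(template.config) instead of a direct reference,
so assigning is_lead no longer pollutes the shared template (consistent with the P1 fix in BoardRouter).

U2: Remove imports of deleted classes from test_expert_team.py (CollaborationPlan, MergeStrategy,
ParallelType, PhaseStatus, PlanPhase) and delete the tests that use them. Keep the 8 tests that do not
depend on the removed classes. U9 will rewrite them as pipeline-mode tests.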
2026-06-18 01:23:25 +08:00
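The shared-template bug fixed in commit 28ca5b6001, reduced to plain dicts. TEMPLATES and both functions are hypothetical stand-ins for the router's template registry and resolve_expert_configs.

```python
import copy
from typing import Any, Dict

TEMPLATES: Dict[str, Dict[str, Any]] = {
    "reviewer": {"role": "code reviewer", "is_lead": False, "tools": ["search"]},
}


def resolve_config_buggy(name: str, make_lead: bool) -> Dict[str, Any]:
    config = TEMPLATES[name]       # direct reference into the shared template
    config["is_lead"] = make_lead  # mutates the template for every later caller
    return config


def resolve_config(name: str, make_lead: bool) -> Dict[str, Any]:
    config = copy.deepcopy(TEMPLATES[name])  # private copy, nested lists included
    config["is_lead"] = make_lead
    return config
```

deepcopy rather than dict(...) matters because a shallow copy would still share the nested "tools" list between every expert built from the same template.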
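chiguyong dddcbd24e3 feat: board meeting discussion mode + backtest integration + WS persistence fixes
Board Meeting Mode:
- BoardRouter: @board prefix routing, expert-name validation, template fallback
- BoardTeam: discussion container, state machine (FORMING->DISCUSSING->CONCLUDING->COMPLETED)
- BoardOrchestrator: multi-round autonomous discussion loop engine, moderator summaries, stop-command detection
- 9 preset celebrity-expert YAMLs (Musk / Bezos / Zhang Xiaolong / Munger, etc.)
- Frontend BoardStatusView group-chat-style UI + WebSocket event handling
- Backend chat.py integrates @board routing into the main chat flow

Backtest integration:
- benchmark.py: add board_meeting dimension (18 tasks, 6 categories)
- benchmark_dataset.py: add BOARD_BENCHMARKS (11 E2E cases)
- test_board_backtest.py: 66 backtest tests (9 test classes)

Bug fixes:
- resolve_expert_configs: deep-copy so modifying is_lead no longer pollutes the shared template
- Fall back to the default template when all expert names are invalid
- board_router: topic was not stripped on the non-matching path
- benchmark_dataset: corrected the input of board-name-invalid-001

WebSocket persistence fixes:
- chat.py: three-layer defense so task results are not lost
- chat store: disconnect recovery logic

Deployment configuration:
- Gitea Actions CI/CD workflow
- docker-compose.deploy.yaml deployment orchestration
- scripts/deploy.sh automated deployment script

Test results: 120 unit tests pass, 71 benchmark tests pass (100%), ruff fully clean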
2026-06-17 23:52:53 +08:00
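The BoardTeam state machine in commit dddcbd24e3 moves strictly FORMING → DISCUSSING → CONCLUDING → COMPLETED. A minimal sketch that enforces only those transitions follows. BoardState's string values and BoardTeamSketch are hypothetical, and the real class may allow extra edges, for example when a stop command is detected.

```python
from enum import Enum
from typing import Dict, FrozenSet


class BoardState(Enum):
    FORMING = "forming"
    DISCUSSING = "discussing"
    CONCLUDING = "concluding"
    COMPLETED = "completed"


# Each state may only advance to the next one; COMPLETED is terminal.
_ALLOWED: Dict[BoardState, FrozenSet[BoardState]] = {
    BoardState.FORMING: frozenset({BoardState.DISCUSSING}),
    BoardState.DISCUSSING: frozenset({BoardState.CONCLUDING}),
    BoardState.CONCLUDING: frozenset({BoardState.COMPLETED}),
    BoardState.COMPLETED: frozenset(),
}


class BoardTeamSketch:
    def __init__(self) -> None:
        self.state = BoardState.FORMING

    def transition(self, new_state: BoardState) -> None:
        if new_state not in _ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
```

Keeping the transition table as data makes it easy to check in a test that no code path can skip the moderator's concluding round.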
chiguyong 5b5291c7e5 fix: WebSocket task persistence three-layer defense with security hardening
Fix empty content in chat history and tasks stopping on refresh. Implements: result persistence on disconnect, task backgrounding via asyncio + EventQueue, and frontend reconnection recovery. Security: fail-closed conversation_id ownership check, asyncio.shield on CancelledError cleanup, async TaskStore shim, EventQueue subscriber limit, connection-error resilience. 23 tests added.
2026-06-17 22:11:51 +08:00
chiguyong 1fbfd9d132 refactor: standardize benchmark with industry methodology (P/R/F1, multi-run, baseline) 2026-06-17 12:01:34 +08:00
chiguyong d00995504d feat: comprehensive capability benchmark and agentkit benchmark CLI 2026-06-17 11:28:09 +08:00
chiguyong ecf87391a5 feat: integrate SQ/EQ into portal WebSocket and CLI (Phase 4)
- app.py: initialize EventQueue + SubmissionQueue in app.state, close on shutdown
- portal.py: emit unified events (task.created/started/completed/failed,
  turn.thinking/tool_call/tool_result/final_answer) to EQ alongside WebSocket messages
- cli/chat.py: optional --event-queue flag for event emission
- EQ is bypass-only: emit failures never affect WebSocket or CLI main flow
- WebSocket message format unchanged (backward compatible)

Tests: 650 passed, 0 failed, 4 skipped
2026-06-17 11:05:04 +08:00
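The "EQ is bypass-only" rule in commit ecf87391a5 (an event-queue failure must never break the WebSocket or CLI flow) amounts to wrapping every emit. emit_bypass and the emit callable's signature are assumptions, not the project's EventQueue API.

```python
import logging
from typing import Any, Awaitable, Callable, Dict

logger = logging.getLogger(__name__)

EmitFn = Callable[[str, Dict[str, Any]], Awaitable[None]]


async def emit_bypass(emit: EmitFn, event_type: str, payload: Dict[str, Any]) -> bool:
    """Send an event to the side channel; report failure instead of raising."""
    try:
        await emit(event_type, payload)
        return True
    except Exception:
        # asyncio.CancelledError derives from BaseException (3.8+), so it still propagates.
        logger.warning("event queue emit failed for %s", event_type, exc_info=True)
        return False
```

The caller sends its WebSocket message regardless of the boolean, which keeps the message format and delivery unchanged.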
chiguyong bbedfff597 feat: hub-and-spoke experts, tiered tool injection, unified event model (U3/U7/U10) 2026-06-17 10:46:16 +08:00
chiguyong 200174c5c7 feat: SQLite persistence, verification loop, spec-driven execution
Phase 2 of architecture optimization (U5/U6/U9):

- U5: SqliteConversationStore with WAL mode + LRU cache (1000 convs)
  Replaces in-memory ConversationStore in portal.py
  Data survives server restarts (ref: Codex Thread persistence)
- U6: VerificationLoop with verify/verify_and_retry
  Default commands: pytest + ruff check
  ReActEngine integration via verification_enabled flag
  New run_tests tool for LLM to invoke verification
- U9: SpecManager for plan-as-contract (ref: Qoder Quest Mode)
  Plans persisted to .agentkit/specs/{spec_id}.yaml
  API: GET/PUT /api/v1/specs, POST /api/v1/specs/{id}/confirm
  PlanExecEngine emits spec_created event after plan generation

Also fixes: portal skill_name routing, app.py SessionManager guard,
test_telemetry CostAwareRouter removal, test_compression_config fixture
2026-06-17 10:45:20 +08:00
chiguyong 5374bc8501 refactor: eliminate routing layer, align with industry best practices
Phase 1 of architecture optimization (U1/U2/U4/U8):

- U1: Rename SimpleRouter to RequestPreprocessor, route() to preprocess()
  Eliminates misleading routing concept; LLM decides autonomously
  in REACT agent loop (matches Codex/Claude Code/Trae pattern)
- U2: Delete CostAwareRouter, HeuristicClassifier, SemanticRouter
  (~700 lines removed). skill_routing.py: 1688 to 220 lines
- U4: PlanExecEngine defaults to ReActStepExecutor, delete _LLMStepExecutor
  (pure LLM calls without tools = no execution capability)
- U8: ReActEngine defaults to ContextCompressor(keep_recent=10)

Supersedes plans 2026-06-15-002/003/004.
New plan: 2026-06-16-006-refactor-architecture-optimization-evolution-plan.md
2026-06-17 10:44:40 +08:00
chiguyong c4257591d4 refactor(router): replace CostAwareRouter with SimpleRouter and prompt-based tool calling 2026-06-16 03:31:05 +08:00
chiguyong a27eed3714 fix(config): unify config loading chain and protect ${VAR} references
- Settings API: reverse-resolve env vars to preserve ${VAR} refs in yaml,
  write new API keys to .env instead of agentkit.yaml, extract env_key
  from existing ${VAR} reference when updating providers
- Onboarding: merge-update instead of overwrite when config exists,
  use config_arg to determine output path, .env merge instead of overwrite
- Unified templates: bailian-coding provider name, full model_aliases,
  docker-compose with postgres, expanded .env.example
- Optional ruamel.yaml for comment/format preservation in Settings API
- clients.yaml: add _deep_resolve for ${VAR} env var references
- All CLI commands use load_config_with_dotenv() consistently
- Tests: mock find_config_path and CWD auto-discovery to avoid env leaks
2026-06-16 00:26:54 +08:00
chiguyong 11e2009cb8 feat(router): improve colloquial/mixed-lang routing, fix low-complexity IntentRouter bypass
Key improvements:
- Low-complexity queries (<0.3) now try IntentRouter keyword match
  before falling back to DIRECT_CHAT, fixing 0% F1 on keyword_match
- SemanticRouter similarity_low lowered from 0.6 to 0.4
- Short text (<20 chars) uses effective_low = max(0.25, low - 0.15)
- Short text with no semantic match forces LLM classify fallback
- Added colloquial keywords to 7 skill YAMLs
- Fixed code_reviewer.yaml output_schema placement
- Fixed SemanticRouter build in e2e tests
- Fixed base_url detection for bailian-coding API keys

Results: keyword_match F1 0->60.87%, colloquial F1 0->100%, mixed_lang F1 0->100%
2026-06-15 23:54:57 +08:00
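The short-text threshold rule in commit 11e2009cb8 is a small formula. With the new similarity_low of 0.4, a query shorter than 20 characters gets max(0.25, 0.4 - 0.15) = 0.25. Applied to the old 0.6 value, the same rule would give 0.45. The function and constant names below are hypothetical.

```python
SHORT_TEXT_CHARS = 20  # "short text (<20 chars)" per the commit


def effective_similarity_low(text: str, low: float = 0.4) -> float:
    """Lower the semantic-match threshold for short inputs, with a floor of 0.25."""
    if len(text) < SHORT_TEXT_CHARS:  # len() counts code points, so CJK chars count as 1
        return max(0.25, low - 0.15)
    return low
```

The floor keeps very low configured thresholds from dropping short queries into near-random matches.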
chiguyong fa2a6dece2 feat(router): enable SemanticRouter + upgrade benchmark to L3/L5
- Enable SemanticRouter in agentkit.yaml (router.semantic.enabled: true)
- Integrate SemanticRouter into e2e backtest (_build_real_components)
- Add 8 new semantic test cases: 5 colloquial + 3 mixed-lang expressions
- Add L3 output quality evaluation framework (LLM-as-Judge, 1-5 score)
- Add L5 adaptive capability metrics (consistency rate from overfitting data)
- Add OutputQualityObservation model and evaluate_output_quality() method
- Report now includes L3 and L5 sections

Results: 52 tests pass, description_match F1=66.67%, L5 adaptive rate=100%
2026-06-15 23:02:47 +08:00
chiguyong e984b4c462 feat(router): optimize routing intelligence — ExecutionMode expansion, multi-candidate scoring, quality gate skill match
- Expand ExecutionMode enum with REWOO/REFLEXION/PLAN_EXEC
- Add _resolve_execution_mode() to respect skill.config.execution_mode
- Rewrite IntentRouter._match_keywords() for multi-candidate scoring
- Add QualityGate 5th dimension: skill_match validation with warning escalation
- Calibrate HeuristicClassifier: low-complexity signals only when no high signals
- Fix negation regex for Chinese text (avoid matching past punctuation)
- Fix backtest mode_map normalization and .env loading
- Add 61 unit tests (21 HeuristicClassifier + 14 IntentRouter + 13 QualityGate + 13 existing)

Results: execution_mode_accuracy 9.09%→36.36%, skill_routing_F1 66.67%→77.78%
2026-06-15 22:43:13 +08:00
chiguyong 64d62a2b60 feat: autonomous task execution - connect PlanExecEngine + TeamOrchestrator
U1: TeamOrchestrator._execute_phase real execution (Expert.agent.execute)
U2: LLM-based merge strategies (BEST/VOTE/FUSION) with fallback
U3: ReActStepExecutor replacing _LLMStepAgent for tool-enabled steps
U4: SharedWorkspace integration for cross-phase/cross-execution state
U5: GoalPlanner prompt tuning with few-shot and verb pattern matching
U6: Replan-before-fallback in TeamOrchestrator
U7: End-to-end validation tests for multi-step research tasks
U8: WebSocket progress events (step_event_callback + new event types)

Code review fixes: P0 response.strip fix, P1 competitor status check,
milestone real impl, VOTE self-bias fix, confirmation_handler wiring,
ExpertTeam public API, DRY _build_result_summaries, replan tests

Also: geo_server.py refactor (ServerConfig.from_yaml), delete llm_config.yaml
2026-06-15 12:41:32 +08:00
chiguyong 99fe4c99f7 fix: comprehensive code review fixes + WS test stability 2026-06-15 08:17:34 +08:00
chiguyong 7384ecb03e feat: Expert Team Mode — plan-execute collaboration with conversation UI
Implements B+C hybrid Expert Team Mode with ExpertConfig, CollaborationPlan,
TeamOrchestrator, ExpertTeamRouter, HandoffTransport, SharedWorkspace, and
Expert wrapper. Frontend includes ExpertTeamView, ExpertMessage,
PlanVisualization, team store, and WS event handlers.

Code review fixes: sentinel-based close, per-phase retry, name validation,
Vue component integration, teamState dedup, Redis reset, plan reassign,
event_type validation, hmac timing-safe compare, message dedup,
reactive updatePhases, O(1) phase lookup, iterative DFS, bounded Queue.

232 unit tests passing.
2026-06-14 22:20:14 +08:00
chiguyong 94c4c8b887 feat: accumulated frontend enhancements, docs, and static assets
- Frontend view updates (ChatView, EvolutionView, SkillsView, etc.)
- Updated portal routes and chat store
- New frontend components (FilePreview, ToolCallCard, IconNav)
- Updated static build assets
- New test files (merged router, parallel tools, ReWOO fallback)
- Documentation and brainstorm files
- Codegraph and understand-anything artifacts
2026-06-14 16:35:01 +08:00
chiguyong 6e0e081f23 feat: gap closure sprint — dark theme, @-mention, LocalComputerUse, tests
P0: U4 UsageStore + U5 CascadeStateStore independent test files (57 tests)
P1: Dark theme — tokens.css [data-theme="dark"] + theme.ts Pinia store
    + TopNav toggle button + App.vue dynamic Ant Design theme
P1: @-mention — MentionDropdown.vue + /skills/mention-suggest API
    + ChatInput integration with @ detection
P2: LocalComputerUseSession — pyautogui + screencapture (replaces Docker stub)
P2: Integration tests for gap closure (12 tests)
Fix: create_cascade_state_store() now passes session_ttl to InMemory fallback
2026-06-14 16:16:50 +08:00
chiguyong 0ccef7be5c feat: P0 production hardening — LLM cache, semantic routing, state persistence
U1: LLM Cache Core (exact + semantic match, InMemory + Redis backends)
U2: Cache integration into LLMGateway with CacheConfig
U3: Semantic Router as Layer 1.5 in CostAwareRouter
U4: UsageStore persistence (Redis Hash + InMemory fallback)
U5: CascadeStateStore persistence (Redis INCR + InMemory TTL)
U6: EvolutionStore interface unification (Protocol + PostgreSQL backend)
U7: Configuration integration + E2E tests

Code review fixes:
- P0: date iteration bug (day>=28), semantic router index never built,
      Redis connection leak (per-call → persistent pool)
- P1: cache degradation recovery, semantic_search degradation,
      double miss counting, asyncio.Lock for PG init, LIMIT on queries,
      __import__ anti-pattern → _utcnow()
- P2: InMemory TTL cleanup, embedding preservation on put(),
      data TTL = max(exact_ttl, semantic_ttl)
2026-06-14 15:16:00 +08:00
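The log for commit 0ccef7be5c only says "date iteration bug (day>=28)". One common way to get that failure is sketched below, under the assumption that the original code incremented the day field directly. next_day_buggy is a hypothetical reconstruction, not the project's code.

```python
from datetime import date, timedelta
from typing import Iterator


def next_day_buggy(d: date) -> date:
    # Raises ValueError on the last day of a month (e.g. Feb 28 in 2026).
    return d.replace(day=d.day + 1)


def iter_days(start: date, end: date) -> Iterator[date]:
    """Yield each date from start to end inclusive."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)  # rolls over month and year boundaries correctly
```

Per-day usage keys built from iter_days stay correct across month ends, which is where a daily timeseries would otherwise crash or skip days.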
chiguyong 09698d7a06 feat: frontend productization with code review fixes
- Workflow: visual canvas, undo/redo, drag-and-drop, real-time execution WebSocket
- Evolution: dashboard, ECharts metrics, experience timeline, pitfall warnings, usage panel
- KB: source CRUD, document upload, search test
- Terminal: interactive PTY WebSocket, whitelist security
- Security: hmac.compare_digest, API key auth on all endpoints, whitelist bypass fix
- Fixes: ECharts async init, WebSocket intentional disconnect, TOCTOU race, Pydantic models
2026-06-13 01:29:58 +08:00
chiguyong 5ef08a3b30 fix(review): comprehensive P0-P2 code review fixes 2026-06-12 22:18:25 +08:00
chiguyong a36bc3d1c1 feat: optimize chat response speed for sub-1s first token latency
- Add HeuristicClassifier to replace LLM quick_classify with zero-cost
  local heuristic (keyword/length/code-pattern scoring), gated by
  router.classifier config (default: heuristic)
- Add parallel tool execution in ReActEngine via asyncio.gather for
  multiple independent tool_calls, gated by parallel_tools param
- Add AsyncWriteQueue for non-blocking session persistence with WAL
  buffer, gated by async_writes param on SessionManager
- Add httpx.Limits connection pool config to all LLM providers
- Add router config section to ServerConfig and agentkit.yaml
- All optimizations have config switches for safe rollback
2026-06-12 13:15:06 +08:00
chiguyong 8c365486e2 fix(pipeline): address code review findings for adversarial loop
Critical:
- C1: Add verifier_timeout_seconds for independent Verifier timeout
- C2: Verifier parse failure raises RuntimeError instead of dead-loop

Major:
- M1: Inject previous_output into Worker retry context
- M2: Add Pydantic ge/le constraint on ReviewFeedback.score
- M3: Use Literal type for feedback_mode enum validation
- M4: Use Literal types for ReviewIssue severity and category
- M5: Merge error messages when escalation agent also fails

Tests: 8 new test cases added (19 total), all passing
2026-06-12 10:02:37 +08:00