Commit Graph

392 Commits

Author SHA1 Message Date
Fischer 8633f60831 feat: complex-task-quality-loop (R1-R12) — 11 P1 blockers fixed (#22)
Merge feat/complex-task-quality-loop into main.

Includes U1-U9/R1-R12 implementation + 11 P1 blocker fixes from ce-code-review.

P1 fixes: trace_outcome propagation, portal execute_stream routing, network_block reentrancy, spec review gate wiring, max_reflections threading, phase budgets, plan aggregation, failure status mapping, evolution drain timeout, portal spec_review_reply, spec_review persistence.
2026-07-05 22:31:21 +08:00
chiguyong e5e76697a9 fix(review): resolve 11 P1 blockers from ce-code-review
P1#1  config_driven: propagate trace_outcome into output_data so
      lifecycle._is_failure_path() detects non-success outcomes
P1#2  portal: route through ConfigDrivenAgent.execute_stream (not
      react_engine.execute_stream directly) so evolution hooks fire
      and trace_outcome propagates; add pre-built messages support in
      _build_llm_messages
P1#3  sandbox: make network_block reentrant via module-level reference
      counter + threading.Lock - concurrent VERIFICATION phases no
      longer permanently block all new connections
P1#4  chat: replace dead isinstance(_PlanExecEngine) check with
      hasattr(_spec_review_handler) to wire the spec review gate
P1#5  plan_exec_engine: complete max_reflections threading chain
      (PlanExecEngine + ReActStepExecutor constructors)
P1#6  plan_exec_engine: enforce phase budgets (max_steps from
      phase_budgets, not hardcoded 5)
P1#7  plan_exec_engine: use current plan (not stale plan var) in
      aggregation after replan
P1#8  plan_exec_engine: map failure to failed status (not success)
P1#9  app: add drain timeout for pending evolution tasks on shutdown
P1#10 portal: handle spec_review_reply in WS handler
P1#11 chat: persist spec_review_request/reply/timeout to conversation
      store so reload can reconstruct gate state

Tests: 116 related tests pass; 26 pre-existing failures unchanged
(stash-verified). ruff lint clean.
2026-07-04 01:10:01 +08:00
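A minimal sketch of the reentrancy fix described in P1#3 of e5e76697a9, assuming the block is exposed as a context manager. The names `_depth`, `_blocked`, and `is_blocked` are hypothetical stand-ins; only the reference counter and the `threading.Lock` come from the commit message.

```python
import threading
from contextlib import contextmanager

# Hypothetical module-level state; names are illustrative, not from the repository.
_lock = threading.Lock()
_depth = 0          # how many callers are currently inside network_block()
_blocked = False    # stand-in for the real "new connections are refused" switch


def is_blocked() -> bool:
    """Report whether the (simulated) network block is currently installed."""
    with _lock:
        return _blocked


@contextmanager
def network_block():
    """Keep networking blocked while at least one caller is inside the context."""
    global _depth, _blocked
    with _lock:
        _depth += 1
        if _depth == 1:
            _blocked = True      # first entrant installs the block
    try:
        yield
    finally:
        with _lock:
            _depth -= 1
            if _depth == 0:
                _blocked = False  # last leaver removes it, never an earlier one
```

Because the block is only removed when the counter returns to zero, two overlapping VERIFICATION phases cannot leave the process blocked, and the phase that exits first cannot unblock the one still running.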
Fischer 454a50b5a8 feat: Bitable P0 UX Polish + Agent Parity (#23)
Merged via LFG pipeline after final ce-code-review (Ready to merge) + ce-compound documentation.
2026-07-04 01:05:04 +08:00
chiguyong 826b766af0 docs(solutions): record bitable agent tool parity patterns + final review findings
Add docs/solutions/architecture-patterns/bitable-agent-tool-parity-patterns.md
capturing three architecture patterns from U6 (R15a):
- Dual-sync action registration (KTD10): handlers dict + input_schema.enum
- 404-before-403 ownership check (KTD9): prevent existence leak via DELETE
- 409 last-view protection: prevent invalid zero-view table state

Update residual findings with DR-4 (TOCTOU race in delete_view) and DR-5
(_update_field silent type drop) surfaced in final pre-merge ce-code-review
pass. Both P2, neither blocks merge. Documented in the solutions doc under
Known Limitations with concrete fix paths.
2026-07-04 01:04:46 +08:00
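The dual-sync registration pattern (KTD10) noted in 826b766af0 can be guarded with a small drift check. This is a sketch under assumptions: the schema layout `properties.action.enum` and the function name `registration_drift` are hypothetical; the commit only states that actions live in both a handlers dict and `input_schema.action.enum`.

```python
def registration_drift(handlers: dict, input_schema: dict) -> set:
    """Return action names registered in only one of the two places.

    Assumes a JSON-Schema-like layout; the real schema shape may differ.
    """
    enum = set(input_schema["properties"]["action"]["enum"])
    return set(handlers) ^ enum  # symmetric difference: empty means in sync


# Illustrative data: delete_view has a handler but was forgotten in the enum.
handlers = {"create_view": object(), "delete_view": object()}
schema = {"properties": {"action": {"enum": ["create_view"]}}}
drift = registration_drift(handlers, schema)
```

A check like this could run in a unit test so that adding a handler without the matching enum entry fails fast.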
chiguyong 3fdd29d152 build(bitable): rebuild frontend index.html for JS hash alignment
Rebuild index.html after U1-U6 frontend changes. JS bundle hash updated
(index-CHtvprqX.js -> index-agwA6wam.js) to match new build output.
CI runs unit/e2e only and does not rebuild static assets, so the committed
hash must match the bundled JS.
2026-07-04 00:39:24 +08:00
chiguyong 0f4a418408 docs(review): record residual review findings for feat/bitable-enhancement
2026-07-04 00:29:02 +08:00
chiguyong 137bda0361 refactor(bitable): simplify code after ce-simplify-code pass
Behavior-preserving simplifications (net -22 lines):
- useResponsiveBreakpoint: remove createHandler factory, share single sync fn
- RecordDetailDrawer: remove isEditable wrapper, call isFieldEditable directly
- ViewConfigPanel: merge duplicate saveGrouping/saveConditionalFormat into saveU5Config
- groupingRulesUtils: use Array.find instead of for-loop, simplify Number.isFinite check
- GroupingEditor: simplify filter callback to single-expression arrow

Verified: typecheck + build:frontend + ruff all pass.

Refs: ce-simplify-code (LFG Step 3)
2026-07-04 00:28:28 +08:00
chiguyong 229dc0b2f3 feat(bitable): U6 R15a BitableTool 4 new actions + DELETE /views endpoint
Extend BitableTool from 6 to 10 actions (create_view, update_view,
update_field, delete_view) and add the DELETE /views/{view_id} backend
endpoint with 404-before-403 ownership, 409 last-view protection, and
X-Internal-Token passthrough (KTD11).

Backend:
- repository.py: add delete_view() — DELETE row by view_id, returns rowcount > 0
- service.py: add LastViewDeletionError domain exception + delete_view()
  with last-view guard (siblings <= 1 → raise → route maps to 409)
- routes/bitable.py: add DELETE /views/{view_id} (204 No Content),
  404-before-403 ownership pattern, 409 on LastViewDeletionError,
  X-Internal-Token passthrough via require_bitable_auth
- tools/bitable_tool.py: add 4 new actions (_create_view, _update_view,
  _update_field, _delete_view), register in BOTH handlers dict AND
  input_schema.action.enum (KTD10 — 10 actions each)

Frontend:
- api/bitable.ts: add deleteView(viewId): Promise<void>
- stores/bitable.ts: add deleteView action — removes from local state,
  switches to first remaining view if active was deleted, 409 warning
- ViewSwitcher.vue: add delete button (a-popconfirm "确认删除此视图?"),
  hidden when views.length <= 1 (preempt last-view 409)
- BitableFileDetailView.vue: handle @delete event from ViewSwitcher

Tests:
- test_routes.py: 6 new DELETE /views tests (204, 404 missing, 404
  non-owner, 409 last-view, internal-token passthrough, internal-token 404)
- test_bitable_tool.py: 13 new tests (action count = 10, handlers = 10,
  4 action happy paths, missing-field errors, 409 last-view, R3/R4
  config parity, X-Internal-Token passthrough on all 4 new actions)
- e2e/bitable-agent-parity.spec.ts: 10 scenarios (P1-P10) covering
  delete button visibility, popconfirm, 204/409/404 flows, tab removal,
  view switch after delete, create view adds tab

Verification:
- ruff check: all files pass
- pytest: 62 passed, 12 pre-existing failures (unchanged from e931fbe baseline)
- typecheck: pass (EXIT_CODE=0)
- build:frontend: pass (BUILD_EXIT=0)
- action count: ENUM=10, HANDLERS=10, delete_view in both
- no blue hex colors in ViewSwitcher.vue

Pre-existing test failures (12, unchanged from e931fbe):
test_create_table_success, test_create_field_success, test_list_fields,
test_create_records_batch, test_upsert_inserts_then_updates,
test_upsert_preserves_user_columns, test_create_view_success,
test_batch_upsert_1200_records, test_resume_from_partial_failure,
test_query_records, test_query_records_with_limit, test_collect_api

Constraints honored:
- No emojis, no `any` type, no blue hex colors, no pyproject.toml changes
- 404-before-403 for non-owned resources (Pattern 4)
- X-Internal-Token transparent passthrough (KTD11)
- KTD10: actions registered in both handlers dict AND enum
2026-07-03 23:13:46 +08:00
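The status mapping for DELETE /views/{view_id} in 229dc0b2f3 can be sketched without a web framework. Assumptions: the in-memory `views` dict, the `View` dataclass, and `handle_delete` are hypothetical; `LastViewDeletionError`, the 204/404/409 codes, and the `siblings <= 1` guard come from the commit message, and the sketch assumes "siblings" counts the target view itself.

```python
from dataclasses import dataclass


class LastViewDeletionError(Exception):
    """Deleting this view would leave its table with zero views."""


@dataclass
class View:
    view_id: str
    table_id: str
    owner_id: str


def delete_view(views: dict, view_id: str) -> bool:
    """Service layer: enforce the last-view guard, then delete."""
    target = views[view_id]
    # Assumption: the count includes the target view.
    siblings = [v for v in views.values() if v.table_id == target.table_id]
    if len(siblings) <= 1:
        raise LastViewDeletionError(view_id)
    del views[view_id]
    return True


def handle_delete(views: dict, view_id: str, caller_id: str) -> int:
    """Route layer: return the HTTP status described in the commit message."""
    view = views.get(view_id)
    # 404-before-403: a missing view and someone else's view look identical,
    # so DELETE cannot be used to probe which view IDs exist.
    if view is None or view.owner_id != caller_id:
        return 404
    try:
        delete_view(views, view_id)
    except LastViewDeletionError:
        return 409
    return 204
```

Keeping the guard in the service and the status mapping in the route mirrors the split the commit describes between service.py and routes/bitable.py.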
chiguyong 7c900ce280 docs: add complex-task-quality-loop plan and requirements documents
Adds the brainstorm requirements and implementation plan that drove the
9-unit quality-loop feature (R1-R12). Also gitignores local worktree
directories.
2026-07-03 22:54:11 +08:00
chiguyong e931fbef2d feat(bitable): U5 R4 grouping (max 3 fields) + conditional formatting (7 operators)
- GroupingEditor: multi-select field picker (max 3), per-level direction
  toggle, reorder buttons, "已知限制:不支持跨分组多选" note, empty state
- ConditionalFormatEditor: per-rule enable/field/operator/value/color/bold,
  8 color keys, WCAG 1.4.1 bold default true, first-match-wins footer legend
- BitableGrid: unified section rendering (grouped/ungrouped via single
  vxe-grid declaration), group headers as separate divs (CF only on data
  cells), CF via row-config.className, multi-grid instance map for refresh
- groupingRulesUtils: pure functions for CF matching (7 operators), group
  tree builder, SUM/AVG aggregation, CSS var mappers, self-check on load
- view_config.py: Pydantic v2 validation (MAX_GROUP_BY_FIELDS=3, 7
  operators, 8 color keys, extra="forbid" on sub-models)
- routes/bitable.py: validate_view_config on PATCH (HTTP 422 on error)
- stores/bitable.ts: updateViewConfig action (merges U5 sub-keys, preserves
  filters/sort/hidden_fields)
- ViewConfigPanel: grouping + conditional-format tabs
- E2E: 8 scenarios (G1-G8: single/multi grouping, collapse/expand, CF
  equals/between, combined, aggregation)
- Tests: 54 unit tests (19 grouping + 35 CF), 2 PG-marked skipped
2026-07-03 22:33:18 +08:00
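The first-match-wins rule evaluation from e931fbef2d, rendered in Python for illustration (the commit places it in the TypeScript groupingRulesUtils). Only the two operators named in the commit, equals and between, are sketched; the other five are not listed in the source and are left out. The rule dictionary keys are assumptions based on the per-rule enable/field/operator/value/color/bold description.

```python
from typing import Any, Optional


def rule_matches(rule: dict, value: Any) -> bool:
    """Evaluate one rule against a cell value (equals and between only)."""
    op = rule["operator"]
    if op == "equals":
        return value == rule["value"]
    if op == "between":
        low, high = rule["value"]  # assumed to be an inclusive [low, high] pair
        return value is not None and low <= value <= high
    raise ValueError(f"operator not covered by this sketch: {op}")


def first_matching_rule(rules: list, record: dict) -> Optional[dict]:
    """Return the first enabled rule that matches; later rules are ignored."""
    for rule in rules:
        if not rule.get("enabled", True):
            continue
        if rule_matches(rule, record.get(rule["field"])):
            return rule
    return None


rules = [
    {"enabled": False, "field": "score", "operator": "equals", "value": 90, "color": "red"},
    {"enabled": True, "field": "score", "operator": "between", "value": (80, 100), "color": "green"},
    {"enabled": True, "field": "score", "operator": "equals", "value": 90, "color": "blue"},
]
winner = first_matching_rule(rules, {"score": 90})
```

Here the disabled rule is skipped and the between rule wins, even though the later equals rule would also match.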
chiguyong ffb7a51d77 fix(review): wire pitfall_detector/spec_review to PlanExecEngine + fix restore_budget_state reset order
2026-07-03 22:05:51 +08:00
chiguyong f280627da1 feat(bitable): U4 view type switcher with 5 types (grid enabled, others disabled)
- Add viewSwitcherUtils.ts (5 view types metadata: label/icon/disabled/tooltip)
- Refactor ViewSwitcher: button -> dropdown with 5 types, disabled items show "规划中" tooltip
- Update BitableFileDetailView.handleCreateView to accept viewType parameter (no more hardcoded grid)
- Bind :creating=viewCreating to ViewSwitcher for loading/disabled state during POST
- Extend store createView + API createView to pass view_type field (already in prior commits)
- Add loading/disabled state on create button to prevent duplicate clicks
- Extend e2e/bitable-view.spec.ts with 5 view type scenarios (E1-E5)

Closes R3 (P0): view type selection in UI, backend already supports view_type.

Refs: docs/plans/2026-07-03-001-feat-bitable-p0-ux-and-agent-parity-plan.md U4
2026-07-03 21:43:51 +08:00
chiguyong f1f2e72cad fix(plan_exec): add pitfall_warnings param for ReActEngine interface compat
2026-07-03 21:39:27 +08:00
chiguyong 5baaeb489d feat(bitable): U3 record detail drawer with full field type rendering
- Add RecordDetailDrawer.vue (480px/640px drawer, sticky header, full field type render)
- Add recordDrawerUtils.ts (value formatter, attachment/image extractors, drawer width calc)
- Add currentRecord state + openRecordDetail/closeRecordDetail/fetchRecordDetail actions to store
- Wire BitableGrid row click to open drawer
- Add e2e/bitable-record-drawer.spec.ts with 7 scenarios
- Loading/Error/404/empty states use U1 LoadingState/ErrorState per Open Question
- useResponsiveBreakpoint consumed: isMobile -> 100vw full-screen overlay
- user-owned fields editable, agent-owned fields read-only, upsert preserves agent columns

Closes R2 (P0): grid row click -> detail drawer with all field types visualized.

Refs: docs/plans/2026-07-03-001-feat-bitable-p0-ux-and-agent-parity-plan.md U3
2026-07-03 15:57:33 +08:00
chiguyong 120892e305 feat(chat): TEAM_COLLAB surfaces failure instead of silent REACT fall-back (U9, R7)
- chat.py: TEAM_COLLAB execution_mode sends error + returns (no REACT fall-back)
- REWOO/REFLEXION-as-mode keep deferred fall-back (RV10)
- AGENTS.md: update stale "not yet supported" claim
- Known gap: portal.py REST path still falls back (out of U9 scope)
2026-07-03 15:47:45 +08:00
chiguyong 786f921c5e feat(core): spec review gate - pause PLAN_EXEC for user review (U8, R8)
Add a spec review gate to PlanExecEngine that pauses execution after the
first Spec is generated, awaiting the user's confirm/reject decision.
On approval execution continues; on rejection the engine replans (capped
at 2 replans); on 30-min timeout the Spec is parked (not failed) so the
user can resume later.

- spec_manager: add parked status + park()/resume() methods
- plan_exec_engine: add spec_review_handler param, wire gate into both
  execute_stream and _execute_loop with replan cap, emit
  spec_review_request/spec_review_reply events, handle timeout to park
- chat.py: whitelist new events, add spec_review_reply WS handler,
  wire _spec_review_handler closure (30-min timeout), cleanup on disconnect
- portal.py: persist spec_review_id/decision/feedback for page reload
- tests: 20 unit tests covering happy path, rejection/replan, timeout,
  cancellation, backward compat, handler errors, park/resume round-trips
2026-07-03 15:20:38 +08:00
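The gate flow in 786f921c5e (approve, reject with capped replans, timeout to parked) can be sketched with asyncio. Assumptions: `run_with_spec_review`, `make_spec`, and `review` are hypothetical names, and the commit does not say what happens once the replan cap is hit, so the sketch returns "rejected" as a placeholder.

```python
import asyncio

MAX_REPLANS = 2  # replan cap stated in the commit message


async def run_with_spec_review(make_spec, review, timeout_s: float):
    """Return (outcome, spec) where outcome is approved, parked, or rejected.

    make_spec(feedback) builds a spec; review(spec) awaits the user's
    (decision, feedback) reply. Both callables are hypothetical.
    """
    spec = make_spec(None)
    replans = 0
    while True:
        try:
            decision, feedback = await asyncio.wait_for(review(spec), timeout=timeout_s)
        except asyncio.TimeoutError:
            return "parked", spec        # timeout parks the spec, it does not fail it
        if decision == "confirm":
            return "approved", spec
        if replans >= MAX_REPLANS:
            return "rejected", spec      # placeholder: behavior at the cap is not in the source
        replans += 1
        spec = make_spec(feedback)       # replan using the reviewer's feedback
```

In the commit the timeout is 30 minutes; a caller of this sketch would pass `timeout_s=30 * 60`.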
chiguyong f0c993a0d9 feat(bitable): U2 inline field configuration in column header menu
- Add InlineFieldConfigurator.vue (inline panel reusing FieldConfigForm logic)
- Add fieldRenderUtils.ts (type conversion compatibility check)
- Refactor ColumnHeaderMenu: edit -> inline expand, batch -> open FieldManagePanel
- Integrate InlineFieldConfigurator in BitableGrid header slot
- Add batch-management banner to FieldManagePanel
- Add submitting loading state to prevent duplicate clicks
- Extend e2e/bitable-field-ops.spec.ts with inline edit scenarios

Closes R1 (P0): column header menu inline edit, no more drawer jump.

Refs: docs/plans/2026-07-03-001-feat-bitable-p0-ux-and-agent-parity-plan.md U2
2026-07-03 15:12:17 +08:00
chiguyong e1cf073693 feat(bitable): U1 add design token system + vxe-table dependency declaration
- Add bitable-tokens.css with 5 token categories (color/spacing/radius/font/drawer-width)
- Add FieldTypeIcon.vue mapping 9 field types to Ant Design Outlined icons
- Add useResponsiveBreakpoint composable (768/1024/1440 breakpoints)
- Add LoadingState (skeleton) and ErrorState (inline alert + retry) components
- Tokenize 9 bitable components/views (replace hardcoded hex with var())
- Declare vxe-table dependency explicitly (resolve ghost dependency)
- Upgrade SelectDisplay chip palette to 8-color token with WCAG AA contrast

Phase 1 foundation for Phase 2 UX work (U2-U5).

Refs: docs/plans/2026-07-03-001-feat-bitable-p0-ux-and-agent-parity-plan.md U1
2026-07-03 14:40:57 +08:00
chiguyong a763396011 feat(evolution): pitfall retrieval/injection at planning phase (U7, R12)
2026-07-03 14:27:48 +08:00
chiguyong 91a61f9b49 feat(evolution): auto-trigger + quality gate + actor marking (U6, R5/R6)
U6 of the complex task quality loop plan.

R5 (auto evolution trigger + quality gate):
- EvolutionConfig (Pydantic v2): success_sample_rate=0.1, min_confidence=0.5,
  min_examples=3, observe_only=True, cross_workspace_sharing=False
- Success path gated by success_sample_rate; failure path always runs (100%)
- Observe-only mode records reflections without feeding optimizer (RV14:
  avoids noise-driven prompt degradation during initial rollout)
- PromptOptimizer.can_optimize() consumption gate: sample count >= min_examples
  AND mean quality >= min_confidence
- PitfallDetector confidence threshold: low-confidence warnings marked
  observe-only; confidence = failure_rate * min(1.0, total/3) linear ramp
  (ponytail: upgrade to Wilson interval)

R6 (actor marking + cross-workspace sharing):
- All evolution artifacts (EvolutionLogEntry, Module, PitfallWarning) carry
  actor field; defaults to result.agent_name
- can_share_artifact(): same-workspace always allowed; cross-workspace requires
  explicit opt-in via EvolutionConfig.cross_workspace_sharing=True

KTD-8: gave_up_after_reflections treated as failure path (triggers 100%
evolution) even when stream wrapper marks status as COMPLETED. Detection via
output_data.trace_outcome or error_message substring (ponytail: heuristic;
upgrade path is a dedicated TaskResult.trace_outcome field).

Backward compat: all gates conditional on auto_evolution_config is not None;
existing EvolutionMixin usage without config preserves prior behavior.

Tests: tests/unit/test_evolution_auto_trigger.py (37 tests) covers R5/R6
scenarios - sample rate gate, observe-only, consumption gate, pitfall
confidence, actor marking, cross-workspace sharing, gave_up_after_reflections,
error handling, fire-and-forget, backpressure cap, AE3 happy path.
2026-07-03 13:54:37 +08:00
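The R5/R6 gates from 91a61f9b49 reduce to a few small predicates. This sketch uses a plain dataclass as a stand-in for the Pydantic v2 EvolutionConfig; the field names, defaults, and the confidence formula come from the commit message, while the function signatures (and `should_evolve` itself) are assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class EvolutionConfig:
    """Dataclass stand-in for the Pydantic v2 model named in the commit."""
    success_sample_rate: float = 0.1
    min_confidence: float = 0.5
    min_examples: int = 3
    observe_only: bool = True
    cross_workspace_sharing: bool = False


def should_evolve(cfg: EvolutionConfig, failed: bool, rng=random.random) -> bool:
    """Failure path always runs; success path is sampled."""
    return failed or rng() < cfg.success_sample_rate


def pitfall_confidence(failures: int, total: int) -> float:
    """confidence = failure_rate * min(1.0, total / 3), a linear ramp up to 3 samples."""
    if total <= 0:
        return 0.0
    return (failures / total) * min(1.0, total / 3)


def can_optimize(cfg: EvolutionConfig, qualities: list) -> bool:
    """Consumption gate: enough samples AND a high enough mean quality."""
    if len(qualities) < cfg.min_examples:
        return False
    return sum(qualities) / len(qualities) >= cfg.min_confidence


def can_share_artifact(cfg: EvolutionConfig, source_ws: str, target_ws: str) -> bool:
    """Same workspace is always allowed; cross-workspace needs explicit opt-in."""
    return source_ws == target_ws or cfg.cross_workspace_sharing
```

With the ramp, a single observed failure (1 of 1) scores one third rather than 1.0, which is why low-sample warnings stay below the 0.5 threshold.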
chiguyong 96ccca3d87 docs(bitable-p0): add implementation plan for P0 UX polish + agent parity
ce-plan Deep plan (6 Implementation Units, 3 delivery phases):
- Phase 1: U1 R5 design token system + vxe-table dependency declaration
- Phase 2: U2-U5 R1-R4 frontend UX (inline field config, record drawer,
  view type switcher, grouping + conditional formatting)
- Phase 3: U6 R15a BitableTool 4 new actions + DELETE /views endpoint

11 KTDs covering: CSS token layer, vxe-table ghost dependency fix,
inline field configurator (hybrid vxe-table slot + custom component),
record detail drawer (single column 480/640px), view type dropdown
with disabled states, grouping + conditional format in View.config
with backend Pydantic validation, BitableTool action registration
(handlers dict + input_schema enum), X-Internal-Token ownership
semantics, 3-phase delivery with config schema freeze for parallel U6.

Phase 5.3 headless ce-doc-review (5 reviewers, 14 findings):
- Applied 2 safe_auto (U6 verification method, U5→U6 dependency)
- Applied 2 gated_auto (input_schema enum step, color_token→color_key)
- Applied 5 P1 manual fixes (backend config validation, X-Internal-Token
  ownership, grouping+CF combo state, LoadingState/ErrorState justification,
  R3/R4 backend assumption verification)
- 8 P2/P3 manual findings appended to Open Questions

Origin: docs/brainstorms/2026-07-03-bitable-comparative-evaluation-requirements.md
2026-07-03 13:49:57 +08:00
chiguyong f8927d1749 docs(bitable-eval): apply ce-doc-review best-judgment fixes (20 gated_auto + 12 manual)
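ce-doc-review (7 reviewers, 39 raw findings → 32 actionable + 3 FYI after the synthesis pipeline);
the user chose the "handle automatically with best judgment" path. This commit applies all 20 gated_auto fixes and appends
the 12 manual findings to the "From 2026-07-03 review" subsection of Outstanding Questions.

Main fixes:
- Correct the BitableTool action list: the actual actions are create_table/import_excel/import_database/
  collect_api/upsert_records/query_records (4 of 6 were wrong in the original), removing the R15a scope misjudgment
- Promote R15a from track B to P0 (a priority contradiction flagged independently by 4 reviewers: track B being
  "non-blocking" contradicts "highest-priority sub-item for agent parity")
- Annotate the G23 closure path (R15c paths (a)/(b))
- Annotate default field types as future user/datetime (Inventory + G6)
- Annotate the R3 backend dependency (POST /views schema extension)
- Add P0 acceptance criteria for the view deletion endpoint (R15a acceptance + frontend deleteView method)
- Annotate the vxe-table ghost dependency (not declared in package.json; relies on hoisting from the main repo)
- Mark the create_field action as required (the 17 new types in R8 need the agent to batch-create fields)
- Split the R15 test mapping into three rows: R15a/R15b/R15c
- Add PII/XSS/auto-number write-protection columns to the R8 acceptance matrix + a schema V3 migration cost estimate
- Add SSRF/authentication/credential encryption + endpoint access control to the R15c security requirements
- Add WCAG AA accessibility + empty-state requirements to the cross-cutting acceptance criteria
- Annotate the R8 matrix scope (covers P1, not P0)

Open Questions gains 12 manual findings (to be decided in the ce-plan phase):
- User model for the user field
- Empirical basis for the C-first priority strategy
- Concurrent editing UX strategy
- Unified pattern for loading/error states
- UX form of the conditional formatting rule builder
- Grouping interaction details
- Responsive breakpoint definitions
- R2 record detail drawer width
- vxe-table capacity limit assessment
- R13 dashboard chart library buy-vs-build
- Roadmap for disabled view types
- Rollback strategy for schema V3 bidirectional links

File: docs/brainstorms/2026-07-03-bitable-comparative-evaluation-requirements.md
(107 insertions, 31 deletions)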
2026-07-03 13:32:07 +08:00
chiguyong 1d09fafec9 feat(core): reflexion in main flow - verify fail → reflect → retry (U5, R4)
2026-07-03 13:29:54 +08:00
chiguyong 4255cb33ba feat(core): step budget phases + keep working bias (U4, R11/R10)
2026-07-03 13:10:28 +08:00
chiguyong e9821a3b7f docs(bitable): add comparative evaluation requirements with ce-code-review P1 fixes
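Add the three-way comparative evaluation requirements document (agentkit bitable vs Twenty vs Feishu) and apply all
P1 gap fixes produced by ce-code-review (9 in total):

- P1-1: Align the R8 field type count to 16+1=17 (KD6 synced with R8)
- P1-2: Add the R8 field type acceptance matrix (17-row table, including a V2->V3 migration column)
- P1-3: KTD7 cites the concrete file formula/parser.py instead of a bare reference
- P1-4: R-ID namespace conflict: add the date prefix 2026-06-29-R1..R5
- P1-5: Unify created-time as datetime (generic type + default fields use datetime)
- P1-6: Add a P0 acceptance criteria section (R1-R5 Given/When/Then)
- P1-7: Add a test strategy section + test file mapping table (R1-R5, R8, R15)
- P1-8: Split R15 into R15a/R15b/R15c + add an agent parity evaluation method section
- P1-9: R4 adds the backend extension (group_by/conditional_formatting schema) + an agent parity note

Also includes 2 gated_auto fixes:
- Component count 14 -> 15
- Remove all emoji from the document, replacing them with [OK]

ce-code-review run-id: 20260703-123134-c7c2b2ea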
2026-07-03 12:59:41 +08:00
chiguyong b8418968c2 feat(core): verification defaults for PLAN_EXEC/TEAM_COLLAB + minimum sandbox (U3, R2/R3/RV3)
2026-07-03 12:32:22 +08:00
chiguyong dd259153fa feat(core): wire evolution hooks into execute_stream path (U2, OQ6 fix)
ConfigDrivenAgent.execute_stream() now fires on_task_complete/on_task_failed
evolution hooks in its finally block, achieving lifecycle parity with the
sync execute() path. This fixes the OQ6 gap where WebSocket-routed streaming
tasks bypassed evolution entirely.

Implementation:
- Module-level backpressure manager (_schedule_evolution / drain_pending_evolution_tasks)
  with cap = max(2, max_concurrency * 2), drop + log + counter on exceed, and
  shutdown drain via asyncio.gather(return_exceptions=True).
- _trigger_evolution_hooks / _evolve_safe methods on ConfigDrivenAgent: fire-and-forget
  via asyncio.create_task, evolution errors swallowed (never fail the stream).
- execute_stream finally block distinguishes cancelled (CancelledError /
  TaskCancelledError -> CANCELLED), failed (Exception -> FAILED), completed
  (final_answer received -> COMPLETED), and early-close (no completion, no
  error -> CANCELLED "stream closed before completion").
- app.py shutdown drains pending evolution tasks.
- plan_exec_engine.py / reflexion.py: doc comments noting hooks fire at the
  ConfigDrivenAgent layer (single chokepoint, no double-fire).
- portal.py: verification comments at 3 execute_stream call sites (these call
  react_engine.execute_stream directly, bypassing ConfigDrivenAgent - known gap
  tracked separately).

Tests (8 new in test_execute_stream_hooks.py):
- Happy path: success fires COMPLETED, failure fires FAILED.
- Edge cases: cancellation fires CANCELLED, early aclose fires CANCELLED,
  evolution error suppressed, backpressure cap drops + counts.
- Parity: REST on_task_complete vs execute_stream both fire COMPLETED.
- Disabled: _evolution_enabled=False fires no hooks.
2026-07-03 12:16:02 +08:00
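The backpressure manager described in dd259153fa can be sketched as a small class. Assumptions: the commit describes module-level functions (_schedule_evolution / drain_pending_evolution_tasks); the class `EvolutionScheduler` and its method names are hypothetical. The cap formula, the drop-and-count behavior, and the gather-based drain come from the commit message.

```python
import asyncio


class EvolutionScheduler:
    """Fire-and-forget scheduling with a cap on in-flight tasks."""

    def __init__(self, max_concurrency: int) -> None:
        self.cap = max(2, max_concurrency * 2)  # cap formula from the commit message
        self.pending = set()                    # strong refs keep tasks alive
        self.dropped = 0                        # counter incremented on every drop

    def schedule(self, coro) -> bool:
        """Start coro as a task, or drop it when the cap is reached.

        Must be called from inside a running event loop.
        """
        if len(self.pending) >= self.cap:
            self.dropped += 1
            coro.close()  # avoid a "coroutine was never awaited" warning
            return False
        task = asyncio.create_task(coro)
        self.pending.add(task)
        task.add_done_callback(self.pending.discard)
        return True

    async def drain(self) -> None:
        """Wait for in-flight tasks at shutdown; task errors are collected, not raised."""
        if self.pending:
            await asyncio.gather(*self.pending, return_exceptions=True)
```

Passing `return_exceptions=True` means one failing evolution task cannot abort the shutdown drain, which matches the commit's rule that evolution errors never fail the stream.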
chiguyong 2932ee51ed feat(tools): add str_replace_editor tool with workspace-root security (U1, R1)
Replaces the broken write_file placeholder (no real implementation, only
_FakeTool stubs in cli/benchmark.py) with a structured editor offering four
commands: create, str_replace, insert_at_line, view.

Security model (file-system analog of the 6-layer terminal security paradigm,
reject-by-default + prefix match):
  1. Reject absolute paths (force relative interpretation vs workspace root).
  2. Reject any .. path component (path traversal).
  3. Path.resolve() follows symlinks, then relative_to(workspace_root)
     rejects symlink escape and residual traversal.

Data-loss guard: create refuses to overwrite existing files. str_replace
requires a unique anchor (0 or >1 matches error). insert_at_line is 1-based
(0 = prepend, > EOF = append). All FS I/O wrapped in asyncio.to_thread.

Registers str_replace_editor in _DEFAULT_CORE_TOOLS (replacing write_file)
so its full description is always injected into the LLM prompt. Updates
test_tool_search.py which used write_file as a sample core tool.

Tests: 34 cases in test_str_replace_editor.py cover happy path, edge cases
(empty file, multi-match, insert at 0/beyond EOF, view range), error paths
(overwrite refusal, anchor not found, path traversal, absolute path, symlink
escape, unknown command, missing args), and integration contract (in
_DEFAULT_CORE_TOOLS, exported from agentkit.tools, schema enum, prompt
injection via _build_tool_use_prompt).

Verification: ruff check clean; targeted regression suite 412 passed
(the single failure in test_calendar_tool.py is a pre-existing date-sensitive
bug in an untouched file, today 2026-07-03 Friday makes the next-Wednesday
assertion fail).
2026-07-03 11:42:59 +08:00
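The three path checks and the unique-anchor rule from 2932ee51ed can be sketched with pathlib. Assumptions: the function names `resolve_in_workspace` and `str_replace`, and the choice of `PermissionError`/`ValueError`, are hypothetical; the order of checks and the 0-or-more-than-1-match error come from the commit message.

```python
from pathlib import Path


def resolve_in_workspace(workspace_root: Path, user_path: str) -> Path:
    """Map a user-supplied relative path to a real path inside the workspace."""
    candidate = Path(user_path)
    # 1. Reject absolute paths: everything is interpreted relative to the root.
    if candidate.is_absolute():
        raise PermissionError("absolute paths are rejected")
    # 2. Reject any '..' component outright.
    if ".." in candidate.parts:
        raise PermissionError("'..' components are rejected")
    # 3. Follow symlinks, then verify the result is still under the root.
    root = workspace_root.resolve()
    resolved = (root / candidate).resolve()
    try:
        resolved.relative_to(root)
    except ValueError:
        raise PermissionError("path escapes the workspace root") from None
    return resolved


def str_replace(text: str, old: str, new: str) -> str:
    """Replace old with new only when old occurs exactly once."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"anchor must match exactly once, found {count}")
    return text.replace(old, new, 1)
```

Step 3 is what catches a symlink inside the workspace that points outside it: the first two checks pass, but the resolved path no longer sits under the root.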
Fischer 00b2dad36e feat(compressor): CJK-aware token estimation + linear compress flow (#21)
Squash merge PR #21: CJK-aware token estimation + linear compress flow + solution doc
2026-07-03 09:40:28 +08:00
Fischer 2296d0b209 refactor: remove all emoji from source code (#20)
Replace emoji/glyph characters with Ant Design Vue Outlined icons (frontend), text labels with ANSI colors (CLI/shell), and ASCII art (docstrings). Add pre-commit guard (scripts/check-no-emoji.sh) and style guide to prevent regression.

Closes: docs/plans/2026-07-02-001-refactor-remove-all-emoji-plan.md
2026-07-03 02:46:40 +08:00
Fischer 76c9c08756 feat(ui): private board restrictions + scheme B assistant/user bubbles (#19)
Implements U1-U4 from plan docs/plans/2026-07-02-001-feat-private-board-restrictions-and-scheme-b-bubbles-plan.md

U1: ChatInput @board button blocks existing-conversation board creation with modal
U2: BoardBannerCard simplified to plain title + round meta
U3: MessageShell assistant bubble (scheme B neutral grayscale) with F4-A card exclusion + G1 empty-bubble hide
U4: UserBubble dark text bubble for plain text

Code review fixes: P1 color token, P2 CARD_BEARING_TYPES error type, P2 expertColor dead code, P0/P1 bubbleUtils.ts + 42 tests

Tests: 180/181 pass (1 pre-existing tauri-auth failure). Typecheck clean.
2026-07-03 01:58:19 +08:00
chiguyong e04e2868c3 docs(compound): message bubble empty-content and card-type exclusion pattern
Documents the G1 (:empty never matches Vue root), F4-A (card-bearing type
exclusion via messageType prop + Set), and pure-function extraction pattern
for testability without @vue/test-utils.
2026-07-03 01:58:00 +08:00
chiguyong cc6634b2ab feat(ui): private board restrictions + scheme B assistant/user bubbles
U1: ChatInput @board button blocks existing-conversation board creation
    with modal — enforces "one board per conversation" constraint.
U2: BoardBannerCard simplified to plain title + round meta
    (no icons/bars/progress/expert chips).
U3: MessageShell assistant bubble (scheme B neutral grayscale) with
    F4-A card-type exclusion + G1 empty-bubble hide.
U4: UserBubble dark text bubble for plain text
    (command card/file keep light bg).

Code review fixes (ce-code-review step 5):
- P1: UserBubble focus-visible --accent-primary → --color-primary
  (dark mode visibility fix).
- P2: CARD_BEARING_TYPES adds 'error' (ErrorCard double-bubble regression).
- P2: Remove dead expertColor prop (scheme B leftover).
- P0/P1: Extract bubbleUtils.ts pure functions + add 42 tests
  covering G1/F4-A/U4/U2 key decisions.

Tests: 180/181 pass (1 pre-existing tauri-auth failure unrelated).
Typecheck: clean.
2026-07-03 01:47:37 +08:00
chiguyong 981a794a54 docs(plan): private-board restrictions + scheme B bubbles plan ready
Plan document finalized after 4 rounds of ce-doc-review:
- F4-A exclusion list extended from 5 to 9 card-bearing types
- Verified root class names for all 9 card components
- Corrected chrome description (2 full chrome + 7 partial chrome)
- Added U1 modal focus restoration note (WAI-ARIA)
- Documented R4-DA1/R4-A3/R4-A4 as Open Questions for implementation
2026-07-03 01:14:37 +08:00
Fischer 6826ceb2a9 Merge PR #18: fix async generator mock for U3 streaming orchestrator
2026-07-02 22:57:16 +08:00
chiguyong 1599d193c7 test: fix async generator mock for U3 streaming orchestrator
U3 streaming refactor switched orchestrator from agent.execute() to
agent.execute_stream() (async gen), but tests still mocked execute().
AsyncMock() returns a coroutine lacking __aiter__, causing:
- 'async for' requires an object with __aiter__ method, got coroutine
- RuntimeWarning: coroutine was never awaited

Add shared helpers in tests/unit/experts/_helpers.py:
- make_chat_stream_mock: async gen for gateway.chat_stream
- make_execute_stream_mock: async gen yielding final_answer event
- make_execute_stream_raising_mock: async gen that raises (for failure tests)

Update 3 test files to use the helpers:
- test_team_orchestrator.py: _make_mock_expert, _make_mock_pool,
  failure tests (phase_failed, all_phases_fail, fallback_uses_lead,
  phase_failure_marks_dependents), assertion updates (execute_stream
  instead of execute), synthesizer warning cleanup
- test_pm_collaboration.py: _make_mock_expert, _make_mock_llm_gateway,
  collaboration/risk/rework assertions
- test_board_orchestrator.py: _make_mock_gateway (warning cleanup)

All 483 experts/ tests pass with 0 warnings.
2026-07-02 22:52:10 +08:00
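A sketch of the mock fix from 1599d193c7: a plain MagicMock whose side_effect is an async generator function returns an object that `async for` can iterate. The helper name matches the commit, but its signature and the event shape `{"type": "final_answer", ...}` are assumptions.

```python
import asyncio
from unittest.mock import MagicMock


def make_execute_stream_mock(answer: str) -> MagicMock:
    """Build a mock for execute_stream that yields one final_answer event.

    Calling the mock runs the async generator function and returns the
    async generator object, which defines __aiter__. An AsyncMock would
    return a coroutine instead, which 'async for' rejects.
    """
    async def _stream(*args, **kwargs):
        yield {"type": "final_answer", "content": answer}  # hypothetical event shape

    return MagicMock(side_effect=_stream)


async def collect(agent) -> list:
    """Consume the stream the way an orchestrator would."""
    return [event async for event in agent.execute_stream("task")]


agent = MagicMock()
agent.execute_stream = make_execute_stream_mock("done")
events = asyncio.run(collect(agent))
```

Because the result is still a MagicMock, call assertions such as `agent.execute_stream.assert_called_once_with("task")` keep working.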
chiguyong d17863d01d Merge PR #17: fix transient state reset + ReAct tool guidance + scheme B UI + dev port isolation
2026-07-02 22:25:52 +08:00
chiguyong 23160be055 fix(types): resolve 3 pre-existing typecheck errors in transient-state test
message_type: 'board_started' as const (line 93) fixes TS2322 on lines
107 and 122 — TypeScript was inferring message_type as string instead
of the literal 'board_started'.

boardState local variable: replace 'as never' with proper shape +
'status: discussing' as const (line 159-160) fixes TS2339 on line 168
where .topic was accessed on type 'never'.

All 5 transient-state tests still pass. vue-tsc --noEmit now clean.
2026-07-02 22:13:28 +08:00
chiguyong 53347ed1fe test(u6): add L4 real-LLM smoke test for ReAct tool-use prompt
Manual smoke test verifying U4 L0 prompt rule rearrangement under real
LLM calls (bailian-coding/qwen3.7-plus). 5 probe queries covering
external_info / realtime_data / multi_step / realtime_simple / no_tool.

Results:
- Probe #1 external_info: PASS (8 web_search calls, 99.9s)
- Probe #2 realtime_data: ERROR (120s timeout, not LLM refusal)
- Probe #3 multi_step: PASS (8 web_search calls, 62.6s)
- Probe #4 realtime_data_simple: PASS (3 web_search calls, 23.8s)
- Probe #5 no_tool_escape_hatch: PASS (0 tool calls, direct answer, 4.2s)

Verdict: 3/4 tool-call pass (>=3/4 threshold) + 1/1 direct pass
Bug 2 status upgraded to 'L4 verified'.

Plan Progress table updated: U6 done, U7 done.
2026-07-02 22:08:45 +08:00
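The verdict arithmetic in the commit above can be sketched as follows. The probe names and pass/fail outcomes are taken from the results list. The `ProbeResult` type, the `verdict` function, and the rule that every direct-answer probe must pass are assumptions, since the commit only states the 3/4 threshold for tool-call probes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProbeResult:
    name: str
    expects_tool: bool  # True if the probe should trigger tool calls
    passed: bool


def verdict(results, min_tool_pass_ratio=0.75):
    """Pass if enough tool-call probes pass and all direct-answer probes pass."""
    tool = [r for r in results if r.expects_tool]
    direct = [r for r in results if not r.expects_tool]
    tool_ok = sum(r.passed for r in tool) >= min_tool_pass_ratio * len(tool)
    direct_ok = all(r.passed for r in direct)  # assumed rule, not stated in the commit
    return tool_ok and direct_ok


# Outcomes as reported in the commit message (probe #2 timed out).
RESULTS = [
    ProbeResult("external_info", True, True),
    ProbeResult("realtime_data", True, False),
    ProbeResult("multi_step", True, True),
    ProbeResult("realtime_data_simple", True, True),
    ProbeResult("no_tool_escape_hatch", False, True),
]

if __name__ == "__main__":
    print(verdict(RESULTS))  # True: 3 of 4 tool probes pass, 1 of 1 direct passes
```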
chiguyong 44f4f1c46f fix: add null check for chatStore.conversations in StickyModeHeader
Optional chaining prevents TypeError when test mocks don't provide conversations array.
2026-07-02 21:48:41 +08:00
chiguyong b98e7cb42f test: update login test to expect standardized port 18001
The test was asserting port 8001 (old default) but config.py now loads .env.dev which sets AGENTKIT_SERVER_PORT=18001 per the project port standardization (18001/18002/15173/15174).
2026-07-02 21:30:21 +08:00
chiguyong 96f459c27d docs: add brainstorm/plan decision artifacts + plan progress update
Add ce-brainstorm requirements doc and ce-plan plan doc for private board restrictions and scheme B bubbles (decision artifacts). Update 2026-07-02-002 plan with U6/U7 progress table. Add .compound-engineering/config.local.example.yaml from ce-setup. gitignore tmp_*.html and delete_old_cluster.sh.
2026-07-02 21:27:20 +08:00
chiguyong 8188e8861d feat(ui): scheme B neutral grayscale for board messages + assistant bubbles
expertIdentity.ts PALETTE -> neutral grayscale; useMessageRenderer.ts removes assistant fallback for board_* events; BoardRoundCard/MessageShell apply GitHub-style gray; chatStream.ts prefers event-provided moderator avatar/color; StickyModeHeader/Scene4/LoginView/types aligned.
2026-07-02 21:26:22 +08:00
chiguyong 32746652aa fix(board): persist moderator avatar/color in round_summary events
board_orchestrator.py: include moderator_avatar and moderator_color in
the round_summary event payload so downstream consumers have the
moderator's identity metadata.

chat.py: persist expert_avatar and expert_color from the event data into
the board_summary message metadata, ensuring avatar/color survive page
reload instead of falling back to defaults.
2026-07-02 21:24:13 +08:00
chiguyong 484b7ddb95 fix(dev): isolate dev environment ports and fix env loading
- docker-compose.yaml: production mode uses expose (container-only) for
  Redis/PostgreSQL instead of ports (host-mapped)
- docker-compose.dev.yml: dev override maps Redis 6381 and PostgreSQL 5435
  to avoid conflicts with other projects (pms-redis 6379, geo_redis 6380,
  geo_db 5433)
- config.py: fix empty env var handling — only skip .env override when
  os.environ[key] is non-empty; load .env, .env.dev, .env.local in sequence
- scripts/dev-start.sh: manage agentkit-specific Docker containers
- .gitignore: add .env.dev and .env.local (contain API keys)
2026-07-02 21:23:50 +08:00
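The env-loading rule described for config.py in the commit above could look roughly like this. It is a sketch under stated assumptions, not the project's code: the function names, the simple `KEY=VALUE` parser, and the choice that later files override earlier ones are all hypothetical. The only behaviour taken from the commit is that a process environment variable blocks the file value only when it is non-empty, and that `.env`, `.env.dev`, `.env.local` are read in that order.

```python
import os
from pathlib import Path


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; missing files yield an empty dict."""
    values = {}
    path = Path(path)
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_files(paths, environ=None):
    """Apply env files in order without clobbering non-empty preset variables."""
    env = os.environ if environ is None else environ
    # Snapshot keys that were set to a non-empty value before any file is read.
    # An empty string counts as unset, so a file value may replace it.
    preset = {key for key, value in env.items() if value != ""}
    for path in paths:
        for key, value in parse_env_file(path).items():
            if key in preset:
                continue  # a real value from the process environment wins
            env[key] = value  # assumed: later files override earlier files
    return env


if __name__ == "__main__":
    load_env_files([".env", ".env.dev", ".env.local"])
    print(os.environ.get("AGENTKIT_SERVER_PORT"))
```

Taking the snapshot before reading any file is what lets `.env.dev` override a value that `.env` has just written, while still respecting variables exported in the shell.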
chiguyong 754d70623c refactor(experts): replace brand colors with neutral grayscale palette
Update color field in 15 expert YAML configs to use neutral grayscale
and deep accent tones (gray 400-800, stone, amber, dark blue/green),
consistent with the expertIdentity.ts PALETTE and the project convention
for GitHub-style neutral UI coloring.
2026-07-02 21:22:50 +08:00
chiguyong 9e2ccf5ac9 chore: gitignore .understand-anything (local knowledge graph index)
The .understand-anything/ directory is a tool-generated local index,
not project code. Remove 4 tracked files from index and add to .gitignore.
2026-07-02 21:22:00 +08:00
chiguyong 7376005868 fix: correct transient state reset criteria + ReAct tool-call rules
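Bug 1: three chatStore actions reset boardState/debateState/collaborationState
- createConversation: add the three-state reset (previously missing, so stale private-board state leaked into new conversations)
- selectConversation: unify to a conditional reset (prevConvId !== id) so a force-reload does not clear state by mistake
- deleteConversation: add the missing collaborationState reset
- Also: in selectConversation, when board_speech/board_summary messages lack
  expert_avatar/expert_color, fill them in from boardState.experts as a fallback

Bug 2: adjust the L0 rules in ReAct _build_tool_use_prompt
- Add rule 1: tools must be used for external information, real-time data, multi-step analysis, or uncertain facts
- Demote the former rule 3 to rule 4 and narrow it to answering directly only when no tool is genuinely needed
- base_prompt and tool descriptions are unchanged (L1/L2 are split out into a separate plan)

Tests: 5 frontend transient-state reset matrix cases + 6 backend prompt-rule assertions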

Plan: docs/plans/2026-07-02-002-fix-transient-state-reset-and-react-tool-guidance-plan.md
2026-07-02 20:51:57 +08:00
chiguyong 78a7faa17b refactor: remove all emoji from agentkit
Replace emoji across codebase: YAML avatars -> first char, frontend banners -> Ant Design Vue components, CLI status -> OK/FAIL/WARN labels, terminal -> [WARN]/[OK]/[PENDING], Bitable DB default -> table, App.vue font cleanup, test fixtures -> first char letters. shell.avatar type upgraded to string | Component.
2026-07-02 01:33:28 +08:00
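chiguyong 36b0296730 fix: private board data persistence + emoji removal plan
- Fix persistence of board_started/expert_speech/round_summary/board_concluded events
- Add is_board flag to the conversation list and detail endpoints
- Implement restoreBoardStateFromMessages to restore boardState from persisted messages
- Add a private board badge to ChatSidebar
- Add emoji removal plan doc (docs/plans/2026-07-02-001)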
2026-07-02 01:07:12 +08:00