fischer-agentkit

Commit Graph

Author	SHA1	Message	Date
chiguyong	96f459c27d	docs: add brainstorm/plan decision artifacts + plan progress update Add ce-brainstorm requirements doc and ce-plan plan doc for private board restrictions and scheme B bubbles (decision artifacts). Update 2026-07-02-002 plan with U6/U7 progress table. Add .compound-engineering/config.local.example.yaml from ce-setup. gitignore tmp_*.html and delete_old_cluster.sh.	2026-07-02 21:27:20 +08:00
chiguyong	8188e8861d	feat(ui): scheme B neutral grayscale for board messages + assistant bubbles expertIdentity.ts PALETTE -> neutral grayscale; useMessageRenderer.ts removes assistant fallback for board_* events; BoardRoundCard/MessageShell apply GitHub-style gray; chatStream.ts prefers event-provided moderator avatar/color; StickyModeHeader/Scene4/LoginView/types aligned.	2026-07-02 21:26:22 +08:00
chiguyong	32746652aa	fix(board): persist moderator avatar/color in round_summary events board_orchestrator.py: include moderator_avatar and moderator_color in the round_summary event payload so downstream consumers have the moderator's identity metadata. chat.py: persist expert_avatar and expert_color from the event data into the board_summary message metadata, ensuring avatar/color survive page reload instead of falling back to defaults.	2026-07-02 21:24:13 +08:00
chiguyong	484b7ddb95	fix(dev): isolate dev environment ports and fix env loading - docker-compose.yaml: production mode uses expose (container-only) for Redis/PostgreSQL instead of ports (host-mapped) - docker-compose.dev.yml: dev override maps Redis 6381 and PostgreSQL 5435 to avoid conflicts with other projects (pms-redis 6379, geo_redis 6380, geo_db 5433) - config.py: fix empty env var handling — only skip .env override when os.environ[key] is non-empty; load .env, .env.dev, .env.local in sequence - scripts/dev-start.sh: manage agentkit-specific Docker containers - .gitignore: add .env.dev and .env.local (contain API keys)	2026-07-02 21:23:50 +08:00
chiguyong	754d70623c	refactor(experts): replace brand colors with neutral grayscale palette Update color field in 15 expert YAML configs to use neutral grayscale and deep accent tones (gray 400-800, stone, amber, dark blue/green), consistent with the expertIdentity.ts PALETTE and the project convention for GitHub-style neutral UI coloring.	2026-07-02 21:22:50 +08:00
chiguyong	9e2ccf5ac9	chore: gitignore .understand-anything (local knowledge graph index) The .understand-anything/ directory is a tool-generated local index, not project code. Remove 4 tracked files from index and add to .gitignore.	2026-07-02 21:22:00 +08:00
chiguyong	7376005868	fix: correct transient state reset criteria + ReAct tool-call rules Bug 1: three chatStore actions reset boardState/debateState/collaborationState - createConversation: add the three-state reset (previously missing, so stale private board state leaked into new conversations) - selectConversation: unify as a conditional reset (prevConvId !== id) so a force-reload does not clear state by mistake - deleteConversation: add the missing collaborationState reset - Also: in selectConversation, board_speech/board_summary messages missing expert_avatar/expert_color are backfilled from boardState.experts Bug 2: adjust the L0 rules in ReAct _build_tool_use_prompt - New rule 1: tools must be used for external information, real-time data, multi-step analysis, or uncertain facts - Former rule 3 becomes rule 4 and is narrowed: answer directly only when no tool is genuinely needed - base_prompt and tool descriptions unchanged (L1/L2 split into a separate plan) Tests: 5 frontend transient-state reset matrix + 6 backend prompt rules assertions Plan: docs/plans/2026-07-02-002-fix-transient-state-reset-and-react-tool-guidance-plan.md	2026-07-02 20:51:57 +08:00
chiguyong	78a7faa17b	refactor: remove all emoji from agentkit Replace emoji across codebase: YAML avatars -> first char, frontend banners -> Ant Design Vue components, CLI status -> OK/FAIL/WARN labels, terminal -> [WARN]/[OK]/[PENDING], Bitable DB default -> table, App.vue font cleanup, test fixtures -> first char letters. shell.avatar type upgraded to string \| Component.	2026-07-02 01:33:28 +08:00
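chiguyong	36b0296730	fix: private board data persistence fix + emoji removal plan - Fix persistence of board_started/expert_speech/round_summary/board_concluded events - Add is_board flag to the conversation list and detail endpoints - Implement restoreBoardStateFromMessages to restore boardState from persisted messages - Add private board badge to ChatSidebar - Add emoji removal plan document (docs/plans/2026-07-02-001)	2026-07-02 01:07:12 +08:00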
Fischer	ba0baabfcd	Merge PR #15: docs: compound streaming-event-contract-residuals learning Docs-only change. ce-compound Full mode knowledge sedimentation for PR #14 residuals.	2026-07-01 13:53:46 +08:00
chiguyong	fe93b0f2a4	docs: compound streaming-event-contract-residuals learning Knowledge sedimentation for PR #14's 4 residual findings (1 P1 + 3 P2) from ce-code-review of feat/ui-ue-enhancement. ce-compound Full mode run. Created: - docs/solutions/integration-issues/streaming-event-contract-residuals.md Bug-track doc covering the 4-fix cluster: expert_step payload alignment, execute_stream CancellationToken registration, team_synthesis orphan milestone cleanup, synthesis_id dedup. Includes code examples, root cause analysis, and prevention strategies (streaming contract testing, cancellation registration checklist, terminal event symmetry, milestone identifier pattern). Updated: - AGENTS.md: WebSocket Chat 协议 (protocol) section expanded with streaming event types (expert_step/expert_result_chunk/team_synthesis_chunk), synthesis_id dedup contract, and execute_stream cancellation contract. - CONCEPTS.md: Added "Streaming Milestone" entry to Expert Orchestration cluster: the UI pattern for streaming progress indicators that transition through streaming → completed\|error states, including orphan failure mode and synthesis_id matching semantics. Overlap with existing docs/solutions/runtime-errors/streaming-event-whitelist-and-accumulation.md is MODERATE (same area, different specific bugs). Flagged for potential consolidation via ce-compound-refresh.	2026-07-01 13:53:10 +08:00
Fischer	8e8843c363	Merge PR #14: fix(experts): resolve residual review findings from PR #13 Merges fix/ui-ue-residual-findings → main. Addresses 4 actionable findings (1 P1 + 3 P2) from ce-code-review of PR #13: - P1: expert_step payload alignment (_phase_executor.py) - P2 #1: execute_stream CancellationToken registration (config_driven.py) - P2 #2: team_synthesis orphan milestone cleanup (orchestrator.py) - P2 #3: synthesis_id dedup (orchestrator.py + types.ts + chatStream.ts) Verification: ruff clean, pytest 14/14, typecheck clean, vitest 126/127 (1 baseline)	2026-07-01 13:37:52 +08:00
chiguyong	47a437c5e3	fix(experts): resolve residual review findings from PR #13 Addresses 4 actionable findings (1 P1 + 3 P2) from ce-code-review of feat/ui-ue-enhancement (PR #13), now merged to main (`8066e0b`). P1: expert_step payload alignment (_phase_executor.py) The thinking/tool_call/tool_result event payloads were missing the fields the frontend WsServerMessage contract requires (expert_name/expert_color/content/step). Frontend code consuming these events silently degraded. Now all expert_step broadcasts carry the full contract; tool_call/tool_result keep step_data for the raw payload. P2 #1: execute_stream CancellationToken registration (config_driven.py) execute_stream() bypassed BaseAgent.execute() and never registered a CancellationToken, so cancel_task() could not cooperatively cancel a streaming task. Now registers the token and cleans it up in finally. P2 #2: team_synthesis orphan milestone cleanup (orchestrator.py) If synthesis streaming was interrupted (cancel/exception), no terminal team_synthesis event was emitted, leaving the frontend streaming milestone spinning forever. Now an inner try/except emits a terminal team_synthesis with status=cancelled\|error before re-raising, so the frontend can finalize the milestone. The success path also carries the synthesis_id. P2 #3: synthesis_id dedup (orchestrator.py + types.ts + chatStream.ts) Without an identifier, the frontend could not precisely match a team_synthesis terminal event to its streaming milestone (especially across retries/concurrent teams). The backend now injects a stable synthesis_id (`{plan.id}:synthesis`) into both team_synthesis_chunk and team_synthesis events; the frontend uses it for exact milestone matching and treats error/cancelled status as terminal. Test updates - Updated test_thinking_events_forwarded_as_expert_step to assert the new payload contract (expert_id/name/color/content/step). - Added test_tool_call_events_forwarded_as_expert_step covering tool_call/tool_result payload shape (content=tool_name summary + step_data=raw payload). Verification - ruff check: clean - pytest tests/unit/experts/test_phase_executor_streaming.py: 14/14 - npm run typecheck: clean - vitest: 126/127 (1 unrelated baseline failure in tauri-auth.test.ts) Residuals doc: docs/residual-review-findings/feat-ui-ue-enhancement.md	2026-07-01 13:26:19 +08:00
Fischer	8066e0bf8b	Merge PR #13: feat: UI/UE enhancement - streaming, sticky header, hover actions, calendar tokens Implements U1-U7 UI/UE enhancement with streaming, sticky header, hover actions, calendar tokens. Review fixes applied (P0 whitelist + P0 double accumulation + P2 exception handling). See PR #13 description for details.	2026-07-01 13:15:33 +08:00
chiguyong	4866a16109	docs: compound streaming-event-whitelist-and-accumulation learning Captures the ReAct streaming contract bug + WS event whitelist governance from PR #13's review fixes. Three intertwined runtime issues documented: 1. P0: final_answer double-accumulated token content (logic_error) 2. P0: _VALID_TEAM_EVENT_TYPES whitelist missing 3 new streaming event types 3. P2: except (RuntimeError, TimeoutError, ConnectionError) too narrow for LLMProviderError/ConfigValidationError in async generator Adds ReAct Streaming Contract entry to CONCEPTS.md, which defines the protocol execute_stream() yields (token events with incremental content, then one final_answer event with the concatenated full text). Consumers must pick one accumulation strategy; mixing both doubles the output.	2026-07-01 13:15:01 +08:00
chiguyong	f872a3fac6	feat: UI/UE enhancement - streaming, sticky header, hover actions, calendar tokens U1 ThinkingBlock: streaming cursor + auto-collapse to summary bar U2 StickyModeHeader: new component replacing ExpertTeamView + BoardStatusView U3 Backend _phase_executor: execute_stream() with token/thinking/final_answer forwarding U4 Frontend chatStream: expert_result_chunk/team_synthesis_chunk token accumulation U5 AssistantText: routing tag hover fade-in U6 UserBubble: hover actions (copy/delete/refill) U7 CalendarGrid: token-based color redesign Review fixes (ce-code-review): - P0: _VALID_TEAM_EVENT_TYPES whitelist adds 3 new streaming event types - P0: final_answer no longer double-accumulates token content - P2: exception handling expanded to except Exception for LLMProviderError etc. Simplification (ce-simplify-code): - _synthesizer.py: O(n²) concat -> list+join, _concat_results extraction - config_driven.py: 4 duplicate _handle_*_stream -> _wrap_sync_as_stream - chatStream.ts: 5x [...messages].reverse().find() -> findLastMessage helper Tests: pytest 13/13, vitest 126/127 (1 baseline), typecheck pass, ruff clean	2026-07-01 12:51:45 +08:00
Ether	521f573d4a	docs: compound any-and-except-exception-governance convention (#12)	2026-07-01 08:16:54 +08:00
chiguyong	975b7c4e57	docs: compound any-and-except-exception-governance convention Record the strategies established during PR #8-#11 (1214+ tech debt governance) for Any replacement priority, except Exception classification, framework boundary preservation, and intentional-design retention.	2026-07-01 08:16:02 +08:00
Fischer	c005642851	refactor: tech debt Wave 3+4 (tools/skills/mcp/rag/calendar/auth/cli/quality/channels/telemetry/session/bus/documents Any governance) (#11)	2026-07-01 08:08:36 +08:00
Fischer	a778f816c5	refactor: tech debt Wave 1+2 (except Exception wrap-up + core/experts Any governance) (#10)	2026-07-01 03:54:53 +08:00
Fischer	838a05772e	refactor: follow-up tech debt cleanup (except Exception + Any governance) (#9)	2026-07-01 03:03:02 +08:00
Fischer	cc531d0663	refactor: systematic tech debt cleanup (U1-U5) (#8) Merge PR #8: U1-U5 systematic tech debt cleanup	2026-07-01 00:45:34 +08:00
chiguyong	ec9a0a1f70	refactor(frontend): split chat.ts (2025 lines) into chatStore/chatSocket/chatStream (U5) chatStore.ts (498 lines, <=500 target met): Pinia store entry composing useChatSocket + useChatStream; retains all actions + backward-compat export aliases. chatSocket.ts (165 lines): resolveIncomingConvId pure fn + useChatSocket composable (connect/disconnect/heartbeat/reconnect). chatStream.ts (1557 lines): dispatchWsEvent pure fn for 30+ WS event types + useChatStream composable. Exceeds plan ~300 estimate due to discriminated union breadth (each case 30-50 lines); core testability goal met. 8 components + chat-phase.test.ts migrated from @/stores/chat to @/stores/chatStore. vitest: 35 new tests (chatStream 19 + chatSocket 13 + chat-phase 3) all green; typecheck passes.	2026-06-30 22:32:48 +08:00
chiguyong	1033346913	refactor(bitable,tools): replace Any with concrete types + Protocol (U4) BitableRecord/FormulaResult/SessionState TypeAlias replace dict[str, Any]; _redis/_engine/_session_factory typed as object \| None with TYPE_CHECKING Protocol (_RedisLike, _RecalcWorker); Coroutine[Any, Any, Any] retained as legitimate type param. Baseline 40 : Any occurrences -> 0 across 6 in-scope files (target <=5). Deferred: repository.py/recalc_worker.py/ingestion/* (10 occurrences, separate PR). ruff clean; 367 passed + 116 skipped (bitable + pipeline_state + tools).	2026-06-30 22:32:30 +08:00
chiguyong	be5c4e09f8	refactor(core,experts): classify except Exception + structured ReviewResult (U3) ReviewResult dataclass (passed/degraded/feedback) replaces tuple+[DEGRADED] prefix in _review_phase_output; 3 review_result WS payloads now carry degraded field (AE3). except Exception narrowed to specific types across 10 files (core/react, rewoo, base, orchestrator, dispatcher, plan_exec_engine + experts/orchestrator, _phase_executor, _review_gate + orchestrator/pipeline_engine). Baseline 140 -> 66 occurrences (>=50% reduction). Fix RuntimeError regression: review-gate + compression paths now catch RuntimeError (LLM/provider internal errors) to preserve degradation semantics. Test side_effect switched to functional form to avoid StopIteration on list exhaustion. ruff clean; 135 key + 469 experts + 163 core tests pass.	2026-06-30 18:03:58 +08:00
chiguyong	47ee2449df	refactor(experts): split TeamOrchestrator god class into 7 mixins (U2) - Split 2085-line orchestrator.py into main class (592 lines) + 7 responsibility-focused mixins: PhaseExecutor, DebateRunner, ReviewGate, DivergenceDetector, RollbackHandler, Synthesizer, InterventionHandler. - Mixin pattern preserves self access to shared state (_experts/_workspace/_broadcast_event); method bodies moved verbatim to minimize regression risk. Each mixin declares TYPE_CHECKING Protocol for shared state. - Split _execute_execution_phase (~290 lines) into _prepare_phase_context/_run_agent_steps/_finalize_phase (each <=100 lines). - All mixins <=400 lines, main class <=600 lines. [DEGRADED] prefix annotations preserved in ReviewGateMixin. - 60 team_orchestrator tests pass (behavior unchanged), 469 experts tests pass, ruff clean.	2026-06-30 16:47:20 +08:00
chiguyong	e61f98898f	refactor(core): unify ReActEngine execute/execute_stream via async generator (U1) - Convert _execute_loop to async generator yielding ReActEvent; both execute and execute_stream delegate to it, eliminating ~760 lines of duplicated loop logic (execute_stream 813 -> 53 lines). - Add 'final_result' event_type carrying ReActResult; execute extracts result from final event, execute_stream forwards events (backward-compatible 'final_answer' retained). - Unify _drain_phase_violations across both paths. - Add 14 golden-trajectory characterization tests. - Fix test_execute_stream_with_compressor mock gateway (chat_stream test-infra gap). 130 react tests pass, 762 core+experts pass, no regressions.	2026-06-30 16:07:00 +08:00
chiguyong	03b1e3d751	docs: add systematic tech debt cleanup plan (U1-U5)	2026-06-30 14:27:47 +08:00
chiguyong	a3cecd4b50	fix(review): apply P0/P2 findings from dual-agent review - Dockerfile: split ENTRYPOINT/CMD to align with docker-compose serve - test_termbase: guard jieba import with pytest.importorskip - orchestrator: mark silent review-degradation with [DEGRADED] prefix - chat.py: accurate ExecutionMode log message - agentkit.yaml: document OTel exporter config - skill_routing: replace 12 Any with object/typed (AGENTS.md compliance) - AssistantText.vue: add aria-live/role for a11y	2026-06-30 14:27:46 +08:00
Fischer	0962df11b5	feat(agent): Wave 4 PLAN_EXEC Hardening (U1-U5) (#7) Merge PR #7: Wave 4 PLAN_EXEC Hardening: U1-U5 + ce-code-review fixes + PLAN_EXEC concepts docs.	2026-06-30 12:46:35 +08:00
chiguyong	a872a459a6	docs: add PLAN_EXEC concepts + commit Wave 4 plan CONCEPTS.md: new PLAN_EXEC section (Phase State Machine, PhasePolicy, Phase Violation, AdvancePhaseTool, _build_phase_engine). docs/plans/: commit the Wave 4 plan document (was untracked).	2026-06-30 12:46:24 +08:00
chiguyong	8627777f87	fix(review): apply ce-code-review findings Six safe fixes from Stage 5c review: phase.py: delete dead _DEFAULT_BASH_FILTER constant (no references after U1) chat.py: drop Any from _build_phase_engine params (AGENTS.md prohibits any) chat.ts: delete stale comment about phase_changed emission chat-phase.test.ts: rename misleading 'capped at 5' test name test_chat_plan_exec_ws.py: tighten test_rest_react_mode_still_works assertion test_plan_exec_e2e.py: clarify test_auto_advance assertion comment Known limitations documented in PR description (not fixed): loop detector + advance_phase (P1), parallel path phase_violation ordering (P2), REST cancellation_token (P2), Callable filter exceptions (P3).	2026-06-30 12:42:15 +08:00
chiguyong	cbbe937940	chore(shell): fix ruff F401/F841 + apply ruff format Pre-existing ruff errors surfaced during Wave 4 QC: - F401: drop unused `TerminalSession` import (only `TerminalSessionManager` is used) - F841: drop unused `start = time.monotonic()` local in `_execute_standalone` `ruff format` then reformatted a few long lines in the same file (frozenset literal, curl exfiltration regex, pipe operators, session.env call). No behavior change — formatting only. Why now: shell.py was already touched by U1 (widen `bash_command_filter`). Leaving known ruff failures in a file this PR modifies would make future CI gates noisy.	2026-06-30 11:52:51 +08:00
chiguyong	0a8f6eebef	feat(U5): E2E integration test for PLAN_EXEC lifecycle Add tests/integration/test_plan_exec_e2e.py covering the full PLAN_EXEC path through a scripted LLM mock (deterministic, no real API call). Mock boundary: LLMGateway.chat_stream yields scripted StreamChunk objects. Real ReActEngine, real PhasePolicy (default_policy()), real AdvancePhaseTool, real chat._handle_chat_message WS handler. Test scenarios (7 tests, all passing): - Happy path: PLANNING (search) → advance_phase → BUILDING (write_file) → advance_phase → VERIFICATION (shell ls tests/unit/) → advance_phase → DELIVERY (final answer). Asserts final_answer, tool dispatch counts, no phase_violation events, engine ends at DELIVERY. - Negative path: write_file in PLANNING blocked → phase_violation event emitted with violation_kind=tool_not_allowed → LLM calls advance_phase → write_file in BUILDING succeeds. Asserts exactly 1 violation, tool NOT dispatched during PLANNING (write_file.call_count==1 after recovery). - Edge cases: - auto_advance_after_steps=2: engine transitions out of PLANNING after 2 LLM calls without explicit advance_phase. - policy_from_config(enabled=False) returns None (PLAN_EXEC disabled). - policy_from_config({}) returns None (opt-out, fall back to default). - Error path: chat_stream raises RuntimeError → exception propagates, phase state unchanged (still PLANNING), tool not dispatched. - WS handler integration: full _handle_chat_message path emits both phase_violation (from engine) and phase_changed (from WS handler's transition detection) to the client WebSocket. Notes: - Loop detector threshold bumped to 99 for happy/negative/auto-advance tests (3 legitimate advance_phase calls with {} args would trigger the default threshold=2; this is a known PLAN_EXEC production concern tracked separately). - VERIFICATION-phase shell command uses `ls tests/unit/` instead of plan's `pytest tests/unit/ -q`: pytest is not in ShellTool._SAFE_COMMAND_PREFIXES and would be flagged dangerous by the default policy's bash filter. Using ls (whitelisted) keeps the test focused on lifecycle validation rather than policy tuning. Verification: python3 -m pytest tests/integration/test_plan_exec_e2e.py -v passes (7/7). Full regression: 116 tests pass across U1-U5 test files. Ruff check + format clean. Refs: R34, R27. Plan: docs/plans/2026-06-30-001-feat-agent-wave4-plan-exec-hardening-plan.md	2026-06-30 11:36:02 +08:00
chiguyong	2abe7c9e49	feat(U4): frontend phase_violation handling + PhaseIndicator component Extend the frontend to surface PLAN_EXEC phase lifecycle events to the user: - WsServerMessage union (types.ts) gains two branches: `phase_changed` and `phase_violation` (matching backend U2 emission). - chat.ts Pinia store gains a phase state slice: `currentPhase`, `phaseViolations` (capped at 5), `isPlanExec` computed, and `resetPlanExecState()`. - handleWsMessage adds `case "phase_changed"` (sets currentPhase + appends a milestone step) and `case "phase_violation"` (sets currentPhase from violation data, appends to violations, fires an ant-design-vue message.warning toast, appends an error step). - `result` handler calls `resetPlanExecState()` to clear the indicator when the conversation completes. - New `PhaseIndicator.vue` component: compact badge + 4 dots (PLANNING/BUILDING/VERIFICATION/DELIVERY) with the current phase highlighted + violation counter. Renders nothing when `!isPlanExec` (graceful degradation). - Mounted in `ChatView.vue` alongside ExpertTeamView and BoardStatusView. Tests: - New `tests/unit/stores/chat-phase.test.ts` verifies the phase state slice is exposed with correct initial values and `isPlanExec` derives from `currentPhase`. - `npm run typecheck` clean. - Pre-existing `tauri-auth.test.ts` failure is unrelated (fails in isolation on main).	2026-06-30 11:11:03 +08:00
chiguyong	b032e08866	feat(U3): extract _build_phase_engine helper + wire REST PLAN_EXEC Extract the WS path's inline phase_policy construction into a shared _build_phase_engine helper so the REST send_message endpoint can reuse it. Replace the former 501 stub with actual PLAN_EXEC execution: - REST POST /chat/sessions/{id}/messages with execution_mode=plan_exec now builds a phase-policy-backed ReActEngine, calls execute() (non-streaming), and returns a MessageResponse. - KTD5: PLAN_EXEC bypasses execute_with_fallback_chain — phase policy and fallback chain are mutually exclusive. - When plan_exec.enabled=False, REST falls through to the REACT path (matching WS behavior). - WS path refactored to call the same helper; behavior unchanged. Tests: - Replace TestRestPlanExec501 with TestRestPlanExec (happy path, bad config → 500, disabled → falls through to REACT, REACT mode unchanged). - Add TestBuildPhaseEngineHelper covering all return branches: not-PLAN_EXEC, disabled, empty-config, invalid-config, tool append, default-policy fallback. - All 109 tests pass across the three PLAN_EXEC test files.	2026-06-30 10:59:43 +08:00
chiguyong	4dc58c24bc	feat(U2): emit phase_violation WS event alongside LLM reinjection Wave 3 only injected the violation error dict back to the LLM as a tool result. Wave 4 U2 adds a parallel WS event so the frontend PhaseIndicator can surface violations to the user. - ReActEngine: add _phase_violations accumulator (list[dict]). Cleared in reset(). _check_phase_permission appends a structured violation dict (with new violation_kind field: tool_not_allowed \| bash_command_blocked) before returning the error. - Add _drain_phase_violations(step) helper that pops pending violations and returns ReActEvent(event_type="phase_violation", ...) list. Events carry a shallow copy of the violation dict so callers can't mutate the accumulator. - execute_stream: drain after each tool_result yield at all 3 tool execution sites (parallel, serial-with-confirmation, parsed_calls). Non-streaming execute() ignores the accumulator (the LLM reinjection via the error dict is the only signal there). - chat.py WS handler: new elif branch forwards phase_violation ReActEvents to the client as {"type": "phase_violation", "data": ...} WS messages. - Tests: 11 new tests covering accumulator lifecycle, drain semantics, shallow-copy isolation, and execute_stream event emission for both tool_block and bash_block paths. 2 new WS forwarding tests pin the chat.py path (forward + characterization for REACT mode).	2026-06-30 10:48:35 +08:00
chiguyong	9e28ab315e	feat(U1): widen PhasePolicy bash_command_filter to accept Callable Reuses ShellTool._is_dangerous as the default bash filter for PLANNING and VERIFICATION phases, closing the regex ceiling documented in Wave 3. - Convert ShellTool._is_dangerous and _is_single_command_dangerous to @staticmethod (backward-compatible; instance calls still work via Python's descriptor protocol). - Widen PhasePolicy.bash_command_filter field type to dict[PhaseState, Callable[[str], bool] \| re.Pattern \| None]. - is_bash_command_allowed dispatches on callable vs pattern at call time. Empty commands short-circuit to allowed (Wave 3 contract; ShellTool emits the clearer empty-command error). - to_dict serializes callables as <callable> for log readability. - default_policy() now wires ShellTool._is_dangerous for PLANNING and VERIFICATION. _DEFAULT_BASH_FILTER kept for backward compat with configs that pass a re.Pattern. - Tests: characterization tests pin Wave 3 behavior (rm/mv/cp/echo > still blocked) plus new edge-case coverage for ceiling closed (dd of=/dev/sda, :>file, chain operators, pipe segments).	2026-06-30 10:39:44 +08:00
Fischer	2b8a7d8909	feat(agent): Wave 3 strategic coupling (G5/G6) (#6)	2026-06-30 09:17:19 +08:00
Fischer	a2dcde01b8	feat(agent): Wave 2 medium coupling (G4/G7/G9) (#5)	2026-06-30 09:09:33 +08:00
Fischer	78ed93fc81	Merge pull request 'feat(agent): Wave 1 quick wins (G1/G2/G3/G8) + review fixes' (#4) from feat/agent-wave1-quick-wins into main	2026-06-29 22:08:56 +08:00
chiguyong	d7ca6e8065	fix(review): W1 ServerConfig from_dict wiring, W3 internal kwargs filter, N3 status docstring Test / backend-test (pull_request) Has been cancelled Details Test / frontend-unit (pull_request) Has been cancelled Details Test / api-e2e (pull_request) Has been cancelled Details Test / frontend-e2e (pull_request) Has been cancelled Details Code review fixes for Wave 1: - W1: ServerConfig.from_dict now wires prompt_cache/streaming/verification sections from YAML to constructor (previously these params existed but were never read) - W3: Tool._validate_input filters _-prefixed kwargs (e.g. _skip_dangerous_check) before jsonschema.validate, preventing additionalProperties:false schemas from rejecting internal control parameters - N3: ReActResult.status docstring now lists "empty_fallback" and "verify_failed" Added test test_internal_kwargs_underscore_prefixed_skipped_by_validation for W3.	2026-06-29 21:58:40 +08:00
chiguyong	cd211c6cd9	feat(U4): G1 re-inject verify failures into ReAct - ReActEngine gains a max_reinjections constructor parameter (default 1; 0 is equivalent to the original behavior) - In execute()/execute_stream(), the verify block moves from after the loop to the final-answer detection point inside the loop: - verify passes → normal break - verify fails + reinjections < max + step < max_steps → errors are re-injected into the conversation as a user message, and continue lets the LLM self-correct - verify fails + max_reinjections or max_steps reached → record the verify log in the trajectory, set trace_outcome="verify_failed", break - execute_stream yields the final_answer event only after verify passes, so clients do not receive a completion signal prematurely - ReActResult.status now carries trace_outcome (previously defaulted to "success") - ServerConfig.verification config section (max_reinjections) - test_verify_reinjection.py, 10 tests: characterization (max=0) + new behavior (R1/R2/R3/R14)	2026-06-29 21:35:08 +08:00
chiguyong	0f3f0a7550	feat(U3): G8 delta_flush_interval throttling - ReActEngine gains a flush_interval_ms constructor parameter (default 0 = yield per chunk, backward compatible) - The execute_stream chunk loop throttles with time.monotonic, accumulating into _flush_buffer and yielding in batches - With flush_interval_ms=0 the condition short-circuits to True, yielding per chunk to preserve current behavior - When the stream ends mid-interval, a final flush emits the remaining buffer so no characters are lost - ServerConfig.streaming config section (flush_interval_ms) - test_delta_flush.py covers R11/R12/R14	2026-06-29 20:49:52 +08:00
chiguyong	c4aaef05aa	feat(U2): G2 prompt cache two-block structure - ReActEngine adds _build_system_message(stable+volatile) for two-block construction - For the Anthropic provider it returns content blocks, with cache_control on the stable block - For non-Anthropic providers it returns a concatenated string, relying on the stable prefix to hit automatic prefix caching - Memory injection in execute_stream/execute moves from the end of system_prompt to the volatile layer - LLMGateway.get_provider_name_for_model exposes provider detection - anthropic.py _convert_messages passes through list-type system content - ServerConfig.prompt_cache config section (default enable=True) - ReActEngine.prompt_cache_enable constructor parameter (default True, preserving current behavior) - test_prompt_cache_layers.py covers R4-R7/R13	2026-06-29 20:47:23 +08:00
chiguyong	c66a7773b5	feat(U1): G3 tool-call schema validation - base.py adds ToolValidationError (error_code/details) and _validate_input - safe_execute validates kwargs with jsonschema.validate before execute - input_schema=None skips validation for backward compatibility - _execute_tool catches ToolValidationError first to preserve error_code - function_tool._infer_schema no longer lets VAR_KEYWORD/VAR_POSITIONAL parameters leak into the schema - test_tool_schema_validation.py covers R8-R10	2026-06-29 20:34:14 +08:00
chiguyong	2747bb4e64	chore(prior): malformed tool call handling, auth whitelist, dev scripts, wave1 plan	2026-06-29 20:25:03 +08:00
Fischer	6e65352df8	Merge PR #3: feat(bitable): bitable file layer + default fields + in-table field operations (Stage 1) Merge feat/bitable-ui-stage1 into main: bitable UI completeness Stage 1 (U1-U6) + ce-code-review P0/P1 fixes	2026-06-29 09:25:30 +08:00
chiguyong	a6e1bf5884	feat(bitable): bitable file layer + default fields + in-table field operations + ce-code-review fixes (Stage 1) Implements bitable UI completeness Stage 1 (U1-U6), filling in the file layer, default fields, and in-table field operations that were missing for parity with Feishu/twenty, and fixes the P0/P1 issues found in the ce-code-review walkthrough. Backend (U1-U2): - Add the BitableFile entity (models/db/repository/service/routes) with a three-level hierarchy: file → table → fields/records - Schema V2 migration: bitable_files table + tables.file_id column, idempotent (IF NOT EXISTS), V1 orphan tables preserved - New tables automatically get 5 default fields (title/status/date/creator/created time) - agent-owned fields are auto-filled in create_record (matched by type+owner, passing actor_user_id) - 7 file REST endpoints + IDOR ownership checks (404-before-403, internal token bypass) Frontend (U3-U5): - File list page (FileCard grid + create/rename/delete) + file detail page (sidebar table list + vxe-table grid) - Vue Router nested routes /bitable → /bitable/:fileId → /bitable/:fileId/:tableId - Column header menu (edit/hide/delete field) + trailing "+" column for adding a field - Custom cell editor + Tag display for select/multiselect fields - Pinia store extended with file state and actions; deep-link access falls back to getFile; watch on fileId changes Tests (U6): - Unit tests for file CRUD (12 cases) + default fields (10 cases) - 3 E2E specs (view loading, file flow, field operations), skipped gracefully when the backend is unavailable ce-code-review fixes (P0/P1): - P0 route conflict: GET /files/{file_id} shadowed the download endpoint → download moved to /uploads/{filename} - P0 IDOR: added ownership checks to the five update/delete field/record/view endpoints - P1 missing is_initialized property caused a crash on second initialization - P1 direct URL navigation failed (files array empty) → selectFile falls back to getFile - P1 fileId change did not trigger a reload → added a watch - P1 polling dropped the final formula value (wasCalculating guard) + reuse view filters - P1 test assertions 200→201; test_db cases that need no URL have the postgres marker removed so they run - P2 _check_table_ownership 403→404; input length validation; upload field-table consistency check - P2 multiselect shallow comparison → deep comparison; E2E bitable-view adds a waitForServer guard Verification: ruff check passes; pytest 91 passed/116 skipped; vue-tsc --noEmit passes.	2026-06-29 04:07:45 +08:00
chiguyong	f476d3339c	Merge branch 'test/calendar-ui-manual-testing': fix UI not refreshing after the agent creates a calendar event + three-root-cause documentation trilogy + E2E test suite	2026-06-29 02:23:20 +08:00
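
The throttling described in commit 0f3f0a7550 (batching streamed chunks on a `time.monotonic` interval, with `0` meaning per-chunk yield and a final flush at stream end) might look roughly like the sketch below. This is not the repository's code: the function name `throttle_chunks` and the injectable `clock` parameter are hypothetical, introduced only to make the idea testable in isolation.

```python
import time
from typing import Callable, Iterable, Iterator


def throttle_chunks(
    chunks: Iterable[str],
    flush_interval_ms: int = 0,
    clock: Callable[[], float] = time.monotonic,  # hypothetical hook for tests
) -> Iterator[str]:
    """Yield chunks in batches, emitting at most one batch per interval."""
    interval_s = flush_interval_ms / 1000.0
    buffer: list[str] = []
    last_flush = clock()
    for chunk in chunks:
        buffer.append(chunk)
        now = clock()
        # With flush_interval_ms == 0 the condition short-circuits to True,
        # so every chunk is yielded as soon as it arrives.
        if flush_interval_ms == 0 or now - last_flush >= interval_s:
            yield "".join(buffer)
            buffer.clear()
            last_flush = now
    # The stream may end mid-interval: flush the remainder so no text is lost.
    if buffer:
        yield "".join(buffer)
```

Whatever the interval, joining the yielded batches reproduces the original stream; only the batch boundaries change.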

351 Commits

All Branches