Six safe fixes from Stage 5c review:
- phase.py: delete dead _DEFAULT_BASH_FILTER constant (no references after U1)
- chat.py: drop Any from _build_phase_engine params (AGENTS.md prohibits Any)
- chat.ts: delete stale comment about phase_changed emission
- chat-phase.test.ts: rename misleading 'capped at 5' test name
- test_chat_plan_exec_ws.py: tighten test_rest_react_mode_still_works assertion
- test_plan_exec_e2e.py: clarify test_auto_advance assertion comment
Known limitations documented in PR description (not fixed): loop detector + advance_phase (P1), parallel path phase_violation ordering (P2), REST cancellation_token (P2), Callable filter exceptions (P3).
Add tests/integration/test_plan_exec_e2e.py covering the full PLAN_EXEC
path through a scripted LLM mock (deterministic, no real API call).
Mock boundary: LLMGateway.chat_stream yields scripted StreamChunk
objects. Real ReActEngine, real PhasePolicy (default_policy()), real
AdvancePhaseTool, real chat._handle_chat_message WS handler.
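
A minimal sketch of the mock boundary described above, assuming the gateway exposes an async-generator `chat_stream`. `FakeChunk`, `ScriptedGateway`, and `collect` are hypothetical stand-ins for illustration; the real `StreamChunk` fields and `LLMGateway` signature are not shown in this PR text.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeChunk:
    """Hypothetical stand-in for the project's StreamChunk."""
    text: str


class ScriptedGateway:
    """Replays one pre-scripted list of chunks per LLM call, in order."""

    def __init__(self, turns):
        self._turns = list(turns)
        self.calls = 0

    async def chat_stream(self, messages):
        # Pick the script for this call; no network or real model involved.
        turn = self._turns[self.calls]
        self.calls += 1
        for text in turn:
            yield FakeChunk(text)


async def collect(gateway, messages):
    """Drain one scripted call and return the chunk texts."""
    return [chunk.text async for chunk in gateway.chat_stream(messages)]
```

Because each call consumes the next scripted turn, a test can assert on `calls` to check how many LLM round trips the engine made.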
Test scenarios (7 tests, all passing):
- Happy path: PLANNING (search) → advance_phase → BUILDING (write_file)
→ advance_phase → VERIFICATION (shell ls tests/unit/) → advance_phase
→ DELIVERY (final answer). Asserts final_answer, tool dispatch counts,
no phase_violation events, engine ends at DELIVERY.
- Negative path: write_file in PLANNING blocked → phase_violation event
emitted with violation_kind=tool_not_allowed → LLM calls advance_phase
→ write_file in BUILDING succeeds. Asserts exactly 1 violation, tool
NOT dispatched during PLANNING (write_file.call_count==1 after recovery).
- Edge cases:
- auto_advance_after_steps=2: engine transitions out of PLANNING
after 2 LLM calls without explicit advance_phase.
- policy_from_config(enabled=False) returns None (PLAN_EXEC disabled).
- policy_from_config({}) returns None (opt-out, fall back to default).
- Error path: chat_stream raises RuntimeError → exception propagates,
phase state unchanged (still PLANNING), tool not dispatched.
- WS handler integration: full _handle_chat_message path emits both
phase_violation (from engine) and phase_changed (from WS handler's
transition detection) to the client WebSocket.
Notes:
- Loop detector threshold bumped to 99 for happy/negative/auto-advance
tests (3 legitimate advance_phase calls with {} args would trigger
the default threshold=2; this is a known PLAN_EXEC production concern
tracked separately).
- VERIFICATION-phase shell command uses `ls tests/unit/` instead of
plan's `pytest tests/unit/ -q` — pytest is not in
ShellTool._SAFE_COMMAND_PREFIXES and would be flagged dangerous by
the default policy's bash filter. Using ls (whitelisted) keeps the
test focused on lifecycle validation rather than policy tuning.
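
To illustrate why the loop-detector threshold is raised in these tests, here is a sketch of a repeat-call detector. The class name and the counting rule are assumptions (an identical tool/args pair is flagged once it has been seen more than `threshold` times over the whole run); the project's actual detector may count differently.

```python
import json
from collections import Counter


class RepeatCallDetector:
    """Hypothetical detector: flags repeated identical tool calls."""

    def __init__(self, threshold=2):
        self.threshold = threshold
        self._seen = Counter()

    def record(self, tool_name, args):
        # Serialise args deterministically so equal dicts map to the same key.
        key = (tool_name, json.dumps(args, sort_keys=True))
        self._seen[key] += 1
        # True means "looks like a loop" under the assumed rule.
        return self._seen[key] > self.threshold
```

Under this assumed rule, the third `advance_phase` call with `{}` args is flagged at threshold 2 even though other tool calls ran in between, while a threshold of 99 lets all three through.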
Verification: python3 -m pytest tests/integration/test_plan_exec_e2e.py -v
passes (7/7). Full regression: 116 tests pass across U1-U5 test files.
Ruff check + format clean.
Refs: R34, R27. Plan: docs/plans/2026-06-30-001-feat-agent-wave4-plan-exec-hardening-plan.md
The whoami route accepted rotated/old refresh tokens for cold-start
because it only checked session revocation status, not the token hash.
Now when token_type == "refresh", the route computes hash_token(token)
and compares it with the session's stored refresh_token_hash using
hmac.compare_digest (constant-time). Mismatch returns 401.
- Add SessionService.get_stored_refresh_hash(session_id) helper
- Add hash verification in whoami route (R9)
- Add TestWhoamiTokenHash with 5 integration tests
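
The check described above can be sketched as follows. The SHA-256 body of `hash_token` and the helper `refresh_token_matches` are assumptions for illustration; only the use of `hmac.compare_digest` against the stored hash comes from this change.

```python
import hashlib
import hmac
from typing import Optional


def hash_token(token: str) -> str:
    """Hypothetical stand-in: hex SHA-256 of the raw token."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def refresh_token_matches(token: str, stored_hash: Optional[str]) -> bool:
    """Return True only if the presented token hashes to the stored value."""
    if stored_hash is None:
        # No hash on record for the session: treat it as a mismatch.
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(hash_token(token), stored_hash)
```

In a route, a `False` result for a refresh token would map to the 401 response; a rotated token fails because the session stores only the hash of the newest one.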
SkillService: enable/disable (persisted in skill_states table, schema
v4), import from YAML (with path traversal + name validation), reload
from file, update config. GET /skills now filters disabled skills.
KbService: list/upload/delete documents with department_id binding.
Added department_id field to KnowledgeSource + UploadedDocument.
Department visibility: (bound to user depts) ∪ (global = None).
10 new admin endpoints: skill enable/disable/import/reload/update,
KB documents CRUD, source sync/rebuild. All guarded by _require_admin.
Implemented reload stub in skill_management.py (was no-op).
54 new tests (26 unit + 28 integration). Fixed 4 pre-existing lint
errors. 357 admin tests pass, no regressions.
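
The department visibility rule above (bound to the user's departments, or global when `department_id` is None) can be sketched like this. `Source` and `visible_sources` are hypothetical names, and integer department ids are an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for KnowledgeSource / UploadedDocument."""
    name: str
    department_id: Optional[int]  # None marks a global item


def visible_sources(sources: Iterable[Source], user_department_ids: Iterable[int]) -> List[Source]:
    """Keep items bound to one of the user's departments, plus global items."""
    allowed = set(user_department_ids)
    return [
        s for s in sources
        if s.department_id is None or s.department_id in allowed
    ]
```

A user with no departments still sees global items, since those pass the `None` check regardless of membership.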
U1: Bump _SCHEMA_VERSION to 3, add 5 department tables (departments,
user_departments, department_skill_bindings, department_kb_bindings,
department_quotas) + 5 ORM models + helpers.
U2: DepartmentService (12 async methods: CRUD + bind/unbind skill/KB +
count_users). Mount admin_router in app.py. 36 unit + 28 integration tests.
U4: DepartmentContext FastAPI dependency (per-route, admin bypasses
filtering). filter_skills_by_department / filter_kb_sources_by_department
helpers. Applied to GET /skills and GET /kb-management/* routes.
15 integration tests for department isolation.
Also includes brainstorm + plan docs. 108 new tests, all pass.
Add create_user method to LocalAuthProvider (bcrypt hash + INSERT,
raises ValueError on duplicate username/email).
Add UserService with 9 async methods: create/list/get/update/delete
(soft)/reset_password/assign_department/remove_department/
list_user_departments. reset_password revokes all sessions via
SessionService. delete_user is soft (is_active=0, row preserved).
Add 9 user endpoints to routes/admin.py: POST/GET/PATCH/DELETE users,
reset-password, assign/remove department, list departments. All
guarded by _require_admin.
Tests: 40 unit + 37 integration = 77 new tests. Full admin suite:
170 tests pass, no regressions.
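
A sketch of the two behaviours noted above: raising ValueError on a duplicate username and soft-deleting by clearing `is_active`. It uses an in-memory SQLite table as an assumption; the schema and the function names `make_db`, `create_user`, and `soft_delete_user` are hypothetical, and password hashing is omitted.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a throwaway in-memory database with a minimal users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        "id INTEGER PRIMARY KEY, "
        "username TEXT UNIQUE NOT NULL, "
        "is_active INTEGER NOT NULL DEFAULT 1)"
    )
    return conn


def create_user(conn: sqlite3.Connection, username: str) -> int:
    """Insert a user; translate the UNIQUE violation into ValueError."""
    try:
        cur = conn.execute("INSERT INTO users (username) VALUES (?)", (username,))
    except sqlite3.IntegrityError as exc:
        raise ValueError(f"username already exists: {username}") from exc
    return cur.lastrowid


def soft_delete_user(conn: sqlite3.Connection, user_id: int) -> None:
    """Mark the user inactive; the row itself is preserved."""
    conn.execute("UPDATE users SET is_active = 0 WHERE id = ?", (user_id,))
```

Keeping the row means foreign keys and audit history that reference the user stay valid after deletion.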
1. InMemoryMessageBus.request(): fix param name (timeout→timeout_seconds) to match ABC
2. InMemoryMessageBus: track consumer tasks, cancel on unsubscribe
3. InMemoryMessageBus: _try_resolve_pending() in queue consumer path
4. evolve_soul(): use "default" category when patterns is empty
5. quick_classify(): use delimiter-based prompt to mitigate injection risk
6. Use asyncio.get_running_loop() instead of deprecated get_event_loop()
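
Fixes 2 and 6 can be illustrated with a small sketch: keep a reference to each consumer task and cancel it on unsubscribe, creating the task via `asyncio.get_running_loop()`. `TinyBus` and `demo` are hypothetical and much simpler than InMemoryMessageBus (one subscriber per topic, no request/reply).

```python
import asyncio


class TinyBus:
    """Hypothetical minimal bus; not the project's InMemoryMessageBus."""

    def __init__(self):
        self._queues = {}
        self._tasks = {}

    def subscribe(self, topic, handler):
        queue = asyncio.Queue()
        self._queues[topic] = queue
        # Must be called from a running loop; keep the task so it can be cancelled.
        loop = asyncio.get_running_loop()
        self._tasks[topic] = loop.create_task(self._consume(queue, handler))

    async def _consume(self, queue, handler):
        while True:
            message = await queue.get()
            await handler(message)

    async def publish(self, topic, message):
        await self._queues[topic].put(message)

    async def unsubscribe(self, topic):
        self._queues.pop(topic, None)
        task = self._tasks.pop(topic, None)
        if task is not None:
            task.cancel()
            try:
                await task  # wait until the consumer has actually stopped
            except asyncio.CancelledError:
                pass


async def demo():
    received = []
    delivered = asyncio.Event()

    async def handler(message):
        received.append(message)
        delivered.set()

    bus = TinyBus()
    bus.subscribe("events", handler)
    await bus.publish("events", "hello")
    await asyncio.wait_for(delivered.wait(), timeout=1)
    task = bus._tasks["events"]
    await bus.unsubscribe("events")
    return received, task.cancelled()
```

Without the stored task reference, the consumer loop would keep waiting on its queue after unsubscribe, leaving a pending task behind.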
Phase B:
- U1: CostAwareRouter with 3-layer routing (rule/LLM/capability matching)
- U6: OrganizationContext with agent profiles and capability-based discovery
- U7: AlignmentGuard with constraint injection and cascade detection
Phase C:
- U8: Soul dynamic evolution with version tracking and reflection-triggered updates
- U9: Auction mechanism as optional advanced routing mode
- U10: Server integration + end-to-end integration tests
250 new tests passing across all units.