chiguyong
|
91a61f9b49
|
feat(evolution): auto-trigger + quality gate + actor marking (U6, R5/R6)
U6 of the complex task quality loop plan.
R5 (auto evolution trigger + quality gate):
- EvolutionConfig (Pydantic v2): success_sample_rate=0.1, min_confidence=0.5,
min_examples=3, observe_only=True, cross_workspace_sharing=False
- Success path gated by success_sample_rate; failure path always runs (100%)
- Observe-only mode records reflections without feeding optimizer (RV14:
avoids noise-driven prompt degradation during initial rollout)
- PromptOptimizer.can_optimize() consumption gate: sample count >= min_examples
AND mean quality >= min_confidence
- PitfallDetector confidence threshold: low-confidence warnings marked
observe-only; confidence = failure_rate * min(1.0, total/3) linear ramp
(ponytail: upgrade to Wilson interval)
R6 (actor marking + cross-workspace sharing):
- All evolution artifacts (EvolutionLogEntry, Module, PitfallWarning) carry
actor field; defaults to result.agent_name
- can_share_artifact(): same-workspace always allowed; cross-workspace requires
explicit opt-in via EvolutionConfig.cross_workspace_sharing=True
KTD-8: gave_up_after_reflections treated as failure path (triggers 100%
evolution) even when stream wrapper marks status as COMPLETED. Detection via
output_data.trace_outcome or error_message substring (ponytail: heuristic;
upgrade path is a dedicated TaskResult.trace_outcome field).
Backward compat: all gates conditional on auto_evolution_config is not None;
existing EvolutionMixin usage without config preserves prior behavior.
Tests: tests/unit/test_evolution_auto_trigger.py (37 tests) covers R5/R6
scenarios - sample rate gate, observe-only, consumption gate, pitfall
confidence, actor marking, cross-workspace sharing, gave_up_after_reflections,
error handling, fire-and-forget, backpressure cap, AE3 happy path.
|
2026-07-03 13:54:37 +08:00 |