fischer-agentkit

Commit Graph

Author	SHA1	Message	Date

chiguyong

91a61f9b49

feat(evolution): auto-trigger + quality gate + actor marking (U6, R5/R6)

U6 of the complex task quality loop plan.

R5 (auto evolution trigger + quality gate):
- EvolutionConfig (Pydantic v2): success_sample_rate=0.1, min_confidence=0.5,
  min_examples=3, observe_only=True, cross_workspace_sharing=False
- Success path gated by success_sample_rate; failure path always runs (100%)
- Observe-only mode records reflections without feeding optimizer (RV14:
  avoids noise-driven prompt degradation during initial rollout)
- PromptOptimizer.can_optimize() consumption gate: sample count >= min_examples
  AND mean quality >= min_confidence
- PitfallDetector confidence threshold: low-confidence warnings marked
  observe-only; confidence = failure_rate * min(1.0, total/3) linear ramp
  (ponytail: upgrade to Wilson interval)
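
The R5 gates above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: a plain dataclass stands in for the Pydantic v2 model so the example has no dependencies, and the function names `should_trigger`, `pitfall_confidence`, and `wilson_lower_bound`, along with all signatures, are assumptions. Only the field names, default values, and the two formulas come from the message.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvolutionConfig:
    """Dataclass stand-in for the Pydantic v2 model; defaults from the commit message."""
    success_sample_rate: float = 0.1
    min_confidence: float = 0.5
    min_examples: int = 3
    observe_only: bool = True
    cross_workspace_sharing: bool = False


def should_trigger(config: EvolutionConfig, succeeded: bool,
                   rng: Callable[[], float] = random.random) -> bool:
    """Failure path always runs; success path is sampled (hypothetical helper)."""
    if not succeeded:
        return True
    return rng() < config.success_sample_rate


def can_optimize(config: EvolutionConfig, quality_scores: Sequence[float]) -> bool:
    """Consumption gate: sample count >= min_examples AND mean quality >= min_confidence."""
    if len(quality_scores) < config.min_examples:
        return False
    mean_quality = sum(quality_scores) / len(quality_scores)
    return mean_quality >= config.min_confidence


def pitfall_confidence(failures: int, total: int) -> float:
    """confidence = failure_rate * min(1.0, total/3), the linear ramp from the message."""
    if total <= 0:
        return 0.0
    failure_rate = failures / total
    return failure_rate * min(1.0, total / 3)


def wilson_lower_bound(failures: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval, one possible shape for the noted upgrade."""
    if total <= 0:
        return 0.0
    p = failures / total
    centre = p + z * z / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total))
    return (centre - margin) / (1 + z * z / total)


config = EvolutionConfig()
print(should_trigger(config, succeeded=False))   # failures always trigger
print(can_optimize(config, [0.6, 0.7, 0.8]))     # 3 samples, mean 0.7
print(pitfall_confidence(failures=1, total=1))   # one sample is ramped down to 1/3
```

With the linear ramp, a single failed run yields a confidence of 1/3 rather than 1.0, so it stays below the default `min_confidence` of 0.5 until more samples arrive. The Wilson lower bound serves the same purpose with a statistically grounded penalty for small samples.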

R6 (actor marking + cross-workspace sharing):
- All evolution artifacts (EvolutionLogEntry, Module, PitfallWarning) carry
  actor field; defaults to result.agent_name
- can_share_artifact(): same-workspace always allowed; cross-workspace requires
  explicit opt-in via EvolutionConfig.cross_workspace_sharing=True
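
A rough sketch of the R6 rules, under stated assumptions: the `Artifact` class, its `workspace_id` field, the `make_artifact` helper, and the `can_share_artifact` signature are all hypothetical, since the message names the function but not its parameters. The trimmed `EvolutionConfig` here carries only the one flag this rule needs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvolutionConfig:
    # Only the flag relevant to sharing; default from the commit message.
    cross_workspace_sharing: bool = False


@dataclass(frozen=True)
class Artifact:
    """Hypothetical stand-in for EvolutionLogEntry / Module / PitfallWarning."""
    actor: str
    workspace_id: str  # assumed field; the message does not say how workspaces are tracked


def make_artifact(agent_name: str, workspace_id: str,
                  actor: Optional[str] = None) -> Artifact:
    """The actor defaults to the agent name of the task result when not given."""
    return Artifact(actor=actor if actor is not None else agent_name,
                    workspace_id=workspace_id)


def can_share_artifact(artifact: Artifact, target_workspace_id: str,
                       config: EvolutionConfig) -> bool:
    """Same workspace is always allowed; cross-workspace needs explicit opt-in."""
    if artifact.workspace_id == target_workspace_id:
        return True
    return config.cross_workspace_sharing


warning = make_artifact(agent_name="planner", workspace_id="ws-a")
print(warning.actor)                                            # planner
print(can_share_artifact(warning, "ws-a", EvolutionConfig()))   # True
print(can_share_artifact(warning, "ws-b", EvolutionConfig()))   # False
```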

KTD-8: gave_up_after_reflections treated as failure path (triggers 100%
evolution) even when stream wrapper marks status as COMPLETED. Detection via
output_data.trace_outcome or error_message substring (ponytail: heuristic;
upgrade path is a dedicated TaskResult.trace_outcome field).
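
The KTD-8 detection heuristic might look something like the sketch below. The `TaskResult` shape, the string status values, and the `is_failure_path` helper are assumptions made for illustration. Only the two detection sources (`output_data.trace_outcome` and a substring of `error_message`) and the outcome name `gave_up_after_reflections` come from the message.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

GAVE_UP = "gave_up_after_reflections"


@dataclass
class TaskResult:
    """Hypothetical minimal result type; the real class likely has more fields."""
    status: str
    output_data: Dict[str, Any] = field(default_factory=dict)
    error_message: Optional[str] = None


def is_failure_path(result: TaskResult) -> bool:
    """Treat a gave-up run as a failure even if the status says COMPLETED."""
    # Assumption: any status other than COMPLETED is already a failure.
    if result.status != "COMPLETED":
        return True
    # Heuristic 1: structured outcome stored in output_data.
    if result.output_data.get("trace_outcome") == GAVE_UP:
        return True
    # Heuristic 2: substring match on the error message.
    return GAVE_UP in (result.error_message or "")


ok = TaskResult(status="COMPLETED")
gave_up = TaskResult(status="COMPLETED", output_data={"trace_outcome": GAVE_UP})
print(is_failure_path(ok))       # False
print(is_failure_path(gave_up))  # True
```

The substring check is the fragile part, which is presumably why the message names a dedicated `TaskResult.trace_outcome` field as the upgrade path.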

Backward compat: all gates conditional on auto_evolution_config is not None;
existing EvolutionMixin usage without config preserves prior behavior.

Tests: tests/unit/test_evolution_auto_trigger.py (37 tests) covers R5/R6
scenarios - sample rate gate, observe-only, consumption gate, pitfall
confidence, actor marking, cross-workspace sharing, gave_up_after_reflections,
error handling, fire-and-forget, backpressure cap, AE3 happy path.

2026-07-03 13:54:37 +08:00

1 commit