chiguyong
|
83cdddd199
|
feat(evaluation): U9 Ragas evaluation pipeline for RAG quality assessment
- RagasEvaluator: LLM-as-Judge evaluation via the ragas library, with a built-in fallback
- EvalDatasetBuilder: builds datasets from traces or a list of dicts
- EvalMetrics: faithfulness, answer_relevancy, context_precision, context_recall
- Built-in heuristic evaluation using keyword overlap and Jaccard similarity
- 13 tests passing
|
2026-06-06 22:49:27 +08:00 |
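
The commit message mentions a built-in heuristic based on keyword overlap and Jaccard similarity. The following is a minimal sketch of what such a heuristic could look like, not the code from this commit: the function names `tokenize`, `jaccard`, and `keyword_overlap` are hypothetical, and the tokenization rule (lowercased alphanumeric words) is an assumption.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric word tokens."""
    # Assumption: simple regex tokenization; the real pipeline may differ.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: |A & B| / |A | B| over the two token sets."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0  # Convention chosen here for two empty inputs.
    return len(ta & tb) / len(ta | tb)


def keyword_overlap(source: str, target: str) -> float:
    """Fraction of tokens in `source` that also appear in `target`."""
    ts, tt = tokenize(source), tokenize(target)
    if not ts:
        return 0.0
    return len(ts & tt) / len(ts)
```

Under these assumptions, `keyword_overlap(answer, context)` could serve as a rough stand-in for a faithfulness-style score (how much of the answer is supported by the retrieved context), and `jaccard(question, answer)` as a rough relevancy signal when no LLM judge is available.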