- `RagasEvaluator`: LLM-as-judge evaluation using the `ragas` library, with a built-in fallback
- `EvalDatasetBuilder`: builds evaluation datasets from traces or from a list of dicts
- `EvalMetrics`: `faithfulness`, `answer_relevancy`, `context_precision`, `context_recall`
- Built-in heuristic evaluation based on keyword overlap and Jaccard similarity
- 13 tests passing

| Name |
|---|
| integration |
| unit |
| `__init__.py` |
| `conftest.py` |
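The commit message says the built-in fallback scores answers with keyword overlap and Jaccard similarity, but it does not show how those heuristics map onto the four metrics. The sketch below is one plausible mapping, offered as an illustration only. The function names (`tokenize`, `jaccard`, `keyword_overlap`, `heuristic_scores`), the stopword list, and the 0.3 relevance threshold are all hypothetical and are not taken from this repository or from the `ragas` API. The real `ragas` metrics use an LLM judge rather than these heuristics.

```python
import re

# Hypothetical stopword list; the repository's fallback may use a different one.
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "or",
             "for", "on", "with", "what", "which", "who"}


def tokenize(text: str) -> set[str]:
    """Lowercase, split on non-alphanumeric characters, and drop stopwords."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def keyword_overlap(source: set[str], target: set[str]) -> float:
    """Fraction of `source` keywords that also appear in `target`."""
    return len(source & target) / len(source) if source else 0.0


def heuristic_scores(question: str, answer: str, contexts: list[str],
                     ground_truth: str, threshold: float = 0.3) -> dict[str, float]:
    """Approximate the four metrics without an LLM (hypothetical mapping)."""
    q, a, gt = tokenize(question), tokenize(answer), tokenize(ground_truth)
    ctx_tokens = [tokenize(c) for c in contexts]
    all_ctx = set().union(*ctx_tokens)
    # A context counts as relevant if its Jaccard similarity to the ground truth
    # reaches the (assumed) threshold.
    relevant = sum(1 for c in ctx_tokens if jaccard(c, gt) >= threshold)
    return {
        # Share of answer keywords supported by the retrieved contexts.
        "faithfulness": keyword_overlap(a, all_ctx),
        # Keyword similarity between the question and the answer.
        "answer_relevancy": jaccard(q, a),
        # Fraction of retrieved contexts judged relevant.
        "context_precision": relevant / len(ctx_tokens) if ctx_tokens else 0.0,
        # Share of ground-truth keywords covered by the contexts.
        "context_recall": keyword_overlap(gt, all_ctx),
    }


scores = heuristic_scores(
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    contexts=["Paris is the capital and largest city of France.",
              "Berlin is the capital of Germany."],
    ground_truth="The capital of France is Paris.",
)
# faithfulness=1.0, answer_relevancy≈0.667, context_precision=0.5, context_recall=1.0
```

In the usage example, the Berlin context shares only "capital" with the ground truth, which gives a Jaccard score of 0.2. That is below the threshold, so context precision drops to 0.5, while a plain "any shared keyword" rule would have scored it 1.0. Heuristics like these are cheap and deterministic, which makes them handy in unit tests, but they only measure lexical overlap, not meaning.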