opea-project/GenAIEval
Evaluation, benchmark, and scorecard, targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination.
Python · Apache-2.0
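The "accuracy on popular evaluation harnesses" part of the description is most naturally read as task-level accuracy runs of the kind EleutherAI's lm-evaluation-harness performs; the sketch below is a minimal illustration of such a run under that assumption, and the model checkpoint and task name are placeholders rather than anything taken from this repository.

```python
# Minimal sketch of a harness-style accuracy run, assuming EleutherAI's
# lm-evaluation-harness (pip install lm-eval); checkpoint and task are
# illustrative placeholders, not GenAIEval defaults.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                 # Hugging Face backend
    model_args="pretrained=facebook/opt-125m",  # placeholder checkpoint
    tasks=["hellaswag"],                        # placeholder accuracy task
    num_fewshot=0,
)
print(results["results"]["hellaswag"])          # per-task accuracy metrics
```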
Issues
- K8s Resource Management (#32, opened by kevinintel, 1 comment)
- Restructure GenAIEval to address evaluation of multiple categories of metrics (#75, opened by Padmaapparao, 0 comments)
- LM-as-judge accuracy on Chinese (#72, opened by kevinintel, 0 comments)
- Retrieval accuracy (#69, opened by kevinintel, 0 comments)
- E2E RAG accuracy on English dataset (#71, opened by kevinintel, 0 comments)
- Reranking accuracy (#70, opened by kevinintel, 0 comments)
- Retrieval relevance (#67, opened by kevinintel, 0 comments)
- Add RagAgent benchmark (#56, opened by kevinintel, 0 comments)
- doc: cruft at top of /evals/metrics/bleu/README.md (#63, opened by dbkinder, 0 comments)
- Using DeepEval for RAGAS-related tasks (#31, opened by kevinintel, 0 comments; see the sketch after this list)
- OpenShift (#43, opened by kevinintel, 1 comment)
- AutoRAG Step 1 (#29, opened by kevinintel, 1 comment)
- Auto DocSum for accuracy (#30, opened by kevinintel, 0 comments)
- RAG Eval (#28, opened by kevinintel)
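For issue #31 above, a hedged sketch of what scoring a RAG answer through DeepEval might look like, using DeepEval's LLMTestCase/evaluate API with its native answer-relevancy metric standing in for the RAGAS metrics the issue mentions; the inputs, metric choice, and threshold are illustrative, not taken from the issue.

```python
# Hedged sketch for issue #31: scoring one RAG test case with DeepEval.
# Assumes deepeval is installed and an LLM judge is configured
# (e.g. OPENAI_API_KEY); all strings below are illustrative placeholders.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does GenAIEval measure?",
    actual_output="It benchmarks throughput, latency, and accuracy of GenAI services.",
    retrieval_context=["GenAIEval provides evaluation, benchmark, and scorecard tooling."],
)

metric = AnswerRelevancyMetric(threshold=0.7)        # relevancy check on the answer
evaluate(test_cases=[test_case], metrics=[metric])   # reports pass/fail per metric
```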