opea-project/GenAIEval
Evaluation, benchmark, and scorecard, targeting performance (throughput and latency), accuracy on popular evaluation harnesses, safety, and hallucination.
Python · Apache-2.0
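The "accuracy on popular evaluation harnesses" part of the description is most naturally read as task-level accuracy runs of the kind EleutherAI's lm-evaluation-harness performs; the sketch below is a minimal illustration of such a run under that assumption, and the model checkpoint and task name are placeholders rather than anything taken from this repository.

```python
# Minimal sketch of a harness-style accuracy run, assuming EleutherAI's
# lm-evaluation-harness (pip install lm-eval); checkpoint and task are
# illustrative placeholders, not GenAIEval defaults.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",                                 # Hugging Face backend
    model_args="pretrained=facebook/opt-125m",  # placeholder checkpoint
    tasks=["hellaswag"],                        # placeholder accuracy task
    num_fewshot=0,
)
print(results["results"]["hellaswag"])          # per-task accuracy metrics
```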
Issues
- K8s Resource Management (#32, opened by kevinintel, 1 comment)
- Restructure GenAIEval to address evaluation of multiple categories of metrics (#75, opened by Padmaapparao, 0 comments)
- LM-as-judge accuracy on Chinese (#72, opened by kevinintel, 0 comments)
- Retrieval accuracy (#69, opened by kevinintel, 0 comments)
- E2E RAG accuracy on English dataset (#71, opened by kevinintel, 0 comments)
- Reranking accuracy (#70, opened by kevinintel, 0 comments)
- Retrieval relevance (#67, opened by kevinintel, 0 comments)
- Add RagAgent benchmark (#56, opened by kevinintel, 0 comments)
- doc: cruft at top of /evals/metrics/bleu/README.md (#63, opened by dbkinder, 0 comments)
- Using DeepEval for RAGAS-related tasks (#31, opened by kevinintel, 0 comments; see the sketch after this list)
- OpenShift (#43, opened by kevinintel, 1 comment)
- AutoRAG Step 1 (#29, opened by kevinintel, 1 comment)
- Auto DocSum for accuracy (#30, opened by kevinintel, 0 comments)
- RAG Eval (#28, opened by kevinintel)
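For issue #31 above, a hedged sketch of what scoring a RAG answer through DeepEval might look like, using DeepEval's LLMTestCase/evaluate API with its native answer-relevancy metric standing in for the RAGAS metrics the issue mentions; the inputs, metric choice, and threshold are illustrative, not taken from the issue.

```python
# Hedged sketch for issue #31: scoring one RAG test case with DeepEval.
# Assumes deepeval is installed and an LLM judge is configured
# (e.g. OPENAI_API_KEY); all strings below are illustrative placeholders.
from deepeval import evaluate
from deepeval.metrics import AnswerRelevancyMetric
from deepeval.test_case import LLMTestCase

test_case = LLMTestCase(
    input="What does GenAIEval measure?",
    actual_output="It benchmarks throughput, latency, and accuracy of GenAI services.",
    retrieval_context=["GenAIEval provides evaluation, benchmark, and scorecard tooling."],
)

metric = AnswerRelevancyMetric(threshold=0.7)        # relevancy check on the answer
evaluate(test_cases=[test_case], metrics=[metric])   # reports pass/fail per metric
```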