- [June 29, 2022] Added support for Tukey's HSD Test.
- [June 28, 2022] Added support for Bpref and Rank-biased Precision (RBP) metrics.
- [June 9, 2022] Added support for 25 fusion algorithms, six normalization strategies, and an automatic fusion optimization functionality in v.0.2. Check out the official documentation and Jupyter Notebook for further details on fusion and normalization.
- [May 18, 2022] Added support for loading qrels from ir-datasets in v.0.1.13. Usage example: Qrels.from_ir_datasets("msmarco-document/dev") for the MS MARCO document retrieval dev set.
- [May 4, 2022] Added Paired Student's t-Test in v.0.1.12.
ranx is a library of fast ranking evaluation metrics implemented in Python, leveraging Numba for high-speed vector operations and automatic parallelization. It offers a user-friendly interface to evaluate and compare Information Retrieval and Recommender Systems. ranx allows you to perform statistical tests and export LaTeX tables for your scientific publications. Moreover, ranx provides several fusion algorithms and normalization strategies, and an automatic fusion optimization functionality. ranx was featured in ECIR 2022, the 44th European Conference on Information Retrieval.
If you use ranx to evaluate results or to conduct experiments involving fusion for your scientific publication, please consider citing it.
For a quick overview, follow the Usage section.
For an in-depth overview, follow the Examples section.
- Hits
- Hit Rate
- Precision
- Recall
- F1
- r-Precision
- Bpref
- Rank-biased Precision (RBP)
- Mean Reciprocal Rank (MRR)
- Mean Average Precision (MAP)
- Normalized Discounted Cumulative Gain (NDCG)
The metrics have been tested against TREC Eval for correctness.
Please refer to Smucker et al., Carterette, and Fuhr for additional information on statistical tests for Information Retrieval.
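For instance, the test used when comparing runs can be chosen explicitly. A minimal sketch, assuming compare accepts a stat_test parameter with the values "student", "fisher", and "tukey" (matching the tests named above and in the changelog):

from ranx import compare

# Sketch: compare two runs using Tukey's HSD Test instead of the
# default Paired Student's t-Test. qrels, run_1, and run_2 are assumed
# Qrels/Run objects, built as shown in the Usage section below.
report = compare(
    qrels=qrels,
    runs=[run_1, run_2],
    metrics=["ndcg@10"],
    stat_test="tukey",  # assumed options: "student", "fisher", "tukey"
)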
You can load qrels from ir-datasets as simply as:
from ranx import Qrels

qrels = Qrels.from_ir_datasets("msmarco-document/dev")
A full list of the available qrels is provided here.
Name | Name | Name | Name | Name |
---|---|---|---|---|
CombMIN | CombMNZ | RRF | MAPFuse | BordaFuse |
CombMED | CombGMNZ | RBC | PosFuse | Weighted BordaFuse |
CombANZ | ISR | WMNZ | ProbFuse | Condorcet |
CombMAX | Log_ISR | Mixed | SegFuse | Weighted Condorcet |
CombSUM | LogN_ISR | BayesFuse | SlideFuse | Weighted Sum |
Please refer to the documentation for further details.
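As a minimal sketch, fusing two runs with Reciprocal Rank Fusion (RRF) from the table above might look as follows (run_1 and run_2 are assumed Run objects over the same queries, and "rrf" is assumed to be the string name for RRF; default values are assumed for the remaining parameters):

from ranx import fuse

# Sketch: combine two runs with Reciprocal Rank Fusion (RRF)
combined_run = fuse(runs=[run_1, run_2], method="rrf")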
pip install ranx
from ranx import Qrels, Run

# Relevance judgments: query id -> {document id: relevance grade}
qrels_dict = { "q_1": { "d_12": 5, "d_25": 3 },
               "q_2": { "d_11": 6, "d_22": 1 } }

# System results: query id -> {document id: retrieval score}
run_dict = { "q_1": { "d_12": 0.9, "d_23": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_32": 0.5, "d_35": 0.4 },
             "q_2": { "d_12": 0.9, "d_11": 0.8, "d_25": 0.7,
                      "d_36": 0.6, "d_22": 0.5, "d_35": 0.4 } }

qrels = Qrels(qrels_dict)
run = Run(run_dict)
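Qrels and Run objects can also be loaded from files; a short sketch, assuming TREC-style input files at hypothetical paths:

# Sketch: load judgments and results from TREC-style files.
# "qrels.trec" and "run.trec" are hypothetical file paths.
qrels = Qrels.from_file("qrels.trec", kind="trec")
run = Run.from_file("run.trec", kind="trec")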
from ranx import evaluate
# Compute score for a single metric
evaluate(qrels, run, "ndcg@5")
>>> 0.7861
# Compute scores for multiple metrics at once
evaluate(qrels, run, ["map@5", "mrr"])
>>> {"map@5": 0.6416, "mrr": 0.75}
from ranx import compare
# Compare different runs and perform statistical tests
report = compare(
qrels=qrels,
runs=[run_1, run_2, run_3, run_4, run_5],
metrics=["map@100", "mrr@100", "ndcg@10"],
max_p=0.01 # P-value threshold
)
print(report)
Output:
#    Model    MAP@100    MRR@100    NDCG@10
---  -------  ---------  ---------  ---------
a    model_1  0.320ᵇ     0.320ᵇ     0.368ᵇᶜ
b    model_2  0.233      0.234      0.239
c    model_3  0.308ᵇ     0.309ᵇ     0.330ᵇ
d    model_4  0.366ᵃᵇᶜ   0.367ᵃᵇᶜ   0.408ᵃᵇᶜ
e    model_5  0.405ᵃᵇᶜᵈ  0.406ᵃᵇᶜᵈ  0.451ᵃᵇᶜᵈ
Superscript letters mark the models against which the improvement is statistically significant (p < max_p).
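As mentioned above, the report can be exported as a LaTeX table for your scientific publications via its to_latex method:

# Export the comparison report as a ready-to-use LaTeX table
print(report.to_latex())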
from ranx import fuse, optimize_fusion
best_params = optimize_fusion(
qrels=train_qrels,
runs=[train_run_1, train_run_2, train_run_3],
norm="min-max", # The norm. to apply before fusion
method="wsum", # The fusion algorithm to use (Weighted Sum)
metric="ndcg@100", # The metric to maximize
)
combined_test_run = fuse(
runs=[test_run_1, test_run_2, test_run_3],
norm="min-max",
method="wsum",
params=best_params,
)
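The fused run can then be evaluated like any other run; a short sketch, assuming test_qrels holds the judgments for the test queries:

from ranx import evaluate

# Sketch: score the fused run (test_qrels is an assumed Qrels object)
evaluate(test_qrels, combined_test_run, "ndcg@100")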
Example notebooks:

- Overview
- Qrels and Run
- Evaluation
- Comparison and Report
- Fusion
Browse the documentation for more details and examples.
If you use ranx to evaluate results for your scientific publication, please consider citing it:
@inproceedings{bassani2022ranx,
author = {Elias Bassani},
title = {ranx: {A} Blazing-Fast Python Library for Ranking Evaluation and Comparison},
booktitle = {{ECIR} {(2)}},
series = {Lecture Notes in Computer Science},
volume = {13186},
pages = {259--264},
publisher = {Springer},
year = {2022}
}
Would you like to see other features implemented? Please open a feature request.
Would you like to contribute? Please drop me an e-mail.
ranx is open-source software licensed under the MIT license.