corankco (COnsensus RANKing COmputation) is a Python package dedicated to the aggregation of incomplete rankings with ties. Users can choose the Kemeny-Young method (exact algorithm and several heuristics available), the Copeland method, and in some cases, the Borda method. The formalism to handle ties and/or incompleteness is detailed in the preprint: P. Andrieu, S. Cohen-Boulakia, M. Couceiro, A. Denise, A. Pierrot. A Unifying Rank Aggregation Model to Suitably and Efficiently Aggregate Any Kind of Rankings. Available at SSRN: https://ssrn.com/abstract=4353494.
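To give an intuition of what the Kemeny-Young method optimizes, here is a minimal sketch that does not use corankco and only covers the simplest setting: complete rankings without ties. It enumerates every permutation and keeps the one with the smallest total Kendall-tau distance to the input rankings. The function names `kendall_tau_distance` and `brute_force_kemeny` are illustrative and are not part of the corankco API; the package handles ties and incompleteness through its scoring schemes and uses far more efficient algorithms.

```python
from itertools import combinations, permutations
from typing import List, Sequence, Tuple


def kendall_tau_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the pairs of elements that a and b order differently."""
    pos_b = {elem: idx for idx, elem in enumerate(b)}
    # combinations(a, 2) yields (x, y) with x before y in a
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def brute_force_kemeny(rankings: List[Sequence[str]]) -> Tuple[Tuple[str, ...], int]:
    """Return one permutation minimizing the summed Kendall-tau distance.

    Only usable for a handful of elements: the search space is n!.
    """
    elements = sorted(rankings[0])
    best_consensus: Tuple[str, ...] = tuple(elements)
    best_score = sum(kendall_tau_distance(best_consensus, r) for r in rankings)
    for candidate in permutations(elements):
        score = sum(kendall_tau_distance(candidate, r) for r in rankings)
        if score < best_score:
            best_consensus, best_score = candidate, score
    return best_consensus, best_score


if __name__ == "__main__":
    toy_rankings = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
    print(brute_force_kemeny(toy_rankings))
```

On this toy input, the ranking A, B, C disagrees with the three input rankings on 0, 1 and 1 pairs respectively, which is the minimum over all six permutations.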
As of our latest release, the API for corankco is now stable and finalized.
Before installing corankco, make sure your system meets the following requirements:
- Python >= 3.8
- To use the exact algorithm on large datasets (with many elements to rank), installing IBM ILOG CPLEX Optimization Studio is recommended. A free academic version can be downloaded from the IBM website. After downloading, follow the instructions to install CPLEX.
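The two requirements above can be checked from Python before installing. This is only a sketch: `check_environment` is a hypothetical helper, not something shipped with corankco, and it assumes that the CPLEX Python bindings are importable under the module name `cplex`.

```python
import sys
from importlib.util import find_spec
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 8),
                      solver_module: str = "cplex") -> Dict[str, bool]:
    """Report whether the interpreter is recent enough and whether the
    optional solver module can be located (without importing it)."""
    return {
        "python_ok": sys.version_info >= min_python,
        # Assumption: the CPLEX bindings are exposed as a module named "cplex"
        "solver_found": find_spec(solver_module) is not None,
    }


if __name__ == "__main__":
    print(check_environment())
```

A missing solver module is not an error here, since CPLEX is only recommended for the exact algorithm on large datasets.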
To install corankco from PyPI, you can use pip:
pip3 install corankco
Visit our official documentation. For examples of use, you can jump directly to the Usage section below.
For any questions or support requests related to corankco, feel free to contact us at pierre.andrieu@lilo.org.
We welcome contributions to corankco. If you'd like to contribute, feel free to fork the repository and submit your changes via a pull request.
corankco is licensed under the GPL-2.0 License. You can read more about it in the LICENSE file.
- Several algorithms have been sped up.
- BioConsert heuristic no longer requires the C extension. As a consequence, corankco is now available for any platform. Note that the computation time of BioConsert has been reduced due to major code improvements.
- Minor bug fix in BioConsert heuristic.
- Several tests added.
from typing import List
import corankco as crc
# create a ranking from a list of sets
ranking1: crc.Ranking = crc.Ranking([{1}, {2, 3}])
# or from a string
ranking2: crc.Ranking = crc.Ranking.from_string("[{3, 1}, {4}]")
# also in this format
ranking3: crc.Ranking = crc.Ranking.from_string("[[1], [5], [3], [2]]")
# now, create a Dataset object. A Dataset is a list of rankings
dataset: crc.Dataset = crc.Dataset([ranking1, ranking2, ranking3])
# or from raw rankings, i.e. a list of lists of sets of either ints or strs
dataset2: crc.Dataset = crc.Dataset.from_raw_list([[{2, 1}, {4}], [{3, 1, 2}, {4}, {5}], [{1}, {2}, {3}, {4}]])
# or, create a Dataset object from a file where your rankings are stored
# file format: each line is a list of either sets or lists of ints / strs.
dataset3: crc.Dataset = crc.Dataset.from_file(path="./dataset_examples/dataset_example")
# print information about the dataset
print(dataset.description())
# get all datasets in a folder
list_datasets: List[crc.Dataset] = crc.Dataset.get_datasets_from_folder(path_folder="./dataset_examples")
for dataset_folder in list_datasets:
    print(dataset_folder.description())
# choose your scoring scheme
sc: crc.ScoringScheme = crc.ScoringScheme([[0., 1., 1., 0., 1., 1.], [1., 1., 0., 1., 1., 0.]])
print("scoring scheme : " + str(sc))
# scoring scheme description
print(sc.description())
print("\n### Consensus computation ###\n")
# list of rank aggregation algorithms to use among BioConsert, ParCons, ExactAlgorithm, KwikSortRandom,
# RepeatChoice, PickAPerm, MedRank, BordaCount, BioCo, CopelandMethod
algorithms_to_execute = [
    crc.ExactAlgorithm(optimize=False),
    crc.KwikSortRandom(),
    crc.BioConsert(starting_algorithms=None),
    crc.BioConsert(starting_algorithms=[crc.CopelandMethod()]),
    crc.ParCons(bound_for_exact=90, auxiliary_algorithm=crc.BioConsert()),
    crc.CopelandMethod(),
    crc.BioCo(),
    crc.BordaCount(),
]
for alg in algorithms_to_execute:
    print(alg.get_full_name())
    consensus = alg.compute_consensus_rankings(dataset=dataset,
                                               scoring_scheme=sc,
                                               return_at_most_one_ranking=False)
    # to get the consensus rankings: consensus.consensus_rankings
    # description() will display supplementary information
    print(consensus.description())
    # if you want the consensus ranking only: print(consensus)
    # get the Kemeny score associated with the consensus:
    print(consensus.kemeny_score)
# compute a Kemeny score between a ranking and a list of rankings (dataset object):
ranking_test: crc.Ranking = crc.Ranking([{1, 2}, {4}, {3}])
dataset_test: crc.Dataset = crc.Dataset.from_raw_list([[{1}, {2}, {3}, {4}], [{1, 4}, {3}]])
scoring_scheme: crc.ScoringScheme = crc.ScoringScheme([[0., 1., 1., 0., 1., 0.], [1., 1., 0., 1., 1., 0.]])
kemeny_obj: crc.KemenyComputingFactory = crc.KemenyComputingFactory(scoring_scheme)
score: float = kemeny_obj.get_kemeny_score(ranking=ranking_test, dataset=dataset_test)
print("\nscore = ", score)
# Partitioning
# consistent with at least one optimal consensus
one_opt: crc.OrderedPartition = crc.OrderedPartition.parcons_partition(dataset_test, scoring_scheme)
print(one_opt)
# consistent with all the optimal consensus
all_opt: crc.OrderedPartition = crc.OrderedPartition.parfront_partition(dataset_test, scoring_scheme)
print(all_opt)
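The Kemeny score in the example above depends on the two penalty vectors passed to the scoring scheme. The sketch below, which does not use corankco, shows one way such a score can be computed by hand. It assumes the following convention, which should be checked against the preprint and the official documentation: the first vector applies when x is before y in the consensus, the second when x and y are tied in the consensus, and the six entries of each vector correspond, for a given input ranking, to (0) x before y, (1) y before x, (2) x and y tied, (3) x present and y missing, (4) x missing and y present, (5) both missing. The names `positions`, `case_index` and `generalized_kemeny_score` are hypothetical, only elements present in the consensus are considered, and the second vector is assumed to be symmetric (entries 0 and 1 equal, entries 3 and 4 equal).

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Set

Ranking = List[Set[Hashable]]  # buckets of tied elements, best bucket first


def positions(ranking: Ranking) -> Dict[Hashable, int]:
    """Map each ranked element to the index of its bucket."""
    return {elem: idx for idx, bucket in enumerate(ranking) for elem in bucket}


def case_index(pos: Dict[Hashable, int], x: Hashable, y: Hashable) -> int:
    """Return which of the six assumed cases the pair (x, y) falls into."""
    x_in, y_in = x in pos, y in pos
    if x_in and y_in:
        if pos[x] < pos[y]:
            return 0  # x before y
        if pos[x] > pos[y]:
            return 1  # y before x
        return 2      # x and y tied
    if x_in:
        return 3      # x present, y missing
    if y_in:
        return 4      # x missing, y present
    return 5          # both missing


def generalized_kemeny_score(consensus: Ranking,
                             rankings: Sequence[Ranking],
                             penalties_before: Sequence[float],
                             penalties_tied: Sequence[float]) -> float:
    """Sum, over all pairs of the consensus and all input rankings,
    the penalty selected by the assumed six-case convention."""
    cons_pos = positions(consensus)
    input_positions = [positions(r) for r in rankings]
    total = 0.0
    for x, y in combinations(list(cons_pos), 2):
        if cons_pos[x] > cons_pos[y]:
            x, y = y, x  # orient the pair so that x is not after y
        tied = cons_pos[x] == cons_pos[y]
        penalties = penalties_tied if tied else penalties_before
        for pos in input_positions:
            total += penalties[case_index(pos, x, y)]
    return total


if __name__ == "__main__":
    toy_consensus = [{1, 2}, {4}, {3}]
    toy_rankings = [[{1}, {2}, {3}, {4}], [{1, 4}, {3}]]
    print(generalized_kemeny_score(toy_consensus, toy_rankings,
                                   [0., 1., 1., 0., 1., 0.],
                                   [1., 1., 0., 1., 1., 0.]))
```

The toy data reuses the rankings and penalty vectors of the example above. The value printed by this sketch is only meaningful under the stated assumptions and is not guaranteed to match the score returned by corankco.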
For more detailed examples and use cases, please refer to our Jupyter Notebook.