This repository supplies a framework for implementing majority vote classifiers with performance guarantees. The implementation is used for the experiments presented in [1,2,3,4]. When a classifier is trained using bootstrapping or validation sets, theoretical guarantees based on PAC-Bayesian theory are computed; see [1,2,3,4,5].
The implementation is provided as a module, mvb, which provides a Python class, MVBase, that serves as an interface for implementing majority vote classifiers. mvb also provides four such implementations:
- RandomForestClassifier
- ExtraTreesClassifier
- SVMVotersClassifier
- MultiClassifierEnsemble
Each provides a majority vote classifier with an interface similar to sklearn.ensemble.RandomForestClassifier etc. The voters used in these implementations are based on various models from sklearn: sklearn.tree.DecisionTreeClassifier, sklearn.svm.SVC, etc. [6].
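The combination step behind a majority vote classifier can be illustrated with a minimal sketch. This is not the mvb implementation: the function name majority_vote, the uniform weighting of voters, and the tie-breaking rule are assumptions made only for illustration.

```python
from collections import Counter


def majority_vote(voter_predictions):
    """Combine per-voter predictions into one label per sample.

    voter_predictions[i][j] is the label predicted by voter i for sample j.
    All voters are weighted equally (an assumption of this sketch).
    """
    n_samples = len(voter_predictions[0])
    combined = []
    for j in range(n_samples):
        # Count how many voters predicted each label for sample j.
        counts = Counter(preds[j] for preds in voter_predictions)
        top = max(counts.values())
        # Break ties deterministically by taking the smallest tied label.
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


# Three hypothetical voters predicting labels for four samples.
votes = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
print(majority_vote(votes))  # → [0, 1, 0, 0]
```

In practice each row of votes would come from a fitted voter, such as a decision tree or an SVM, applied to the same samples.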
Furthermore, the sub-module mvb.data can be used for reading data, while functions for computing bounds directly can be found in the sub-module mvb.bounds.
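To give a sense of what computing a bound involves, the following sketch evaluates a standard first-order PAC-Bayesian bound: the PAC-Bayes-kl inequality bounds the expected loss of a randomly drawn voter, and the majority vote loss is at most twice that value. It does not use mvb.bounds; the function names, the default delta, and the assumption that all voters are evaluated on one shared validation set of size n are hypothetical choices made for this illustration.

```python
import math


def kl_bernoulli(p, q):
    """KL divergence kl(p || q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_inverse_upper(p_hat, budget, tol=1e-9):
    """Upper inverse of kl: approximately the largest q >= p_hat with
    kl(p_hat || q) <= budget, found by bisection (rounded upwards)."""
    lo, hi = p_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kl_bernoulli(p_hat, mid) > budget:
            hi = mid
        else:
            lo = mid
    return hi


def first_order_bound(gibbs_loss, n, kl_rho_pi=0.0, delta=0.05):
    """Bound the majority vote loss by twice the PAC-Bayes-kl bound
    on the expected loss of a randomly drawn voter.

    gibbs_loss: average empirical loss of the voters on n held-out samples.
    kl_rho_pi:  KL(rho || pi) between posterior and prior over voters
                (0 when both are uniform).
    """
    budget = (kl_rho_pi + math.log(2.0 * math.sqrt(n) / delta)) / n
    return min(1.0, 2.0 * kl_inverse_upper(gibbs_loss, budget))


print(first_order_bound(gibbs_loss=0.1, n=1000))
```

The bound tightens as n grows and becomes vacuous (equal to 1) once the bound on the voters' expected loss reaches 0.5.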
Three directories with experiments are included in the repository:
- NeurIPS2022 provides the experiments of [1].
- NeurIPS2021 provides the experiments of [2].
- NeurIPS2020 provides the experiments of [3].
Each directory contains a README with a description of how to run the experiments of the given paper, including downloading of data from various sources [7,8,9].
Below follows a simple usage example of the mvb library:
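from mvb import RandomForestClassifier as RF
from mvb import data as mldata
# Load the data set named 'Letter:OQ'
X, Y = mldata.load('Letter:OQ')
# Fit a majority vote classifier with 100 voters
rf = RF(n_estimators=100)
_ = rf.fit(X, Y)
# Compute the bounds for the fitted classifier
bounds = rf.bounds()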
Some of the implementation in mvb.bounds is based on the implementation from [4].
[4] Lorenzen, Igel and Seldin: On PAC-Bayesian Bounds for Random Forests (ECML 2019)
[5] Germain, Lacasse, Laviolette, Marchand and Roy: Risk Bounds for the Majority Vote: From a PAC-Bayesian Analysis to a Learning Algorithm (JMLR 2015)
[6] The sklearn.ensemble module
[8] LibSVM
[9] Zalando Research