scikit-learn-contrib/DESlib

The shape of test samples and neighbours are not the same

jayahm opened this issue · 5 comments

Hi,

Upon checking the shape of y_test in examples/example_heterogeneous.py, it is (228,).

But when I checked the shape of neighbors (in ola.py), it is (66, 7). Since 7 is the number of neighbours, I believe 66 is the number of test samples.

I wonder why this could be different? (228 vs 66)

But, in the end, the DS method will return labels for all samples, including those on which the base classifiers agree, am I right?

May I know what criterion a particular method uses for the "agreement"?

I believe it is based on the same label returned by each classifier.

In that case, there is a default classification decision threshold for the classifiers.

What if the threshold is varied in some classification problems? (That is the case for me.)
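To illustrate what I mean, here is a minimal sketch (not DESlib code), assuming a binary problem and a hypothetical `threshold` parameter: the label each base classifier contributes, and therefore any agreement check based on labels, depends on the decision threshold applied to its probability estimates.

```python
def labels_at_threshold(clf, X, threshold=0.5):
    """Return hard labels from a fitted binary classifier using a custom
    decision threshold instead of the default 0.5 (illustrative only)."""
    proba = clf.predict_proba(X)[:, 1]      # probability of the positive class
    return (proba >= threshold).astype(int)  # changing `threshold` changes the labels
```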

It just checks the predictions of the base models. If all models predict the same label, there is no need for further processing with DS, as any selection would give the same output.

If at least one model disagrees with the other base classifiers, dynamic selection is used to select the most competent one(s) for prediction.
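For reference, a rough sketch of that agreement check (not the exact DESlib source, just an illustration of the idea with numpy) looks like this; it also shows why the arrays seen inside ola.py can be smaller than X_test:

```python
import numpy as np

def split_by_agreement(pool_classifiers, X_test):
    # Hard label predictions, shape (n_samples, n_classifiers).
    predictions = np.array([clf.predict(X_test) for clf in pool_classifiers]).T

    # A sample is "in agreement" when every classifier predicts the same label.
    all_agree = np.all(predictions == predictions[:, [0]], axis=1)

    # Agreed samples keep the common label; only the rest go through DS,
    # so the DS method sees fewer samples than X_test (e.g., 66 out of 228).
    agreed_labels = predictions[all_agree, 0]
    X_disagree = X_test[~all_agree]
    return all_agree, agreed_labels, X_disagree
```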

In the current implementation, there is no threshold for the degree of disagreement, since our goal was just to reduce the computational cost without changing the definitions of any of the dynamic selection methods at all. If you want something that changes according to the level of disagreement in the predicted labels (e.g., if 30% of the classifiers disagree with the rest), you would need to modify the code yourself to allow such functionality.
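A hedged sketch of that kind of modification could look like the following (this is not part of DESlib; `disagreement_ratio` is a hypothetical parameter): only samples whose fraction of dissenting classifiers reaches the threshold would be sent through dynamic selection.

```python
import numpy as np

def needs_ds(predictions, disagreement_ratio=0.3):
    """predictions: (n_samples, n_classifiers) array of hard labels.
    Returns a boolean mask of samples that should go through dynamic selection."""
    n_classifiers = predictions.shape[1]
    mask = np.empty(predictions.shape[0], dtype=bool)
    for i, row in enumerate(predictions):
        _, counts = np.unique(row, return_counts=True)
        n_disagree = n_classifiers - counts.max()      # classifiers not in the majority
        mask[i] = (n_disagree / n_classifiers) >= disagreement_ratio
    return mask
```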