scikit-learn-contrib/DESlib

Why does MV have a longer computational time than DS methods?

jayahm opened this issue · 4 comments

Hi,

I was trying to understand the time taken for training and testing of DS methods.

To do so, I measured the total computational time of each DS method as the sum of:

  1. combined training time of all base classifiers (a heterogeneous pool), i.e., training time of classifier 1 + classifier 2, and so on
  2. training time of the DS method
  3. test time

For comparison, I computed the computational time of majority voting (MV) as the sum of:

  1. combined training time of all base classifiers (the same heterogeneous pool), i.e., training time of classifier 1 + classifier 2, and so on
  2. training time of MV
  3. test time

My understanding is that DS should take longer than MV, yet in my measurements MV took longer.

Is this normal, or did I do something wrong?

Thank you for that information.

I used code similar to your example:

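```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import Perceptron
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(42)
# Placeholder data for illustration; substitute your own training set
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=rng)

# Pre-train the heterogeneous pool of base classifiers
model_perceptron = CalibratedClassifierCV(Perceptron(max_iter=100,
                                                     random_state=rng),
                                          cv=3)
model_perceptron.fit(X_train, y_train)
model_svc = SVC(probability=True, gamma='auto',
                random_state=rng).fit(X_train, y_train)
model_bayes = GaussianNB().fit(X_train, y_train)
model_tree = DecisionTreeClassifier(random_state=rng).fit(X_train, y_train)
model_knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Pool passed to the DS methods
pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]

voting_classifiers = [("perceptron", model_perceptron),
                      ("svc", model_svc),
                      ("bayes", model_bayes),
                      ("tree", model_tree),
                      ("knn", model_knn)]

# Hard (majority) voting is the default (voting='hard')
model_voting = VotingClassifier(estimators=voting_classifiers).fit(
    X_train, y_train)
```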

I think scikit-learn's MV implementation (VotingClassifier) re-trains the classifiers.

Yes, unfortunately VotingClassifier re-trains the base classifiers; you can see this in the scikit-learn documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html
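The re-training can be observed directly. This is a minimal sketch, assuming the iris dataset as placeholder data; `CountingNB` is a hypothetical `GaussianNB` subclass that only counts how often `fit()` runs. Because `VotingClassifier.fit` clones each estimator and fits the clones, the counter goes from 1 to 2, and the fitted estimators stored in `estimators_` are new objects rather than the pre-trained ones.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier


class CountingNB(GaussianNB):
    """Hypothetical GaussianNB subclass that counts calls to fit()."""

    n_fits = 0  # class-level, so clones made by VotingClassifier share it

    def fit(self, X, y, sample_weight=None):
        type(self).n_fits += 1
        return super().fit(X, y, sample_weight=sample_weight)


X, y = load_iris(return_X_y=True)  # placeholder dataset

# Pre-train the base classifiers once, as in the issue
model_nb = CountingNB().fit(X, y)
model_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
fits_before = CountingNB.n_fits

# VotingClassifier.fit clones each estimator and trains the clones again
model_voting = VotingClassifier(
    estimators=[("nb", model_nb), ("tree", model_tree)]).fit(X, y)
fits_after = CountingNB.n_fits

print(fits_before, fits_after)                  # 1 2
print(model_voting.estimators_[0] is model_nb)  # False
```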

To avoid this, you can use the majority_voting function in deslib.util.aggregation, which applies the majority voting combination rule to a list of pre-trained models:

https://deslib.readthedocs.io/en/latest/modules/util/aggregation.html#