Why does MV have a longer computational time than DS methods?

Question

jayahm opened this issue 4 years ago · 4 comments

jayahm commented 4 years ago

Hi,

I was trying to understand the time taken for training and testing of DS methods.

So, I computed the computational time, which includes:

1. combined training time of all base classifiers (heterogeneous classifiers), e.g. training time of classifier 1 + classifier 2 and so on
2. training time for DS
3. test time

I compared with majority voting and computed the computational time as:

1. combined training time of all base classifiers (heterogeneous classifiers), e.g. training time of classifier 1 + classifier 2 and so on
2. training time for MV
3. test time

My understanding is that DS should have a longer computational time than MV, but in my results MV took longer.

Is this normal, or did I do something wrong?
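For reference, the three-stage timing described above could be measured along these lines. This is only a minimal sketch: `timed` and `total_time` are hypothetical helper names, not part of DESlib or scikit-learn, and each stage is passed in as a zero-argument callable.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def total_time(base_fit_calls, ensemble_fit, ensemble_predict):
    """Time the three stages: base training, ensemble training, test."""
    # Stage 1: sum of the individual base-classifier training times.
    base = sum(timed(call)[1] for call in base_fit_calls)
    # Stage 2: training time of the combiner (DS or MV).
    _, fit_t = timed(ensemble_fit)
    # Stage 3: prediction time on the test set.
    _, test_t = timed(ensemble_predict)
    return {"base": base, "ensemble_fit": fit_t, "test": test_t}
```

With real models, each callable could be built with something like `functools.partial(clf.fit, X_train, y_train)`. If the combiner re-trains the base models internally, that cost shows up again in the `ensemble_fit` entry.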

Answer 1 · 2021-01-13T17:51:41.000Z

Hello,

The training time for DS models is very short, arguably negligible, since training only consists of storing the input data and extracting some basic statistics about it. The only DS technique with a real training process is META-DES, which trains a meta-classifier internally.

You need to check whether fitting the MV model (from scikit-learn) re-trains the classifiers that were already trained in the previous step. From what I remember, scikit-learn's majority voting does not accept a pre-trained set of classifiers as input; it always re-trains the base models internally.

Answer 2 · 2021-01-14T02:44:59.000Z

Thank you for that information.

I used code similar to your example:

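from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# rng, X_train and y_train are assumed to be defined as in the example code.
model_perceptron = CalibratedClassifierCV(Perceptron(max_iter=100,
                                                     random_state=rng),
                                          cv=3)

model_perceptron.fit(X_train, y_train)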
model_svc = SVC(probability=True, gamma='auto',
                random_state=rng).fit(X_train, y_train)
model_bayes = GaussianNB().fit(X_train, y_train)
model_tree = DecisionTreeClassifier(random_state=rng).fit(X_train, y_train)
model_knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

pool_classifiers = [model_perceptron,
                    model_svc,
                    model_bayes,
                    model_tree,
                    model_knn]

voting_classifiers = [("perceptron", model_perceptron),
                      ("svc", model_svc),
                      ("bayes", model_bayes),
                      ("tree", model_tree),
                      ("knn", model_knn)]

model_voting = VotingClassifier(estimators=voting_classifiers).fit(
    X_train, y_train)

So I think the scikit-learn MV (VotingClassifier) re-trains the classifiers here.

Answer 3 · 2021-01-15T17:30:36.000Z

Yes, unfortunately scikit-learn re-trains the base classifiers. You can see that in their documentation:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.VotingClassifier.html
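One way to check this yourself is to compare the objects you passed in with the ones stored on the fitted ensemble. A minimal sketch, assuming a synthetic dataset from `make_classification` (the data and the variable names are illustrative only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for this illustration.
X, y = make_classification(n_samples=200, random_state=0)

# Base models trained ahead of time, as in the code above.
tree = DecisionTreeClassifier(random_state=0).fit(X, y)
bayes = GaussianNB().fit(X, y)

voting = VotingClassifier(estimators=[("tree", tree), ("bayes", bayes)])
voting.fit(X, y)

# The fitted sub-estimators live in `estimators_`; if they are different
# objects from the ones passed in, they were cloned and fitted again.
same_object = voting.estimators_[0] is tree
print("VotingClassifier reused the pre-trained tree:", same_object)
```

If `same_object` is False, the time spent in `voting.fit` includes a second training pass over every base model, which would explain why MV looks slower than DS in the measurements.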

What you can do instead is use the majority_voting function we have in deslib.util.aggregation, which applies the majority voting combination rule to a list of pre-trained models:

https://deslib.readthedocs.io/en/latest/modules/util/aggregation.html#
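To illustrate why combining pre-trained models needs no training step at all, here is a hand-rolled sketch of hard majority voting. It is not DESlib's implementation: `majority_vote` and `ConstantClassifier` are hypothetical names, and the only assumption about the models is that each one exposes a `predict(X)` method.

```python
from collections import Counter


def majority_vote(fitted_models, X):
    """Return the most frequent predicted label for each sample in X.

    Only predict() is called, so the models are never re-trained.
    Ties go to the label that was voted first in pool order.
    """
    predictions = [list(model.predict(X)) for model in fitted_models]
    voted = []
    for votes in zip(*predictions):  # one tuple of votes per sample
        voted.append(Counter(votes).most_common(1)[0][0])
    return voted


class ConstantClassifier:
    """Hypothetical stand-in for an already-fitted model."""

    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return [self.label for _ in X]


pool = [ConstantClassifier(0), ConstantClassifier(1), ConstantClassifier(1)]
labels = majority_vote(pool, [[0.0], [1.0]])
```

The same function should work with a list of fitted scikit-learn estimators such as `pool_classifiers` above, since it only relies on `predict`.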

Answer 4 · 2021-01-18T04:32:21.000Z

Thank you for the information.

I also confirmed this by asking on Stack Overflow:
https://stackoverflow.com/questions/65712738/does-the-training-of-majority-voting-in-scikit-learn-will-re-train-the-classifie/65734067#65734067