slipguru/palladio

Fix analysis - variable selection not always necessarily performed

Closed · 4 comments

Variable selection is not mandatory, e.g. when the learning machine is an SVM.

How do we cope with this?

I would suggest a check on the attributes of the estimator class.
For example, ElasticNet-like classes expose a coef_ attribute after the fit method is called.

The existence of that attribute may indicate a variable selection step; in that case the analysis may include the plots related to feature selection.
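A minimal sketch of that check (the helper name is illustrative, not palladio's actual API): a fitted ElasticNet exposes coef_, while an RBF-kernel SVC does not, so a simple hasattr test separates the two cases discussed here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVC

def performs_variable_selection(est):
    # Heuristic from this thread: a coef_ attribute after fit hints at
    # an embedded selection/weighting step (not foolproof, see below).
    return hasattr(est, "coef_")

rng = np.random.RandomState(0)
X = rng.rand(30, 5)
y_reg = rng.rand(30)
y_clf = rng.randint(0, 2, 30)

print(performs_variable_selection(ElasticNet().fit(X, y_reg)))       # True
print(performs_variable_selection(SVC(kernel="rbf").fit(X, y_clf)))  # False
```

Note that SVC only exposes coef_ for the linear kernel, which is exactly why the heuristic flags it as "no selection" here.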

Actually, I'm considering adding a flag in the session configuration file in order to enable/disable everything related to feature selection in the analysis.

I think it makes more sense for several reasons: for one, the existence of the coef_ attribute does not guarantee that the method is an embedded one (see RLS or OLS). Also, for the time being it is more manageable for us.

In the future we must figure out how to tweak it so that we can handle different types of FS algorithms.

@matteobarbieri: This sounds like a reasonable temporary workaround to me. If you want to perform variable selection when training an SVM I think that we should already be able to use algorithms like sklearn.feature_selection.RFE.
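A short sketch of the suggestion above: RFE wraps an estimator and recursively eliminates features. The data here is synthetic and only for illustration; RFE requires an estimator exposing coef_ or feature_importances_, hence the linear kernel.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(40, 6)
# Make the target depend only on the first two features.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Recursive feature elimination around a linear-kernel SVM.
selector = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X, y)
print(selector.support_)  # boolean mask over the 6 input features
```

The fitted selector then acts as a transformer, so it could slot into the pipeline wherever an embedded method would have gone.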

@fdtomasi: Unfortunately, checking the coef_ attribute cannot be a general rule. It makes sense only for a limited number of models. It would fail, for instance, when using ensemble methods such as Random Forests.

We definitely need to figure out a general strategy. I am thinking about something like using sklearn.feature_selection.SelectFromModel for Lasso and friends.
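A sketch of that strategy (synthetic data, illustrative parameters): SelectFromModel is a meta-transformer that works with any estimator exposing coef_ or feature_importances_, so the same interface covers both Lasso-style models and the Random Forest case raised above.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(50, 8)
y = X[:, 0] + 0.5 * X[:, 1] + 0.01 * rng.randn(50)

# Lasso: features with non-zero coefficients are kept.
sfm = SelectFromModel(Lasso(alpha=0.01)).fit(X, y)
print(sfm.get_support())

# Ensembles are handled through feature_importances_ instead of coef_.
y_clf = (y > y.mean()).astype(int)
sfm_rf = SelectFromModel(
    RandomForestClassifier(n_estimators=20, random_state=0)
).fit(X, y_clf)
print(sfm_rf.get_support())
```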

Fix in #19.

Variable selection must now be explicitly enabled via vs_analysis in the config.
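A hypothetical config fragment showing the new switch (only the vs_analysis name comes from this thread; treat the rest as a sketch, not palladio's documented schema):

```python
# Session configuration fragment (illustrative).
vs_analysis = True  # set to False to skip all feature-selection analysis/plots
```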