slipguru/palladio

Fix analysis - variable selection not always necessarily performed

Closed · 4 comments

Variable selection is not mandatory, e.g. when the learning machine is an SVM.

How do we cope with this?

I would suggest a check on the attributes of the estimator class.
For example, ElasticNet-like classes expose a coef_ attribute after the fit method is called.

The existence of that attribute may indicate a variable selection step; in that case the analysis may include the plots related to feature selection.
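A minimal sketch of that check (the helper name is illustrative, not palladio's actual API): a fitted ElasticNet exposes coef_, while an RBF-kernel SVC does not, so a simple hasattr test separates the two cases discussed here.

```python
import numpy as np
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVC

def performs_variable_selection(est):
    # Heuristic from this thread: a coef_ attribute after fit hints at
    # an embedded selection/weighting step (not foolproof, see below).
    return hasattr(est, "coef_")

rng = np.random.RandomState(0)
X = rng.rand(30, 5)
y_reg = rng.rand(30)
y_clf = rng.randint(0, 2, 30)

print(performs_variable_selection(ElasticNet().fit(X, y_reg)))       # True
print(performs_variable_selection(SVC(kernel="rbf").fit(X, y_clf)))  # False
```

Note that SVC only exposes coef_ for the linear kernel, which is exactly why the heuristic flags it as "no selection" here.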

Actually, I'm considering adding a flag in the session configuration file in order to enable/disable everything related to feature selection in the analysis.

I think it makes more sense for several reasons: for one, the existence of the coef_ attribute does not guarantee that the method is an embedded one (see RLS or OLS). Also, for the time being it is more manageable for us.

In the future we must figure out how to tweak it so that we can handle different types of FS algorithms.

@matteobarbieri: This sounds like a reasonable temporary workaround to me. If you want to perform variable selection when training an SVM I think that we should already be able to use algorithms like sklearn.feature_selection.RFE.
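A short sketch of the suggestion above: RFE wraps an estimator and recursively eliminates features. The data here is synthetic and only for illustration; RFE requires an estimator exposing coef_ or feature_importances_, hence the linear kernel.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(40, 6)
# Make the target depend only on the first two features.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

# Recursive feature elimination around a linear-kernel SVM.
selector = RFE(SVC(kernel="linear"), n_features_to_select=2).fit(X, y)
print(selector.support_)  # boolean mask over the 6 input features
```

The fitted selector then acts as a transformer, so it could slot into the pipeline wherever an embedded method would have gone.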

@fdtomasi: Unfortunately, checking the coef_ attribute cannot be a general rule. It makes sense only for a limited number of models. It would fail, for instance, when using ensemble methods such as Random Forests.

We definitely need to figure out a general strategy. I am thinking about something like using sklearn.feature_selection.SelectFromModel for Lasso and friends.
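A sketch of that strategy (synthetic data, illustrative parameters): SelectFromModel is a meta-transformer that works with any estimator exposing coef_ or feature_importances_, so the same interface covers both Lasso-style models and the Random Forest case raised above.

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(50, 8)
y = X[:, 0] + 0.5 * X[:, 1] + 0.01 * rng.randn(50)

# Lasso: features with non-zero coefficients are kept.
sfm = SelectFromModel(Lasso(alpha=0.01)).fit(X, y)
print(sfm.get_support())

# Ensembles are handled through feature_importances_ instead of coef_.
y_clf = (y > y.mean()).astype(int)
sfm_rf = SelectFromModel(
    RandomForestClassifier(n_estimators=20, random_state=0)
).fit(X, y_clf)
print(sfm_rf.get_support())
```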

Fix in #19.

Variable selection must now be explicitly enabled via vs_analysis in the config.
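A hypothetical config fragment showing the new switch (only the vs_analysis name comes from this thread; treat the rest as a sketch, not palladio's documented schema):

```python
# Session configuration fragment (illustrative).
vs_analysis = True  # set to False to skip all feature-selection analysis/plots
```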