mlr-org/mlr3filters

Filters with mlr3pipelines, and filtering before building the graph

Closed this issue · 1 comment

Hi,

It is difficult to find examples of how to use mlr3filters with mlr3pipelines, that is, how to combine feature filtering with other preprocessing and modelling steps.

I have several questions:

  1. If I use mlr3filters first and then build a graph with preprocessing steps and learners, is this cheating, because I use the whole dataset to select features and then use those same features in my nested CV later?
  2. If the answer to 1 is yes (it is cheating), what is the right approach to filter (select) the most important features? Should it be part of the graph?
  3. Do you have any references on the performance of feature selection and filter methods? Is feature selection through models much better than filters?
pat-s commented

Hi

  1. Filters should be used within the model optimization step in a pipeline, on the same level as hyperparameter tuning.
  2. Yes, see (1)
  3. There are numerous scientific articles that compare both. Keep in mind, though, that filters are much faster and save you a full optimization layer, as they can be integrated into hyperparameter tuning, whereas wrapper methods need to be run standalone. The feature selection methods built into some algorithms (embedded methods) are yet another category. If you want to use them, you should not combine them with wrapper or filter methods, as you would essentially perform variable selection twice.
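The leakage behind questions 1 and 2 can be shown in a few lines. The sketch below is a language-agnostic illustration in Python/NumPy, not mlr3 code: the filter (absolute correlation with the label), the nearest-centroid learner, and all function names are hypothetical stand-ins. On pure-noise data, where no classifier can beat 50% accuracy in expectation, ranking features once on the full dataset and then cross-validating should give an optimistic estimate, while refitting the filter on each training fold, and tuning the number of kept features in an inner loop like any other hyperparameter, should give estimates near chance.

```python
import numpy as np


def filter_scores(X, y):
    """Hypothetical univariate filter: absolute Pearson correlation of each feature with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    num = Xc.T @ yc
    den = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(num / den)


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return np.argsort(scores)[::-1][:k]


def fit_predict(X_tr, y_tr, X_te):
    """Nearest-centroid classifier, a stand-in for any learner."""
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    d0 = ((X_te - c0) ** 2).sum(axis=1)
    d1 = ((X_te - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def splits(n, n_folds, rng):
    """Yield (train, test) index arrays for a shuffled k-fold split."""
    idx = rng.permutation(n)
    for test in np.array_split(idx, n_folds):
        yield np.setdiff1d(idx, test), test


def cv_accuracy(X, y, k, n_folds, rng, leaky):
    if leaky:
        # Leaky: features are ranked once on ALL rows, test folds included.
        keep = top_k(filter_scores(X, y), k)
    hits = 0
    for train, test in splits(len(y), n_folds, rng):
        if not leaky:
            # Honest: the filter is refit on the training rows of each fold only.
            keep = top_k(filter_scores(X[train], y[train]), k)
        pred = fit_predict(X[train][:, keep], y[train], X[test][:, keep])
        hits += int((pred == y[test]).sum())
    return hits / len(y)


def nested_cv_accuracy(X, y, candidates, rng, n_outer=5, n_inner=3):
    """Tune the number of kept features in an inner CV, like any hyperparameter."""
    hits = 0
    for train, test in splits(len(y), n_outer, rng):
        Xtr, ytr = X[train], y[train]
        # Inner loop sees only the outer training rows.
        inner = [cv_accuracy(Xtr, ytr, k, n_inner, rng, leaky=False) for k in candidates]
        k = candidates[int(np.argmax(inner))]
        keep = top_k(filter_scores(Xtr, ytr), k)
        pred = fit_predict(Xtr[:, keep], ytr, X[test][:, keep])
        hits += int((pred == y[test]).sum())
    return hits / len(y)


# Pure noise: 50 rows, 2000 features, balanced labels unrelated to X.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2000))
y = rng.permutation(np.repeat([0, 1], 25))

leaky = cv_accuracy(X, y, k=20, n_folds=5, rng=rng, leaky=True)
honest = cv_accuracy(X, y, k=20, n_folds=5, rng=rng, leaky=False)
nested = nested_cv_accuracy(X, y, candidates=[5, 20, 50], rng=rng)
print(f"filter before CV: {leaky:.2f}  filter inside CV: {honest:.2f}  nested: {nested:.2f}")
```

In mlr3pipelines terms this should correspond roughly to placing the filter step (`po("filter", ...)`) in the graph ahead of the learner, so that resampling refits it on every training set, and tuning its `filter.nfeat` or `filter.frac` parameter together with the learner's hyperparameters.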