stekhoven/missForest

Ignore ID col in data

Closed this issue · 2 comments

Thanks for the great package!
How does the package handle input data that contains an ID column? The ID column is needed, for example, to merge the imputed data set with other data later. Are the IDs included in the estimation of the missing values? That would probably be incorrect, since the IDs carry no information about the other features. Is it possible to mark a column as "ignore this column"? Or is the result always in the same row order as the input data set? In that case one could remove the column before passing the data to the function and append it to the result afterwards. The precondition for this is that the row order must not change under any circumstances.
But maybe I am misunderstanding something!
Thanks in advance in any case!

missForest does not affect the ordering of the rows. However, I do see the point of having an "ignore this column" feature. Meditate on this I will...
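
Until such a feature exists, the workaround described in the question can be sketched roughly as follows. This is only a minimal illustration, assuming the row order is preserved as stated above: the `id` column, the `id_cols` vector, and the use of `iris` as example data are all made up for this sketch and are not part of the package.

```r
library(missForest)

set.seed(42)

# Hypothetical example data: an ID column plus the features to impute
dat <- data.frame(
  id = sprintf("P%03d", 1:150),
  iris,
  stringsAsFactors = FALSE
)

# Introduce roughly 10% missing values into the feature columns only
dat[, -1] <- prodNA(dat[, -1], noNA = 0.1)

# Columns to keep out of the imputation (assumed name, adjust as needed)
id_cols <- "id"
features <- dat[, setdiff(names(dat), id_cols), drop = FALSE]

# Impute using the feature columns only
imp <- missForest(features)

# Row order is unchanged, so the IDs can be bound back by position
result <- cbind(dat[, id_cols, drop = FALSE], imp$ximp)
```

The same pattern should work for other auxiliary columns: list them in `id_cols`, impute the rest, and `cbind()` them back. Note that the auxiliary columns are reattached as they were, so any missing values in them stay missing.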

I have the same issue. It would be nice to be able to include auxiliary variables (including an ID) that are not used for imputation but that are crucial to keep in the final data set, for example for merging with other data. Thanks for considering this and for developing this helpful package!