stekhoven/missForest

Application to repeated measurement data

Closed this issue · 2 comments

Thank you for developing this wonderful method called missForest.

I am considering applying missForest to proteome data from mass spectrometry. The data were measured twice, before and after treatment, on 20 subjects. I understand that missForest requires observations that are pairwise independent. Can such repeatedly measured omics data be imputed with missForest?
I would be grateful if you could enlighten me.

Of course they can. What you want to do is keep the repeated measurements (before and after treatment) in one observation, i.e. in the same row of the data frame. However, be advised that if each protein has only one measurement before and one after treatment for each of the 20 subjects, missForest will replace a missing value for such a protein by "looking at what happens on average" in the other proteins, where complete observations are available.

Thanks for the excellent advice. The proteome data contain at most p proteins per sample. Because the proteome is measured twice per person, before and after treatment, I was going to apply missForest to a matrix with 20 x 2 rows and p columns. However, I will follow your advice and apply missForest to a matrix with 20 rows and p x 2 columns.
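
For anyone reading later, the reshaping discussed above (from 20 x 2 rows by p columns to 20 rows by p x 2 columns) could be sketched roughly as follows. This is only an illustration in plain Python, not part of missForest: the function name long_to_wide, the subject IDs, the protein names and the values are all made up, and None stands in for a missing intensity. The resulting wide table is what would then be handed to missForest for imputation.

```python
def long_to_wide(rows, proteins, timepoints=("before", "after")):
    """Reshape long rows into one wide row per subject.

    rows: iterable of (subject_id, timepoint, [one value per protein]).
    Returns (column_names, {subject_id: [one value per wide column]}).
    Hypothetical helper for illustration only.
    """
    # Wide columns: all proteins at the first timepoint, then the second.
    columns = [f"{p}_{t}" for t in timepoints for p in proteins]
    # Starting column index of each timepoint's block.
    offset = {t: i * len(proteins) for i, t in enumerate(timepoints)}

    wide = {}
    for subject, timepoint, values in rows:
        if len(values) != len(proteins):
            raise ValueError("each row needs one value per protein")
        # A timepoint that never appears for a subject stays all None.
        row = wide.setdefault(subject, [None] * len(columns))
        start = offset[timepoint]
        row[start:start + len(proteins)] = values
    return columns, wide


# Made-up example: 2 subjects, 3 proteins, None = missing value.
proteins = ["P1", "P2", "P3"]
long_rows = [
    ("s01", "before", [1.2, None, 3.4]),
    ("s01", "after", [1.5, 2.2, None]),
    ("s02", "before", [0.9, 2.0, 3.1]),
    ("s02", "after", [None, 2.4, 3.3]),
]

columns, wide = long_to_wide(long_rows, proteins)
print(columns)
for subject, row in wide.items():
    print(subject, row)
```

With 20 subjects and p proteins, this layout gives 20 rows and 2p columns, so each subject's before and after measurements sit in the same observation.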