bioFAM/mofax

`_load_sample_metadata` in `mofa_model` produces misaligned metadata

Closed this issue · 1 comment

After finding a couple of plotting features that do not work as expected, I resorted to writing my own plotting functions. In the process I realised that the metadata I was relying on was completely misaligned with the samples. After reading through a couple of modules, I found that this most likely results from using pd.concat in _load_sample_metadata, which does not align the two frames by sample here.

Specifically, this is due to the way the mofa_model class is initialised: it uses the contents of model['samples'] to initialise the samples property and model['groups'] to initialise the groups property. model['samples'] contains the samples ordered by their group assignment in ascending alphabetical order and is used by _load_sample_metadata to initialise the returned metadata frame, whereas model['groups'] is not sorted. The metadata, however, is loaded from model['samples_metadata'] by iterating over the groups property, which produces a new data frame that does not align with the original frame generated from the samples property. Thus, merging those two by simply appending the columns of the frame generated from groups to those generated from samples (which is what pd.concat does) yields nonsensical metadata and in turn might produce wrong interpretations of the results of MEFISTO.
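To illustrate the effect, here is a minimal toy sketch. It is not the mofax code: the frames, the column names (`sample`, `meta_sample`, `age`) and the group order are made up, and I am assuming both frames end up with a default positional index. With `axis=1`, `pd.concat` aligns on index labels, so two frames that share a positional index are paired row by row regardless of which sample each row describes.

```python
import pandas as pd

# Hypothetical toy data, not taken from a real MOFA model.
# Samples are ordered by group in ascending alphabetical order.
samples = pd.DataFrame({
    "sample": ["a1", "a2", "b1", "b2"],
    "group": ["A", "A", "B", "B"],
})

# Per-group metadata, collected by iterating over an unsorted group list.
metadata_by_group = {
    "A": pd.DataFrame({"meta_sample": ["a1", "a2"], "age": [10, 20]}),
    "B": pd.DataFrame({"meta_sample": ["b1", "b2"], "age": [30, 40]}),
}
groups = ["B", "A"]  # assumed unsorted order
metadata = pd.concat(
    [metadata_by_group[g] for g in groups], ignore_index=True
)

# Both frames carry a default RangeIndex, so the columns are glued
# together row by row, not sample by sample.
combined = pd.concat([samples, metadata], axis=1)

# Rows where the sample name and the attached metadata disagree.
mismatched = combined[combined["sample"] != combined["meta_sample"]]
print(mismatched)
```

In this toy case every row ends up with the metadata of a sample from the other group.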

I would suggest using either pd.DataFrame.join or pd.DataFrame.merge to ensure the metadata is aligned properly.
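A minimal sketch of what I mean, again on made-up toy frames and not on the actual mofax internals. The column names `sample`, `group` and `age` are hypothetical. Both variants match rows on the sample name instead of on their position.

```python
import pandas as pd

# Hypothetical toy data: samples sorted by group, metadata in another order.
samples = pd.DataFrame({
    "sample": ["a1", "a2", "b1", "b2"],
    "group": ["A", "A", "B", "B"],
})
metadata = pd.DataFrame({
    "sample": ["b1", "b2", "a1", "a2"],
    "age": [30, 40, 10, 20],
})

# Option 1: merge on the shared key column. how="left" keeps the row
# order of `samples`; validate raises if a sample name is duplicated.
merged = samples.merge(
    metadata, on="sample", how="left", validate="one_to_one"
)

# Option 2: move the sample names into the index and join on it.
joined = samples.set_index("sample").join(metadata.set_index("sample"))

print(merged)
print(joined)
```

Either way, each sample gets its own metadata row independent of the order in which the groups were read.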


Hi @dmalzl, thanks a lot for the detailed report. @gtca can you take care of this?