HelenaLC/muscat

aggregateData does not return colData

Closed this issue · 2 comments

Hello,

I have a single-cell SingleCellExperiment (sce) object and I used aggregateData() to generate a pseudobulk matrix as follows:

pb <- muscat::aggregateData(sce,
                            assay = "counts",
                            fun = "sum",
                            by = c("patientID"))

However, when I look at the "new" metadata (colData) for the pseudobulk matrix using SummarizedExperiment::colData(pb), I get this:

DataFrame with 52 rows and 0 columns

The way it's implemented, the aggregation only keeps metadata that maps uniquely to the variable(s) you aggregate by, so you get an empty colData when no column does. E.g., say you aggregate by patients that each have 3 samples: then there is no unique sample to assign to a given patient in the aggregated data. Hope that makes sense. To see for yourself, you can aggregate by other variables that have unique mappings and check the resulting metadata.
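To make the rule above concrete, here is a minimal base-R sketch of the check as described: a column survives only if it takes a single value within every patient. It illustrates the idea and is not muscat's actual code. The toy table, the sampleID and condition columns, and the is_constant helper are all hypothetical.

```r
# Toy cell-level metadata (hypothetical values); 'patientID' mirrors the issue.
cd <- data.frame(
  patientID = c("P1", "P1", "P1", "P2", "P2", "P2"),
  sampleID  = c("S1", "S2", "S3", "S4", "S5", "S6"),
  condition = c("ctrl", "ctrl", "ctrl", "treated", "treated", "treated"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE if 'x' has exactly one distinct value per group in 'g'.
is_constant <- function(x, g) {
  all(tapply(x, g, function(v) length(unique(v)) == 1))
}

# Check every column other than the grouping variable itself.
others <- setdiff(names(cd), "patientID")
keep <- vapply(cd[others], is_constant, logical(1), g = cd$patientID)
print(keep)
```

In this toy table, condition is constant within each patient and could be carried over, whereas sampleID varies within a patient and could not. Running the same check on your own colData(sce) shows which columns can be expected in the pseudobulk metadata.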

FYI, an option for you might be scuttle::aggregateAcrossCells(), which has a coldata.merge parameter that lets you specify how to merge each colData field (because it's not always obvious how this should be done automatically).
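A rough sketch of that suggestion on toy data, assuming the SingleCellExperiment and scuttle packages are installed. The age and nUMI columns and all values are made up for illustration, and using mean to merge nUMI is just one possible choice. Check ?scuttle::aggregateAcrossCells for how fields not listed in coldata.merge are handled in your version.

```r
library(SingleCellExperiment)
library(scuttle)

# Toy data (hypothetical): 4 genes x 6 cells from two patients.
counts <- matrix(1L, nrow = 4, ncol = 6,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
sce <- SingleCellExperiment(
  assays = list(counts = counts),
  colData = DataFrame(
    patientID = rep(c("P1", "P2"), each = 3),
    age       = rep(c(60, 45), each = 3),        # constant within patient
    nUMI      = c(100, 200, 300, 400, 500, 600)  # varies within patient
  )
)

# Sum counts per patient; merge the per-cell 'nUMI' field with mean().
pb <- aggregateAcrossCells(
  sce,
  ids = sce$patientID,
  statistics = "sum",
  use.assay.type = "counts",
  coldata.merge = list(nUMI = mean)
)

colData(pb)
```

The design difference from aggregateData() is that you say explicitly how each varying field should be collapsed, rather than having it dropped.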