samuel-marsh/scCustomize

Merge_Seurat_List() add.cell.ids for long Seurat object list

Closed this issue · 3 comments

Hi Marsh!

May I ask how would you add unique cell ids for a Seurat object list with a lot of Seurat objects inside (n=100)?

Manually writing add.cell.ids = c("A", "B", "C") as a parameter inside Merge_Seurat_List() is easy for small lists, but for large ones it's troublesome.

I tried a for loop first, but it doesn't seem to work:

library(scCustomize)

### Add unique cell ids - duplicate cell barcodes
for (i in names(ALL.list)) {
  ALL.list[[i]] <- RenameCells(ALL.list[[i]], 
                               add.cell.id = i) 
}

### Merge
ALL <- Merge_Seurat_List(ALL.list)
Error in `Merge_Seurat_List()`:
! There are overlapping cell barcodes present in the input objects
ℹ Please rename cells or provide prefixes to `add.cell.ids` parameter to make unique.

Any recommendations are highly appreciated. Thank you!

Hi @levinhein,

So my best guess as to why the loop didn't work is that you have duplicate list names, so some cells still end up with duplicated barcodes. If you just want a unique prefix and don't care what that prefix is, you can create a vector of any length and then rename the cells using the following code:

prefixes <- paste0("A", seq_len(length(x = ALL.list)))

# rename cells
ALL.list <- lapply(seq_along(ALL.list), function(x) {
  RenameCells(object = ALL.list[[x]], add.cell.id = prefixes[x])
})
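If the error persists, it may be worth confirming whether the list names themselves are duplicated, since that is what breaks a name-based loop. A minimal sketch with a stand-in list (the real ALL.list holds Seurat objects; the names here are made up):

```r
# Toy stand-in for ALL.list; the real list holds Seurat objects.
ALL.list <- list(sampleA = 1, sampleA = 2, sampleB = 3)

# TRUE here: a name-based loop would reuse the "sampleA" prefix,
# leaving those barcodes duplicated after renaming.
any(duplicated(names(ALL.list)))

# Index-based prefixes are unique regardless of the names:
prefixes <- paste0("A", seq_along(ALL.list))
prefixes
```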

Does that resolve the issue when you then try to merge the object list?

Best,
Sam

Hi Sam,

Thank you for this. It works to add unique cell barcodes on each dataset in the Seurat object list!

The only problem is that when running Merge_Seurat_List(), my RStudio server keeps crashing. I've repeated it 6 times over the past 3 days, and every attempt was unsuccessful. I also tried the regular Seurat merge() function, with the same outcome.

Do you have any solution for merging huge data? My RStudio memory usage indicator is red (110 GiB), while the Seurat object list in the environment is 150 GB when I run the merging procedure.

Merge ALL DATA either by:

A) ALL.merged <- Merge_Seurat_List(ALL.list)
B) ALL.merged <- merge(ALL.list[[1]], ALL.list[-1])

(PS: Would it matter if the arrangement/order of the metadata columns differs per sample? All samples have the same metadata columns, just positioned differently (e.g. orig.ident is column 1 in the first cohort, but column 5 in cohort 2). What about if the metadata columns are not even the same?)


Hi @levinhein,

So to answer your first question: that unfortunately just seems like a memory issue. You could try moving to a higher-memory environment (HPC, cloud, etc.) if that is available to you. Alternatively, you could explore the Seurat/BPCells workflow and look into doing sketch-based analysis.

In terms of meta data, that should be fine; it should match things by column name, I would think. You can always test by merging just two of the objects that have different column orders and making sure the columns are handled appropriately.
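The column-order point can be sanity-checked without Seurat. A toy sketch with plain data frames (not Seurat's internal merge code; meta1/meta2 and their columns are made up) showing that aligning columns by name makes the order irrelevant:

```r
# Two metadata frames with the same columns in different orders.
meta1 <- data.frame(orig.ident = "cohort1", nCount_RNA = 10)
meta2 <- data.frame(nCount_RNA = 20, orig.ident = "cohort2")

# Reorder meta2's columns by name before row-binding,
# so the original column positions don't matter:
combined <- rbind(meta1, meta2[, names(meta1)])
combined$orig.ident
```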

Best,
Sam