theislab/scvelo

Significant cell count reduction after merging loom files with scvelo.merge

mayurdoke6 opened this issue · 1 comments

I'm working on analyzing scRNA-seq data using scvelo. I'm trying to merge two loom files (New_Alloxan_possorted_genome_bam_CLF5P.loom and THR_possorted_genome_bam_SI7SF.loom) containing gene expression data for different cell populations.

Here's the relevant part of my code:

Python
import scanpy as sc
import scvelo as scv
import loompy

# Load loom data
ldata1 = scv.read('New_Alloxan_possorted_genome_bam_CLF5P.loom', cache=True)
ldata2 = scv.read('THR_possorted_genome_bam_SI7SF.loom', cache=True)

# Rename barcodes to ensure uniqueness
barcodes1 = [bc.split(':')[1] for bc in ldata1.obs.index.tolist()]
barcodes1 = [bc[0:len(bc)-1] + '_01' for bc in barcodes1]
ldata1.obs.index = barcodes1

barcodes2 = [bc.split(':')[1] for bc in ldata2.obs.index.tolist()]
barcodes2 = [bc[0:len(bc)-1] + '_02' for bc in barcodes2]
ldata2.obs.index = barcodes2

# Make variable names unique
ldata1.var_names_make_unique()
ldata2.var_names_make_unique()

# Concatenate ldata1 and ldata2
ldata = ldata1.concatenate(ldata2)

# Align variables (features) between adata and the concatenated ldata
common_genes = adata.var_names.intersection(ldata.var_names)
adata = adata[:, common_genes]
ldata = ldata[:, common_genes]

# Merge matrices
adata = scv.utils.merge(adata, ldata)

# Print shapes to verify
print(adata.shape) # Output: (15515, 32247)
print(ldata.shape) # Output: (19310, 32247)
Problem:

I expected the merged adata object to have around 19,000 cells (approximately the sum of cells in ldata1 and ldata2). However, after merging using scv.utils.merge, the number of cells in adata is significantly reduced to only 900.

Questions:

Is there a potential issue with how I'm aligning the barcodes between adata and ldata before merging?
Could there be another reason for the unexpected cell count reduction after merging?
How can I troubleshoot this issue to ensure all the cells from ldata2 are correctly included in the merged adata object?

Additional Information:

I've included the output showing the first few barcodes from adata, ldata1, and ldata2 after processing.

First few adata barcodes:
Index(['AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC',
'AAAGGGCCACGG'], dtype='object')
First few ldata1 barcodes:
Index(['AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01',
'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01'], dtype='object')
First few ldata2 barcodes:
Index(['AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02',
'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02'], dtype='object')

Any insights or suggestions on how to resolve this issue would be greatly appreciated!
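
One way to troubleshoot this is to count how many barcodes the two objects actually share before merging. The sketch below uses only the sample barcodes printed above, so it says nothing certain about the full datasets. The helper names `barcode_overlap` and `length_profile` are hypothetical, introduced here for illustration. It assumes that a merge keyed on cell names can only keep cells whose barcode strings match exactly in both objects.

```python
# Sample barcodes copied from the output above (only the first few per object).
adata_barcodes = [
    'AAACGAACACGT', 'AAACGCTGTCCG', 'AAACGCTTCGTC', 'AAAGAACAGAGC', 'AAAGGGCCACGG',
]
ldata_barcodes = [
    'AAAGAACAGAGCCATG_01', 'AATCACGGTTAACAGA_01', 'AACGAAAGTCTGCATA_01',
    'AACAGGGCAGGATGAC_01', 'AATCGTGCAGCACAGA_01',
    'AAAGAACCAAGCTCTA_02', 'AAACCCACAGTGTACT_02', 'AAACCCACAACCGCCA_02',
    'AAACGCTAGTGGTTCT_02', 'AAACGCTTCTGGCCGA_02',
]


def barcode_overlap(a, b):
    """Return the sorted barcodes present in both collections (exact string match)."""
    return sorted(set(a) & set(b))


def length_profile(barcodes):
    """Return the set of distinct barcode string lengths, to spot format differences."""
    return {len(bc) for bc in barcodes}


exact = barcode_overlap(adata_barcodes, ldata_barcodes)
print("exact matches:", len(exact))
print("adata barcode lengths:", length_profile(adata_barcodes))
print("ldata barcode lengths:", length_profile(ldata_barcodes))
```

In this sample the adata barcodes are 12 characters long with no suffix, while the ldata barcodes are 16 characters plus a `_01`/`_02` suffix, so none of them match exactly. On the real objects, the same check can be run with `adata.obs_names` and `ldata.obs_names` in place of the lists.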

Please use anndata's merge function and check the already existing issues and discussion on.