Integrated datasets?
Hello,
I am very interested in using this tool in my work! I have four different biological samples for 10X Multiome (which I've integrated using the scRNA data and Seurat's integration pipeline). Could SCARlink handle this integrated dataset?
Or would the authors instead recommend running SCARlink on the individual (un-integrated) samples?
Any insights would be greatly appreciated!
We tried both of the approaches on a data set with multiple samples and found the predictions to be similar. There were a few genes that were not included in some samples due to sparsity issues. The threshold for calling gene-linked tiles might also need to be different.
Thanks @snehamitra! If I might ask one additional question - is there any learning when running SCARlink on a Multiome dataset? I know the paper shows that SCARlink predicts gene activity from multiome ATAC more accurately than existing methods. Basically, if I wanted these more accurate gene-activity predictions for a stand-alone scATAC dataset (from the same tissue), could I leverage my Multiome data to improve these predictions?
That's an interesting idea. We haven't tried it. You could train the model on the multi-ome and then use the trained model to impute gene expression on your standalone scATAC-seq data. If the data is from matched tissue, then the predictions might be comparable. You could compare the prediction trends grouped by cell type. For example, is the imputed expression of a certain gene higher in cell type A than in cell type B in both the multi-ome and the standalone scATAC-seq data?
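As a rough illustration of that per-cell-type comparison, here is a minimal sketch in plain Python. It assumes you have already exported, for one gene, the per-cell imputed expression and the cell type label of each cell from both datasets. The function names and the toy numbers are hypothetical and are not part of SCARlink.

```python
from collections import defaultdict
from statistics import mean


def mean_by_cell_type(values, cell_types):
    """Average per-cell imputed expression within each cell type."""
    groups = defaultdict(list)
    for value, cell_type in zip(values, cell_types):
        groups[cell_type].append(value)
    return {cell_type: mean(vals) for cell_type, vals in groups.items()}


def rank_order(means):
    """Cell types sorted from highest to lowest mean expression."""
    return sorted(means, key=means.get, reverse=True)


def same_trend(multiome_means, atac_means):
    """True if the shared cell types rank identically in both datasets."""
    shared = set(multiome_means) & set(atac_means)
    a = [ct for ct in rank_order(multiome_means) if ct in shared]
    b = [ct for ct in rank_order(atac_means) if ct in shared]
    return a == b


# Toy, made-up values for a single gene (not real data).
multiome = mean_by_cell_type([2.0, 2.4, 0.5, 0.7], ["A", "A", "B", "B"])
atac_only = mean_by_cell_type([1.6, 1.8, 0.4, 0.2], ["A", "A", "B", "B"])
print(rank_order(multiome), rank_order(atac_only), same_trend(multiome, atac_only))
```

With only a handful of cell types an exact rank match is a reasonable check; with many cell types, a per-gene rank correlation would probably be a more forgiving version of the same idea.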
Hi @snehamitra
Thanks for the advice! By any chance, might you suggest how I could approach this using the scarlink scripts? Or is it possible to extract the trained model from the output directory of the multi-ome and use this somehow with a stand-alone ArchR ATAC object?
Thanks,
Daniel
Hi @danieljrichard
I already asked @snehamitra, but perhaps you can help me. I have the same concern as you: an integrated object of four samples for the 10X multiome. I first analyzed the GEX with Seurat and the ATAC with Signac, but I had to reanalyze the ATAC using ArchR to use SCARlink. However, I don't know how to transfer the cell type information into the ArchR object properly. I'm really a novice in bioinformatics analysis and at a loss with ArchR.
I tried:
projPG <- addGeneIntegrationMatrix(
    ArchRProj = projPG,
    useMatrix = "GeneScoreMatrix",
    matrixName = "GeneIntegrationMatrix",
    reducedDims = "IterativeLSI",
    seRNA = integrated,
    addToArrow = FALSE,
    groupRNA = "celltype",
    nameCell = "predictedCell_Un",
    nameGroup = "celltype",
    nameScore = "predictedScore_Un"
)
But in the end, the info is not passed on to SCARlink (celltype is full of unrelated numbers and the plots have no cell types).
If you have any suggestions, that would be great.
Best
David