Using Dictys in mosaic data

Question

ccruizm opened this issue 2 years ago · 10 comments

Good day!

First of all, congrats on the paper! Looking forward to trying it on our dataset.

We have a dataset in which we have integrated paired (true multiome) and unpaired (separate scRNA-seq and scATAC-seq) data. Have you ever used Dictys in this context?

Thanks in advance.

Answer 1 · 2023-08-04T20:53:26.000Z

Thank you, ccruizm!

We have not tested Dictys specifically in this context. However, in the paper we applied it to integrated data from unpaired scRNA-seq and scATAC-seq. Dictys is also able to account for batch effects on the transcriptome, such as those from different technologies.

So my first question: is the unpaired data much larger than the paired one, or the other way round? If so, a simple solution is to omit the other dataset for Dictys. Otherwise, we can talk about how to account for batch effects.

Lingfei

Answer 2 · 2023-08-07T06:43:47.000Z

Thanks for your quick response @lingfeiwang!

In total, I have ~450K cells, of which ~340K have only RNA, ~68K have only ATAC, and ~55K are multiome (RNA+ATAC). If possible, I would like to use the whole dataset. I have employed scGLUE to integrate my dataset as explained here (https://scglue.readthedocs.io/en/latest/paired.html). I have the cell embeddings after integration, so I have already corrected for batch effects :)

Do you think I can input my integrated object into Dictys? Or is there anything else I need beforehand?

Answer 3 · 2023-08-07T07:22:23.000Z

Thanks ccruizm for the helpful details.

It is great that you accounted for batch effects in the low-dimensional co-embedding. However, Dictys uses raw reads and raw read counts to infer GRNs in the high-dimensional space, and therefore needs to account for batch effects separately. We will work on exposing the transcriptome batch effect removal functionality in the workflow and let you know once it's complete.

Meanwhile, you may wish to try running Dictys on the 340K cells for RNA and 68K+55K cells for ATAC for now, treating it as unpaired data. Given that you performed the co-embedding with all cells together, you are already using the whole dataset because the multiome data surely helps to find the co-embedding. Assuming the multiome data captured the same cell population, the extra 55K transcriptome should only offer a minor improvement on GRN quality at most. You can also familiarize yourself with the workflow and make your next run much smoother.
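As a rough illustration of the split suggested above, here is a minimal sketch in plain Python. The modality labels ("rna", "atac", "multiome") and the function name are assumptions made for this example and are not part of Dictys; the resulting barcode lists would still need to be matched to whatever input files your Dictys run uses.

```python
from typing import Dict, List, Tuple


def split_for_unpaired_run(
    modality_by_cell: Dict[str, str],
) -> Tuple[List[str], List[str]]:
    """Split cell barcodes into an RNA set and an ATAC set.

    Hypothetical helper: RNA-only cells feed the transcriptome side,
    while ATAC-only and multiome cells feed the chromatin side, so the
    two sets share no cells and can be treated as unpaired data.
    """
    rna_cells = [c for c, m in modality_by_cell.items() if m == "rna"]
    atac_cells = [
        c for c, m in modality_by_cell.items() if m in ("atac", "multiome")
    ]
    return rna_cells, atac_cells


# Toy example with made-up barcodes
cells = {
    "AAAC-1": "rna",
    "AAAG-1": "atac",
    "AACT-1": "multiome",
    "AAGT-1": "rna",
}
rna_cells, atac_cells = split_for_unpaired_run(cells)
```

In this arrangement the transcriptome of the multiome cells is simply left out, which matches the point above that it should add little once the co-embedding has already used all cells.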

In any case, let me know if you have any questions.

Lingfei

Answer 4 · 2023-08-09T08:07:33.000Z

We tailored our approach by meticulously crafting a gene panel targeting 209 genes, drawing upon the cell phenotypes meticulously defined within our GBMap. This specialized panel empowers us to effectively identify cell states across a spectrum of patient samples, thereby encompassing the inherent intra and inter-tissue heterogeneity characterizing GB.

That's helpful! Looking forward to testing the workflow that accounts for batch effects.

Meanwhile, you may wish to try running Dictys on the 340K cells for RNA and 68K+55K cells for ATAC for now, treating it as unpaired data.

I will try this and see what the outcome is.

Thanks for your help Lingfei

Answer 5 · 2023-08-10T02:14:54.000Z

Great! Glad to help.

Answer 6 · 2023-08-18T14:05:17.000Z

Hi @ccruizm,

I have exposed the function to account for batch effects in transcriptome as covariates in this commit. You can first install Dictys 1.0.0 and then update with pip3 install --no-deps --force-reinstall git+https://github.com/pinellolab/dictys@a35fc8d in the conda environment. A simple tutorial is available as a part of https://nbviewer.org/github/pinellolab/dictys/blob/a35fc8da150c8b175b0d5072e5f4bb7a827ab0e5/doc/tutorials/full-skin/notebooks/main2.ipynb?flush_cache=true#Optional:-Prepare-covariates.

Could you please try the tutorial first and then your own dataset, and let us know if you have any questions?
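To give an idea of what batch covariates can look like, here is a minimal sketch that encodes a per-cell batch label (for example the sequencing technology) as 0/1 indicator variables. The function name, the labels, and the output layout are assumptions for illustration only; the file format Dictys actually expects is the one described in the tutorial linked above.

```python
from typing import Dict, List


def batch_indicators(batch_by_cell: Dict[str, str]) -> Dict[str, List[int]]:
    """Encode batch labels as 0/1 indicator covariates, one list per batch.

    Hypothetical helper: the first batch (in sorted order) is used as the
    reference level and gets no indicator, which avoids a covariate that
    is redundant with a constant term.
    """
    cells = list(batch_by_cell)
    levels = sorted(set(batch_by_cell.values()))
    return {
        f"batch_{level}": [int(batch_by_cell[c] == level) for c in cells]
        for level in levels[1:]
    }


# Toy example: two technologies, so a single indicator covariate
batches = {"c1": "multiome", "c2": "scrna", "c3": "scrna"}
covariates = batch_indicators(batches)
```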

Lingfei

Answer 8 · 2023-09-27T03:09:57.000Z

Hello!
First of all, congratulations on your paper! For some datasets whose raw data cannot be downloaded, we have the files required for scRNA-seq, but for scATAC-seq we can only obtain fragment files. Our question is therefore whether fragment files can replace BAM files as the scATAC-seq input. This is very important for us!
Thank you in advance.

Answer 9 · 2023-09-27T03:37:50.000Z

Thank you for your interest! Unfortunately Dictys relies on wellington/pyDNase which does not accept fragment files as input. Most journals require raw data for publication so in theory you should be able to obtain the bam files or something equivalent. However, if you really cannot obtain bam files, there might be a last resort by converting fragment files to bams using uniform values for all reads for each field, such as read quality. Please note its highly experimental nature. We have not tried it and do not have a program for that.
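For anyone attempting that last resort, here is a minimal, untested-with-Dictys sketch of the idea in plain Python: each fragment line (chrom, start, end, barcode, ...) becomes a pair of SAM records with uniform placeholder values. The read length, mapping quality, base quality, read naming, and the CB tag for the barcode are all assumptions chosen for this example, and whether the downstream tools accept the result is not known.

```python
from typing import Dict, Iterable, Iterator, List

READ_LEN = 50    # assumed uniform read length
MAPQ = 60        # assumed uniform mapping quality
BASE_QUAL = "I"  # assumed uniform base quality (Phred 40)


def fragment_to_sam(line: str, index: int) -> List[str]:
    """Turn one fragment line into a forward/reverse pair of SAM records."""
    chrom, start_s, end_s, barcode = line.rstrip("\n").split("\t")[:4]
    start, end = int(start_s), int(end_s)  # 0-based start, exclusive end
    n = min(READ_LEN, end - start)         # reads cannot exceed the fragment
    name = f"{barcode}:{index}"
    fwd_pos, rev_pos = start + 1, end - n + 1  # SAM positions are 1-based
    seq, qual, tag = "N" * n, BASE_QUAL * n, f"CB:Z:{barcode}"
    # Flag 99: paired, proper pair, mate reverse, first in pair
    fwd = [name, "99", chrom, str(fwd_pos), str(MAPQ), f"{n}M", "=",
           str(rev_pos), str(end - start), seq, qual, tag]
    # Flag 147: paired, proper pair, read reverse, second in pair
    rev = [name, "147", chrom, str(rev_pos), str(MAPQ), f"{n}M", "=",
           str(fwd_pos), str(start - end), seq, qual, tag]
    return ["\t".join(fwd), "\t".join(rev)]


def fragments_to_sam(
    lines: Iterable[str], chrom_sizes: Dict[str, int]
) -> Iterator[str]:
    """Yield SAM header lines, then two records per fragment."""
    yield "@HD\tVN:1.6\tSO:unsorted"
    for chrom, size in chrom_sizes.items():
        yield f"@SQ\tSN:{chrom}\tLN:{size}"
    index = 0
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        yield from fragment_to_sam(line, index)
        index += 1


# Toy example: one fragment on a made-up 1 kb chromosome
records = list(fragments_to_sam(["chr1\t100\t300\tAAAC-1\t2\n"], {"chr1": 1000}))
```

The duplicate count in the fifth column is ignored here, so each fragment yields exactly one read pair. The SAM text would still need to be sorted and converted to BAM (for example with samtools sort) before any further use.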

Answer 10 · 2023-09-27T16:53:59.000Z

Thank you for your interest! Unfortunately Dictys relies on wellington/pyDNase which does not accept fragment files as input. Most journals require raw data for publication so in theory you should be able to obtain the bam files or something equivalent. However, if you really cannot obtain bam files, there might be a last resort by converting fragment files to bams using uniform values for all reads for each field, such as read quality. Please note its highly experimental nature. We have not tried it and do not have a program for that.

Hi Lingfei, thanks so much for your timely and very informative reply!
Your suggestion may be very helpful to us, and I will try to implement it! The data we obtained came only from the GEO database, which may impose some limitations on our current research. But Dictys is a tool that really inspires our data analysis.
Many thanks!