pinellolab/dictys

How to run Dictys from aggregated samples?

JasonOSS opened this issue · 2 comments

Hello,

I was able to run the tutorial dataset and would now like to run Dictys on my own data.

I am working with a dataset of 16 samples that have been aggregated using CellRanger-arc.
I don't have an ATAC position-sorted alignments (BAM) file after this step.

What would be the best way to run Dictys with aggregated samples?

Thanks

Hi JasonOSS,

Good to hear that! For aggregation using CellRanger-arc, do you mean cellranger-arc aggr? It simply aggregates the data together with optional read count normalization, right?

Assuming the read counts do not vary hugely between experiments, you do not need to normalize them for Dictys, although you may still want to do that for other analyses like low dimensional embedding and trajectory inference. Dictys uses pseudo-bulk of chromatin accessibility reads, which should be relatively stable.

To prepare the input, you can use the original output files of each experiment, which should include BAM files. You can run the helper script to split each BAM file into one BAM file per cell, and then rename the per-cell files to match the cell names after aggregation.
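As a rough sketch of the renaming step, assuming the split step wrote one file per cell named `<barcode>.bam` (for example `AAACAGCCAAGGAATC-1.bam`) and that the aggregated barcodes differ from the per-sample ones only in the numeric suffix, which I understand `cellranger-arc aggr` sets to the sample's position in the aggregation CSV. Both assumptions should be checked against your own files. The function names and the folder layout here are hypothetical and not part of Dictys.

```python
from pathlib import Path


def aggr_barcode(barcode: str, sample_index: int) -> str:
    """Swap the numeric suffix of a barcode for the post-aggregation index.

    A barcode without a numeric suffix simply gets "-<sample_index>" appended.
    """
    core, sep, suffix = barcode.rpartition("-")
    if not sep or not suffix.isdigit():
        core = barcode  # no "-N" suffix present, keep the whole string
    return f"{core}-{sample_index}"


def rename_split_bams(folder, sample_index: int, dry_run: bool = True):
    """Rename every <barcode>.bam in `folder` to its aggregated name.

    Returns the list of (source, target) pairs. With dry_run=True nothing
    on disk is changed, so the plan can be inspected first.
    """
    folder = Path(folder)
    plan = []
    for bam in sorted(folder.glob("*.bam")):
        target = bam.with_name(aggr_barcode(bam.stem, sample_index) + ".bam")
        if target != bam:
            plan.append((bam, target))
    if not dry_run:
        for src, dst in plan:
            if dst.exists():
                # Refuse to overwrite an existing per-cell file
                raise FileExistsError(dst)
            src.rename(dst)
    return plan
```

For instance, `rename_split_bams("sample03/bams", 3)` would only list the planned renames for the third sample in the aggregation CSV; passing `dry_run=False` applies them. Keeping each sample in its own folder until after renaming avoids name clashes, since every sample starts out with the same `-1` suffix.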

For transcriptomic reads, you can use the data either before or after normalization (after similar renaming). We did not perform any normalization for the BMMC tutorial dataset, but this configuration might not generalize to all datasets. I would encourage you to try both options. As a starting point, I would not perform any other postprocessing of transcriptomic read counts, especially integration/imputation methods, because they typically do not output read counts. Dictys can account for known confounders, which can be your experimental batch or any confounder matrix extracted from an integration method (see below).
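To illustrate what a batch confounder could look like, here is a minimal sketch that derives one indicator row per experimental batch from the aggregated barcode suffix. It assumes the suffix identifies the sample, and the covariates-by-cells tab-separated layout is only a guess for illustration; the exact file format Dictys expects is not described in this thread, so please check the documentation before using it. All function names are hypothetical.

```python
def batch_of(barcode: str) -> int:
    """Return the numeric suffix of an aggregated barcode as its batch label."""
    core, sep, suffix = barcode.rpartition("-")
    if not sep or not suffix.isdigit():
        raise ValueError(f"barcode has no numeric suffix: {barcode!r}")
    return int(suffix)


def one_hot_batches(barcodes):
    """Build one 0/1 indicator row per batch, aligned with `barcodes`."""
    labels = [batch_of(b) for b in barcodes]
    return {
        f"batch_{k}": [1 if label == k else 0 for label in labels]
        for k in sorted(set(labels))
    }


def to_tsv(barcodes, rows) -> str:
    """Serialize the indicator rows as tab-separated text (assumed layout)."""
    lines = ["\t".join(["covariate", *barcodes])]
    for name, values in rows.items():
        lines.append("\t".join([name, *map(str, values)]))
    return "\n".join(lines) + "\n"
```

Note that a full set of batch indicators sums to one for every cell, so if the downstream model already includes an intercept you may want to drop one batch row.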

If you can suggest a good public dataset of this type (ideally: multi-omic and multi-sample, separated before integration, with files before aggregation and integration, and with low-dimensional coordinates after integration), we can check it out and produce a tutorial for this data type. We can demonstrate how to account for experimental batches with known transcriptomic confounders in Dictys, which is not shown in the current tutorials.

Thanks for your answer!

Yes, I meant cellranger-arc aggr.

I am going to try simply renaming the split BAM files to match the barcodes after aggregation.

I will look in the literature for a good public dataset.

Thanks for your help