pinellolab/dictys

Preparation of input data


Hello,

First of all, thank you for the awesome analysis tool! I am trying to run Dictys with my 10x multiome data.

  1. When preparing input files with multiple samples, how do I combine the matrix and bam files from each sample?
  2. How should I prepare the input data when I want to analyze data from only some clusters and not the whole data?

Thank you so much in advance,
Yoon

Hi Yoon,

Thank you for your interest. There are several points of concern when using multiple samples.

  1. Please check whether the samples show unintended separation in low-dimensional space. If so, you may want to integrate them properly before either cell clustering or trajectory inference. These discrete or continuous groups will be needed for downstream GRN inference. I also suggest trying both options, including the samples as covariates and leaving them out, and using whichever inferred GRN captures the biology better for your analysis. See the bottom of #25 for a notebook tutorial on how to do this.
  2. Please make sure each cell has a unique name across samples in the read count matrix and in the bam file. For read count matrices, you can append sample names to cell names before merging these matrices. For bam files, you can split each of them by cells using the script provided by Dictys in a separate folder for each sample, append sample names to the file names, and then move all the bam files into a single folder.
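The renaming in point 2 could be sketched roughly as below. This is only an illustration under stated assumptions, not part of Dictys: the helper names (`read_matrix`, `merge_matrices`, `collect_bams`) are hypothetical, each read count matrix is assumed to be an uncompressed tab-separated gene-by-cell table with cell names in the header row, and the per-cell bam files are assumed to have already been split into one folder per sample. The sample name is added as a prefix with `_` as the separator; what matters is that the matrix and the bam files use the same scheme.

```python
import csv
import shutil
from pathlib import Path


def read_matrix(path):
    """Read a gene-by-cell TSV (assumed layout): header row = cell names, first column = gene names."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    cells = rows[0][1:]
    counts = {row[0]: row[1:] for row in rows[1:]}
    return cells, counts


def merge_matrices(paths_by_sample, sep="_"):
    """Merge per-sample matrices, prefixing every cell name with its sample name."""
    merged_cells = []
    per_sample = []
    for sample, path in paths_by_sample.items():
        cells, counts = read_matrix(path)
        merged_cells.extend(f"{sample}{sep}{cell}" for cell in cells)
        per_sample.append((len(cells), counts))
    if len(set(merged_cells)) != len(merged_cells):
        raise ValueError("cell names are still not unique after prefixing")
    # Use the union of genes; a gene absent from a sample gets zero counts there.
    genes = sorted(set().union(*(counts for _, counts in per_sample)))
    merged = {}
    for gene in genes:
        row = []
        for n_cells, counts in per_sample:
            row.extend(counts.get(gene, ["0"] * n_cells))
        merged[gene] = row
    return merged_cells, merged


def collect_bams(dirs_by_sample, out_dir, sep="_"):
    """Move per-cell bam files into one folder, prefixing each file name with its sample name."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    moved = []
    for sample, folder in dirs_by_sample.items():
        for bam in sorted(Path(folder).glob("*.bam")):
            target = out / f"{sample}{sep}{bam.name}"
            if target.exists():
                raise FileExistsError(str(target))
            shutil.move(str(bam), str(target))
            moved.append(target.name)
    return moved
```

With this scheme, a cell `AAAC` from sample `s1` becomes `s1_AAAC` in the merged matrix and `s1_AAAC.bam` in the combined folder, so the two inputs stay consistent.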

If you are interested in cell-type-specific GRNs for certain clusters, you can edit data/subsets.txt to include only the clusters of interest. You do not need to include data for cells in the other clusters either.
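If you would rather script that edit than do it by hand, a minimal sketch follows. It assumes data/subsets.txt lists one cluster name per line, which you should check against your own file first; `keep_subsets` is a hypothetical helper, not a Dictys function, and the cluster names in the usage line are placeholders.

```python
from pathlib import Path


def keep_subsets(subsets_file, keep):
    """Rewrite the subsets file so that it lists only the clusters in `keep`.

    Assumes one cluster name per line. The original order is preserved.
    """
    path = Path(subsets_file)
    names = [line.strip() for line in path.read_text().splitlines() if line.strip()]
    missing = [name for name in keep if name not in names]
    if missing:
        raise ValueError(f"not listed in {path}: {missing}")
    wanted = set(keep)
    kept = [name for name in names if name in wanted]
    path.write_text("\n".join(kept) + "\n")
    return kept
```

For example, `keep_subsets("data/subsets.txt", ["Subset1", "Subset3"])` would leave only those two names in the file. Keeping a backup copy of the original file before overwriting it is a good idea.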

Let me know if you have any follow-up questions.

Lingfei

Hi Lingfei,

Thanks so much for your help!

Yoon