YiPeng-Gao/scDaPars

PDUI was successfully estimated for fewer than 1000 genes


Hi,
First, thanks for developing such an awesome tool. I applied it to my 10X data but found that only ~1000 genes were included in the final output file. This seems strange to me, because many more genes appear in the figures from your Genome Research paper.

The quality of my data is quite good (median of ~3000 UMIs and >1500 genes per cell). All my cells were generated from a 10X 3UTR library. I used cellranger to map reads to GRCh38 and then split the BAM by cell with sinto (a Python package). Next, I used umi_tools for deduplication. The BAM file for each cell ranges from 5 MB to 20 MB. I then used bedtools genomecov to produce the bedgraph files and fed them to dapars2. When running dapars2, I set Coverage_threshold to zero in order to include as many genes as possible.
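One way to sanity-check the bedgraph step is to count how many reference 3'UTR regions have any coverage at all in a single cell. The following is only a minimal sketch, not part of scDaPars or dapars2: the function names `read_bedgraph` and `utr_coverage_summary` are hypothetical, and it assumes a standard four-column bedgraph (chrom, start, end, depth) and a BED file whose first three columns are chrom, start, end. It also reports UTR chromosomes that never appear in the bedgraph, which can reveal a naming mismatch between the two files.

```python
import io
from collections import defaultdict


def read_bedgraph(handle):
    """Group covered (start, end, depth) intervals by chromosome."""
    cov = defaultdict(list)
    for line in handle:
        parts = line.split()
        # Skip comments, track/browser headers, and malformed lines.
        if len(parts) < 4 or line.startswith("#") or parts[0] in ("track", "browser"):
            continue
        chrom, start, end, depth = parts[0], int(parts[1]), int(parts[2]), float(parts[3])
        intervals = cov[chrom]  # register the chromosome even if depth is 0
        if depth > 0:
            intervals.append((start, end, depth))
    return cov


def utr_coverage_summary(bed_handle, cov):
    """Return (total UTRs, UTRs with any coverage, UTR chromosomes absent from bedgraph)."""
    total = covered = 0
    unknown_chroms = set()
    for line in bed_handle:
        parts = line.split()
        if len(parts) < 3 or line.startswith("#") or parts[0] in ("track", "browser"):
            continue
        chrom, start, end = parts[0], int(parts[1]), int(parts[2])
        total += 1
        if chrom not in cov:
            unknown_chroms.add(chrom)
            continue
        # Half-open interval overlap test; a linear scan is fine for a sketch.
        if any(s < end and e > start for s, e, _ in cov[chrom]):
            covered += 1
    return total, covered, sorted(unknown_chroms)


# Tiny made-up example (not real data).
demo_bedgraph = io.StringIO("chr1\t100\t150\t2\nchr1\t150\t200\t0\nchr2\t10\t20\t1\n")
demo_bed = io.StringIO(
    "chr1\t120\t180\tG1\t0\t+\n"
    "chr1\t150\t190\tG2\t0\t-\n"
    "1\t10\t20\tG3\t0\t+\n"
)
print(utr_coverage_summary(demo_bed, read_bedgraph(demo_bedgraph)))
```

For real files, the two `io.StringIO` objects would be replaced by `open(...)` handles on one cell's bedgraph and on the reference 3'UTR BED file. A low covered count, or a non-empty list of unknown chromosomes, would point at the steps before dapars2 rather than at dapars2 itself.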

From your experience, is my result unusual? If so, which step is most likely to have gone wrong? Could you please give me some advice?

Best
Yang

Here are some intermediate files that might be helpful.
This is the bedgraph for one cell:
TTTCATGAGAAGTCAT-1.bedgraph.txt
TTTCATGAGAAGTCAT-1.readcount.txt
These are the output files for one sample:
config.txt
dapars2_result.txt
readdepth.txt
scDaPars_imputed_results.txt
This is the reference UTR file I generated for GRCh38:
hg38.extracted.3UTR.bed.txt
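
To put a number on "how many genes were estimated" in a result table, the rows with at least one non-missing PDUI value can be counted. This is a rough sketch under stated assumptions: the file is tab-delimited with a header row, per-cell PDUI columns have headers ending in "PDUI", and missing estimates are written as "NA". The function name `count_estimated_genes` is hypothetical, and the column layout should be checked against the actual output before relying on it.

```python
import csv
import io


def count_estimated_genes(handle, missing=("NA", "")):
    """Return (total rows, rows with at least one non-missing PDUI value)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    # Assumption: PDUI columns are identified by a header ending in "PDUI".
    pdui_cols = [i for i, name in enumerate(header) if name.endswith("PDUI")]
    total = estimated = 0
    for row in reader:
        if not row:
            continue
        total += 1
        if any(i < len(row) and row[i] not in missing for i in pdui_cols):
            estimated += 1
    return total, estimated


# Tiny made-up table (not real data).
demo = io.StringIO(
    "Gene\tLoci\tcellA_PDUI\tcellB_PDUI\n"
    "G1\tchr1:100-200\t0.8\tNA\n"
    "G2\tchr1:300-400\tNA\tNA\n"
    "G3\tchr2:100-200\t0.1\t0.5\n"
)
print(count_estimated_genes(demo))
```

Comparing the total row count with the number of regions in the reference 3'UTR file would show whether genes are lost before the table is written or only end up with missing values inside it.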