`deeptools` `countReadsPerBin` output is not sorted
Closed this issue · 2 comments
ntanmayee commented
The new preprocessing pipeline uses the deeptools `countReadsPerBin` class. This uses multiprocessing and is much faster than before.
However, the output from this is not sorted. This means that two runs of `crpb.run()` can return the bins in different orders, which makes the rest of the DecoDen pipeline wrong.
ntanmayee commented
Potential solution: re-implement `countReadsPerBin.py` and pass `includeLabels=False` to `mapReduce`. This should return the chromosome, start and end, which will help in sorting the multiprocessing output.
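The sorting idea can be sketched independently of deeptools. The snippet below assumes each worker returns a `(chrom, start, end, counts)` tuple; that layout and the helper name `sort_labelled_chunks` are hypothetical, chosen only for illustration, and are not the format `mapReduce` actually returns. Chunks are ranked by an explicit chromosome order rather than by name, because a plain string sort would place `chr10` before `chr2`.

```python
import numpy as np


def sort_labelled_chunks(chunks, chrom_order):
    """Order labelled chunks by (chromosome rank, start, end) and stack them.

    chunks: iterable of (chrom, start, end, counts) tuples (assumed layout),
            where counts is a 2-D array with one row per bin.
    chrom_order: list of chromosome names in the desired output order.
    """
    rank = {chrom: i for i, chrom in enumerate(chrom_order)}
    ordered = sorted(chunks, key=lambda c: (rank[c[0]], c[1], c[2]))
    return np.concatenate([counts for _, _, _, counts in ordered], axis=0)
```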
ntanmayee commented
The previous solution does not work. This is the new strategy:
- Read in the `chrom_sizes.bed` file to get chromosome names and lengths
- Call `count_reads_in_region` instead of `run`. This is still run with multiprocessing, but the results are ordered
- Concatenate the resulting coverage arrays
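A minimal sketch of this strategy, under stated assumptions. It does not call deeptools: `count_region` is a hypothetical stand-in for a wrapper around `count_reads_in_region`, assumed to take a `(chrom, start, end)` tuple and return a 2-D array with one row per bin. The parser assumes the chromosome name is the first field and the length is the last field of each line, since the exact layout of `chrom_sizes.bed` is not given here. The `mapper` argument can be any order-preserving map, for example `multiprocessing.Pool.map`, which returns results in input order regardless of which worker finishes first.

```python
import io

import numpy as np


def read_chrom_sizes(handle):
    """Return [(chrom, length), ...] in file order.

    Assumes the name is the first field and the length is the last field.
    """
    sizes = []
    for line in handle:
        fields = line.split()
        if line.startswith("#") or len(fields) < 2:
            continue  # skip comments and blank or malformed lines
        sizes.append((fields[0], int(fields[-1])))
    return sizes


def ordered_coverage(chrom_sizes, count_region, mapper=map):
    """Count each chromosome as one region and stack results in file order.

    count_region: hypothetical callable taking (chrom, start, end) and
                  returning an array of shape (n_bins, n_files).
    mapper: an order-preserving map, e.g. the built-in map or Pool.map.
    """
    regions = [(chrom, 0, length) for chrom, length in chrom_sizes]
    per_chrom = list(mapper(count_region, regions))
    return np.concatenate(per_chrom, axis=0)


def fake_count_region(region, bin_length=100):
    """Toy stand-in: fills every bin with the length of the chromosome name."""
    chrom, start, end = region
    n_bins = -(-(end - start) // bin_length)  # ceiling division
    return np.full((n_bins, 1), len(chrom), dtype=float)


sizes = read_chrom_sizes(io.StringIO("chr1\t0\t250\nchr10\t0\t100\n"))
coverage = ordered_coverage(sizes, fake_count_region)
```

Because the region list is built in file order and the map preserves that order, repeated runs concatenate the per-chromosome arrays identically.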