multi-threading indexing?
I'm currently running methylpy single-end-pipeline and it has been working very well!
The one problem is that indexing takes a very long time and only uses one core, even though I set --num-procs 32. Is there a way to make indexing multi-threaded?
Command line used
methylpy single-end-pipeline --read-files
Output file with time stamps
Begin splitting reads for 21_R2.non-pbat_libA
Fri Sep 25 08:49:01 2020
No trimming on reads
Fri Sep 25 08:54:00 2020
Begin converting reads for 21_R2.non-pbat_libA
Fri Sep 25 08:54:00 2020
Begin Running Bowtie2 for libA
Fri Sep 25 08:54:18 2020
32115469 reads; of these:
32115469 (100.00%) were unpaired; of these:
14266954 (44.42%) aligned 0 times
11212951 (34.91%) aligned exactly 1 time
6635564 (20.66%) aligned >1 times
55.58% overall alignment rate
Processing forward strand hits
Fri Sep 25 09:31:11 2020
32115469 reads; of these:
32115469 (100.00%) were unpaired; of these:
14350834 (44.69%) aligned 0 times
11210926 (34.91%) aligned exactly 1 time
6553709 (20.41%) aligned >1 times
55.31% overall alignment rate
Processing reverse strand hits
Fri Sep 25 10:12:29 2020
Finding multimappers
Fri Sep 25 10:14:23 2020
[bam_sort_core] merging from 0 files and 32 in-memory blocks...
There are 32115469 total input reads
Fri Sep 25 10:23:09 2020
There are 20282539 uniquely mapping reads, 63.1550453148 percent remaining
Fri Sep 25 10:23:09 2020
Begin calling mCs
Fri Sep 25 10:23:09 2020
Input not indexed. Indexing...
Fri Sep 25 10:23:09 2020
[mpileup] 1 samples in 1 input files
Done
Fri Sep 25 11:50:51 2020
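To see which step dominates the run, the time stamps in the log above can be subtracted pairwise. This is only a rough sketch: the step labels are my own shorthand for the log messages, and each interval covers everything logged between its two time stamps, not necessarily a single operation.

```python
from datetime import datetime

# Format of the time stamps printed in the log, e.g. "Fri Sep 25 10:23:09 2020".
STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(start, end):
    """Return the whole number of seconds between two log time stamps."""
    t0 = datetime.strptime(start, STAMP_FORMAT)
    t1 = datetime.strptime(end, STAMP_FORMAT)
    return int((t1 - t0).total_seconds())


# (label, start, end) triples copied from the log; the labels are informal.
steps = [
    ("Bowtie2, forward strand", "Fri Sep 25 08:54:18 2020", "Fri Sep 25 09:31:11 2020"),
    ("Bowtie2, reverse strand", "Fri Sep 25 09:31:11 2020", "Fri Sep 25 10:12:29 2020"),
    ("Indexing... until Done", "Fri Sep 25 10:23:09 2020", "Fri Sep 25 11:50:51 2020"),
]

if __name__ == "__main__":
    for label, start, end in steps:
        minutes = elapsed_seconds(start, end) / 60
        print(f"{label}: {minutes:.1f} min")
```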
Hey, yes, recent versions of samtools support multi-threaded indexing. I just added this feature to methylpy, so indexing should be much faster in the latest version of methylpy.
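For anyone who wants to index a BAM file by hand in the meantime, here is a minimal sketch of calling samtools with extra threads from Python. It is not methylpy's internal code. It assumes a samtools build whose index subcommand accepts -@ (older builds may not), and example.bam is a hypothetical file name.

```python
import shutil
import subprocess


def build_index_command(bam_path, threads=1):
    """Build the argument list for a multi-threaded `samtools index` call.

    Assumption: the installed samtools accepts `-@` on `index`.
    samtools treats this value as threads in addition to the main thread.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return ["samtools", "index", "-@", str(threads), bam_path]


def index_bam(bam_path, threads=1):
    """Run samtools index, raising an error if samtools is missing or fails."""
    if shutil.which("samtools") is None:
        raise FileNotFoundError("samtools not found on PATH")
    subprocess.run(build_index_command(bam_path, threads), check=True)


if __name__ == "__main__":
    # Only print the command here; call index_bam() on a real BAM file to run it.
    print(" ".join(build_index_command("example.bam", threads=32)))
```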
Thanks for the reply! What is the latest version number of methylpy? Mine is 1.4.3 (installed from Anaconda).
output from methylpy
$ methylpy
usage: methylpy [-h] ...
You are using methylpy 1.4.3 version
(/python3.7/site-packages/methylpy/)
optional arguments:
  -h, --help            show this help message and exit
functions:
    build-reference     Building reference for bisulfite sequencing data
    single-end-pipeline
                        Methylation pipeline for single-end data
    paired-end-pipeline
                        Methylation pipeline for paired-end data
    DMRfind             Identify differentially methylated regions
    reidentify-DMR      Re-call DMRs from existing DMRfind result
    add-methylation-level
                        Get methylation level of genomic regions
    bam-quality-filter  Filter out single-end reads by mapping quality and mCH
                        level
    call-methylation-state
                        Call cytosine methylation state from BAM file
    allc-to-bigwig      Get bigwig file from allc file
    merge-allc          Merge allc files
    index-allc          Index allc files
    filter-allc         Filter allc file
    test-allc           Binomial test on allc file
The latest is 1.4.6. Methylpy can be updated through conda and pip.
You can upgrade the package using pip. The conda package usually lags behind (it still ships the version released 4 months ago).
pip install --upgrade methylpy
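After upgrading, a quick check like the following can confirm which version is actually installed. It is a sketch that assumes 1.4.6 is the release to compare against, as stated in this thread, and that version strings are plain dotted integers. The helper names are made up for illustration, and importlib.metadata needs Python 3.8 or newer.

```python
def parse_version(text):
    """Turn a dotted version string such as '1.4.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def needs_upgrade(installed, minimum="1.4.6"):
    """Return True if the installed version is older than the minimum."""
    return parse_version(installed) < parse_version(minimum)


def installed_methylpy_version():
    """Return the installed methylpy version string, or None if unavailable."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        return None  # Python older than 3.8 has no importlib.metadata
    try:
        return version("methylpy")
    except PackageNotFoundError:
        return None  # methylpy is not installed in this environment


if __name__ == "__main__":
    found = installed_methylpy_version()
    if found is None:
        print("Could not determine the installed methylpy version")
    elif needs_upgrade(found):
        print(f"methylpy {found} is older than 1.4.6; consider upgrading")
    else:
        print(f"methylpy {found} is up to date relative to 1.4.6")
```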