fermi hangs on a very small dataset
I ran fermi on a very small dataset containing 22 FASTA records using the following command:
run-fermi.pl -k 200 -p cdhitout_0.85 <reads.fa> | make -f -
However, fermi hangs indefinitely. Looking at top, I can see that fermi ropebwt is constantly in the sleep state:
45288 uqcskenn 20 0 24188 740 584 S 3 0.0 1:08.84 fermi ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp -
45447 uqcskenn 20 0 24188 740 584 S 2 0.0 1:08.00 fermi ropebwt -a bcr -v3 -btf cdhitout_0.90.ec.tmp -
I've tried both the git HEAD and release 1.1.
<reads.fa> contains:
>M00920:10:000000000-A292A:1:1101:2305:13136:1
CTTCTGGTGAAACCCACTCCCATGGTGTGACGGGCGGTGTGTACAAGACCCGGGAACGTATTCACCGCGACATGCTGATCCGCGATTACTAGCGATTCCGACTTCACGCAGTCGAGTTGCAGACTGCGATCCGGACTACGATCGGCTTTGTGAGATTCGCTCCGCCTCGCGGCTTGGCAACCCTCTGTACCGACCATTGTATGACGTGTGAAGCCCTACCCATAAGGGCCATGAGGACTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCTCGTTAAAGTGCCCAACCAAATGATGGCAATTAACGACAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACAT
>M00920:10:000000000-A292A:1:1101:24216:16298:1
CCCTTATCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGAGGAGGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATCAGAATGTCACGGTGAATACGTTCCCGGGCCTTGTACACACCGCCCATCACACCATGGGAGTGGGTTGCTCCAGAAGTAGCTAGTCTAACCGCAAGGGGGACGGTTACCA
>M00920:10:000000000-A292A:1:1110:4340:7240:1
CAGATTGAACGCTGGCGGCATGCTTTACACATGCAAGTCGAACGGCAGCGGGGGCTTCGGCCCGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTACCCATGTTGTGGGGGATAACGTAGCGAAAGCTACGCTAATACCGCATAAGCCCTGAGGGGGAAAGCGGGGGATTCTTCGGAACCTCGCGCAATTGGAGCGGCCGATGTCAGATTAGCTAGTTGGTAGGGTAAAGGCCTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCGGACTCCTCCGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGCAAGGGTGATC
>M00920:10:000000000-A292A:1:1110:21042:16009:1
ACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCGAGCCTTGCAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAGCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTGTTCTTCAGTTCCCGTCATTGACAGTCTATGTTAGACCCCGCCGTTTCGTTCCTGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGAATGGCTGGATCAGGGT
>M00920:10:000000000-A292A:1:1101:19922:4365:1
ATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGACAGACCAGGTCCAGGGGGCTGCCTTCGCCTTCGATGTTCCTCCTGATATCTACGTATTTCACTGCTACACCCGGATTTCCACCCCCCTCTACCGCACTCTAGGCACACAGTCACAAACGCATTTCCCAGGTTAAGCCCGGGGGTTTCAAATCTGAATTATTTAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTCGGTATGACCGCGACTGCCAGCGGGTAGGAAGGCGGTACTTTTTATTCCGGTGCCGACATCCTCCCCGGATATTCACCGCGGCTATTTCTTTCCGTCCGACAGAGGTGTAAAACCCGAAGGCGAGCTTG
>M00920:10:000000000-A292A:1:1101:18095:13295:1
GGAGGCAGCAGTGGGGAATTTTGGACAATGGGCGGAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCACGGGTTAATACCCTGTGTAGATGACGGTACCCGACTAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCGCAGGCGGTTTGGTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTTGAGACTGCCAAGCTGGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATCAGGAG
>M00920:10:000000000-A292A:1:2102:3086:14182:1
GTAGTGACCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCACATCTCTACGCATTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCACACTCCAGCCTGGCAGTCTCAAATGCAGTTCCCAGGTTGAGCCCGGGGCTTTCACATCTGACTTACCAAACCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTAACGCGGCTGCTGGCACGTAGTTCGCCGGTGCTTCTTAGTCGGGTACCGTCATCTACACAGGATATTAGCCCGTGCAATTTCTTCCCCACCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGCTTCCGCCC
>M00920:10:000000000-A292A:1:2108:13711:22806:1
GATTAAACGCTGGCGGCATGCCTTACACATGCAAGTCGAACGGCAGCACGGGGGCAACCCTGGTGGCGAGTGGTGGACGGGTGAGTAAAGCATCGGAACGTATCCTGAAGTGGAGTATAACGTAGCGAAAGTTACGCTAATACCGCATAGTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGCTCTGGGAGCGGCCGATGTCGGATTAGCTAGTTGGGGGGGTAAAGGCCTACCAAGGCGCGGCTCCGTAGCGGGGATTGGAGTATGAAACGCCACACTGTGACTGAGAAACGGCCCGGACTCCTACGTGAGGAAGCAGCGGTGAATTTTTTCCAATGGGTTCAAGCC
>M00920:10:000000000-A292A:1:2110:11377:9313:1
GCATCGGAACGTGCCCTGGAATGGGGGATAACGTAGCGAAAGTTACGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGATCGCAAGACCTTGCGTTCTGGGATCGGCCGATGTCGTATGAGCTAGTTGGTGGGGAAAAGGCCTACCACGGCGACGATCCGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGGACTGAGACACGGCCCAGACTCCTACGGGAGGCAGCCGTGGGGAATTTTGGACAATGGGCGCAAGCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCGGTGGGGAAGAAATTGCATGGGTTAATTCCC
>M00920:10:000000000-A292A:1:1105:17264:25408:1
GAATTACTGGGCGTAAAGCGTGCGCAGGCGGCGCCATAAGACAGACGTGAAATCCCCGGGCTTAACCTGGGAACTGCGTTTGTGACTGTGGTGCTCGAGTGTGGCAGAGGGGGGTGGAATTCCACGTGTAGCAGTGAAATGCGTAGAGATGTGGAGGAACACCGATGGCGAAGGCAGCCCCCTGGGTCAACACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGCGAACTAGGTGTTGGGGAAGGAGACGTTCTTAGTACCGCAGCTAACGCGTGAAGTTCGCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATGGACA
>M00920:10:000000000-A292A:1:2105:19316:26848:1
ATCCGTAGCTGGTCTGAGAGGACGACCAGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATTCCGCGTGAGTGAAGAAGGCCTTCGGGTTGTAAAGCTCTTTCAGCAGGAACGAAACGGCTCTCTCTAACATAGGGAGTTAATGACGGTACCTGAAGAAGAAGCACCGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGGTGCGAGCGTTAATCGGAATTACTGGGCGTAAAGCGTGCACAGGCGGCGCCATAAGACAGATGTGAAATCCCCGGGCTTAACCTGGGAAC
>M00920:10:000000000-A292A:1:1111:13173:15398:1
TGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTGCCAGAGATGGCTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGCACCGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTTCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGCTGAAGTCAAGTCATCATGGCCCTTATGGGTAGGGCGTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGC
>M00920:10:000000000-A292A:1:1102:8010:26367:1
GCCTTACACATGCAAGTCGAACGGCAGCGGAACTTCGGGTGCCGGCGAGTGGCGAACGGGTGAGTAATGCATCGGAACGTGCCATTGAGTGGGGGATAACGTAGCGAAAGTTGCGCTAATACCGCATATTCTGTGAGCAGGAAAGCAGGGGACCGCAAGGCCTTGCGCTCTTTGAGCGGCCGATGTCAGATTAGCTAGTTGGTGAGGTAAAGGCTTACCAAGGCGACGATCTGTAGCGGGTCTGAGAGGATGATCCGCCACACTGGAACTGAGACACGGTCCAGACTCCTACGGGAGGCAGCAGTGGGGAATTTTGGACAATGGGGGCAACCCTGATCCAGCCATGCCGCGTGAGTGAAGAAGGCCTTCGGGT
>M00920:10:000000000-A292A:1:1106:8344:21464:1
GTTCCTACCATTGTAGCACGTGTGTAGCCCTGGGCATAAAGGCCATGATGACTTGACATCATCCCCTCCTTCCTCGCGTCTTACGACGGCAGTTTCTTTAGAGTTCCCAGCTTAACCTGTTGGCAACTAAAGATAGGGGTTGCGCTCGTTGCGGGACTTAACCCAACACCTCACGGCACGAGCTGACGACAGCCATGCAGCACCTGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:1109:11262:3539:1
TTTACCCACCCAACACCTAGTTGACATAGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTACCCACGCTTTCGTGCATGAGCGTCAGTATCGGCCCAGGGGGCTGCCTTCGCCATAGGTGTTCCTCCCCATCTCTACGCTTTTCACTGCTACACGTGGAATTCCACCCCCCTCTGCCGTACTCTAGTGAGGCAGTCACAAACGCAGTTCCCAGGTTACGCCCGGGGATTTCACGCCTGTCTTACCAATCCGCCTGCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTATTACCGCGGCTGCTGGCACGTAGTTAGCCGGTGCTTCTTATGCCGGTACCG
>M00920:10:000000000-A292A:1:1113:21063:11515:1
ACACAGGGTATTAACCCATGCGATTTCTTCCCGGCCGAAAGAGCTTTACAACCCGAAGGCCTTCTTCACTCACGCGGCATGGCTGGATCAGGGTTGCCCCCATTGTCCAAAATTCCCCACTGCTGCCTCCCGGAGGAGTCTGGCCCGTGTCTCAGTTCCAGTGTGGCGGATCATCCTCTCAGACCCGCTCCAGATCGTCGCCTTGGTAAGCCGTTACCTCACCAACTAGCTAATCTGACATAGGCCGCTCAAAGAGCGCAAGGCCTTGCGGTCCCCTGCTTTCCTGCTCACAGAATATGCGGTATTAGCGCAACTTTCGCTACGTTATCCCCCACTCAATGGCACGTTCCGATGCATTACTCACC
>M00920:10:000000000-A292A:1:2109:18065:11577:1
CCTTTGTATTGTCCATTGTAGCACGTGTGTAGCCCAAATCATAAGGGGCATGATGATTTGACGTCATCCCCACCTTCCTCCGGTTTGTCACCGGCAGTCAACTTAGAGTGCCCAACTTAATGATGGCAACTAAGCTTAAGGGTTGCGCTCGTTGCGGGACTTAACCCAACATCTCACGACACGAGCTGACGACAGCCATGCAGCACCAGTGTGACGGCTCCCTTTCGGGCACCCTCAACTCTCATCGAGGTTCCGTCCATGTCAAGGGTAGGTAAGGTTTTTCGCGTTGCATCGAATTAATCCACATCATCCACCGCTTGTGCGGGTCCCCGTCAATTCCTTTGAGTTTTAATC
>M00920:10:000000000-A292A:1:2113:10809:18271:1
GTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTCACCTACCCTTGACATGGACGGAACCTCGATGAGAGTTGAGGGTGCCCGAAAGGGAGCCGTCACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTAAGCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGTTGACTGCCGGTGACAAACCGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGATTTGGGCTACACACGTGCTACAA
>M00920:10:000000000-A292A:1:2101:18998:6292:1
GTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAAGGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTATCCTTTGTTGCCAGCGGTCCGGCCGGGAACTCAAAGGAGACTGCCAGTGATAAACTGGAGGAAGGTGGGGATGACGTCAAGTCCTCATGGCCCTTATGGGTAGGGCTTCACACGTCATACAATGGTCGGAACAGAGGGTTGCCAAGCCGCGAGGTGGAGCCAATCCCAGAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGAC
>M00920:10:000000000-A292A:1:2108:17778:22051:1
ATCCACAGAACTTAGCAGAGATGCTTTGGTGCCTTCGGGAACTGTGAGACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTTGTGAAATGTTGGGTTAAGTCCCGTAACGAGCGCAACCCTTGTCCTTAGTTACCAGCACCTCGGGTGGGCACTCTAAGGAGACTGCCGGTGACAAACCGGGGGAAGGTGGGGATGACGTCAAGTCATCATGGCCCTTACGGCCAGGGCTACACACGTGCTACAATGGTCGGTACAAAGGGTTGCCAAGCCGCGAGGTGGAGCTAATCCCATAAAACCGATCGTAGTCCGGATCGCAGTCTGCAACTCGACTGCGTGAAGTCGGAATCGCTAGTAATCGTGAATC
>M00920:10:000000000-A292A:1:1104:5131:15907:1
GTACTGACGCTCATGCACGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCCTAAACGATGTCGACTAGTCGTTCGGAGCAGCAATGCACTGAGTGACGCAGCTAACGCGTGAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAAGGAATTGACGGGGACCCGCACAAGCGGTGGATGATGTGGATTAATTCGATGCAACGCGAAAAACCTTACCTACCCTTGACATGTCTGGAGCCTTGGTGAGAGCCGAGGGTGCCTTCGGGAGCCAGAACACAGGTGCTGCATGGCTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGT
>M00920:10:000000000-A292A:1:1113:7839:16644:1
CGTTTAGGGCGTGGACTACCAGGGTATCTAATCCTGTTTGCTCCCCACGCTTTCGTGCATGAGCGTCAGTACAGGCCCAGGGGGCTGCCTTCGCCATCGGTGTTCCTCCTGATCTCTACGCATTTCACTGCTACACCAGGAATTCCACACACTTCTGCCGTACTCTAGCCTTGCAGTCACAAACGCAGTTCCCAGGTTAAGCCCGGGGATTTCACATCTGTCTTACAAAAACGCCTCCGCACGCTTTACGCCCAGTAATTCCGATTAACGCTCGCACCCTACGTTTTACCGCGGCTGCTGGCACGTTTTTAGCCGGTGCTTCTTAGTCCGGTACCGTCATCCATGGCCTATGTTAGAGAC
With your command line, fermi should not use ropebwt. Can you find the string ropebwt in your makefile?
Yes, I can. The full makefile is shown below:
FERMI=fermi
UNITIG_K=200
OVERLAP_K=240
all:cdhitout_0.85.p2.mag.gz
# Construct the FM-index for raw sequences
cdhitout_0.85.raw.fmd:../cdhitout_0.85.fa
	(cat ../cdhitout_0.85.fa) | $(FERMI) ropebwt -a bcr -v3 -btNf cdhitout_0.85.raw.tmp - > $@ 2> $@.log
# Error correction
cdhitout_0.85.ec.fq.gz:cdhitout_0.85.raw.fmd
	(cat ../cdhitout_0.85.fa) | $(FERMI) correct -t 2 $< - 2> $@.log | gzip -1 > $@
# Construct the FM-index for corrected sequences
cdhitout_0.85.ec.fmd:cdhitout_0.85.ec.fq.gz
	$(FERMI) fltuniq $< 2> cdhitout_0.85.fltuniq.log | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log
# Generate unitigs
cdhitout_0.85.p0.mag.gz:cdhitout_0.85.ec.fmd
	$(FERMI) unitig -t 2 -l $(UNITIG_K) $< 2> $@.log | gzip -1 > $@
cdhitout_0.85.p1.mag.gz:cdhitout_0.85.p0.mag.gz
	$(FERMI) clean $< 2> $@.log | gzip -1 > $@
cdhitout_0.85.p2.mag.gz:cdhitout_0.85.p1.mag.gz
	$(FERMI) clean -CAOFo $(OVERLAP_K) $< 2> $@.log | gzip -1 > $@
I see. I was using an old version of run-fermi.pl. More recent versions use ropebwt by default. Anyway, I can see the problem now: fltuniq has filtered out all the reads, while ropebwt is expecting some input and thus hangs for some reason. For the time being, you can edit the makefile and change the line containing fltuniq to:
cat $< | $(FERMI) ropebwt -a bcr -v3 -btf cdhitout_0.85.ec.tmp - > $@ 2> $@.log
This skips fltuniq. I will look into the ropebwt issue later. In any case, you probably won't get a good assembly from these reads.
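For anyone who would rather not edit the makefile by hand, the change described above could be scripted roughly like this. It is only a sketch: the helper name skip_fltuniq and the file name fermi.mak are made up for illustration, and the pattern assumes the fltuniq line has exactly the shape shown in the makefile posted earlier in this thread.

```shell
# skip_fltuniq (hypothetical helper): print a copy of a run-fermi.pl makefile
# in which the "$(FERMI) fltuniq $< 2> <log> |" stage is replaced by "cat $< |".
# Assumption: the recipe line looks like the one in the makefile posted above.
skip_fltuniq() {
    sed -E 's/\$\(FERMI\) fltuniq \$< 2> [^ |]+ \|/cat $< |/' "$1"
}

# Example usage (reads.fa and fermi.mak are placeholder names):
#   run-fermi.pl -k 200 -p cdhitout_0.85 reads.fa > fermi.mak
#   skip_fltuniq fermi.mak | make -f -
```

Lines that do not match the pattern are passed through unchanged, so the rest of the makefile stays as generated.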
For small files, it is actually better not to use fltuniq anyway. I should consider adding an option to skip fltuniq altogether.
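Until such an option exists, one way to decide whether fltuniq should be skipped for a given dataset is to check whether it writes anything at all to stdout. A minimal sketch, assuming fltuniq is called the same way as in the makefile above; stream_is_empty is a hypothetical helper name, not part of fermi.

```shell
# stream_is_empty (hypothetical helper): succeeds when stdin carries no bytes.
# It reads at most one byte, so it returns quickly even on large streams.
stream_is_empty() {
    [ "$(head -c 1 | wc -c | tr -d ' ')" -eq 0 ]
}

# Example usage, reusing the fltuniq call from the makefile above:
#   if fermi fltuniq cdhitout_0.85.ec.fq.gz 2> /dev/null | stream_is_empty; then
#       echo "fltuniq removed all reads; skip it for this dataset"
#   fi
```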
Thanks, specifying -B in run-fermi.pl prevents the hang as well.