Unable to find read distribution
I am trying to use the "simulate" mode and keep getting the following error:
"ERROR: Don't know how long to make the reads. Set -distribution or -length"
However, I have set the -distribution parameter every way I know how, and I cannot seem to get the command to go through. This is my command:
seqrequester simulate -genome ./chr1_CPU_1.fasta -genomesize 260000000 -coverage 40 -distribution ./pacbio-hifi
I tried to see if I was missing any dependencies but there is no information about what they are.
Ouch, there was a lot of broken in there. It should be fixed now. You might need to do a git submodule update after git pull.
The -distribution option will look for the file in a bunch of places (the current directory, the location of the seqrequester binary, and the components of your PATH); it normally lives in build/share/seqrequester/. You can also give it an explicit path to a file of your own creation.
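To make the lookup described above concrete, here is a rough Python sketch of that kind of search. It is not seqrequester's code: the function name find_distribution, its parameters, and the search order (current directory, then the binary's directory, then each PATH component) are assumptions for illustration only.

```python
import os

def find_distribution(name, binary_dir, path_env=None, cwd="."):
    """Return the first existing file called `name` in the search places, or None.

    Hypothetical helper: the order below simply follows the order in which
    the places are listed in the comment above; the real tool may differ.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    # Current directory, the binary's directory, then each non-empty PATH entry.
    search_dirs = [cwd, binary_dir]
    search_dirs += [d for d in path_env.split(os.pathsep) if d]
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

If a search like this returns nothing for a name such as pacbio-hifi, passing an explicit path to the file is the fallback.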
> cat distrib
40 5
50 5
60 10
70 20
> seqrequester simulate -genome AE017225.1.fasta -nreads 500 -distribution ./distrib | seqrequester summarize -
Loading sequences from 'AE017225.1.fasta'
Loaded 1 sequences.
G=30680                            sum of ||              length     num
NG          length     index      lengths ||               range    seqs
----- ------------ --------- ------------ || ------------------- -------
00010           70        44         3080 ||                  40      64|----------------
00020           70        88         6160 ||                  41       0|
00030           70       132         9240 ||                  42       0|
00040           70       176        12320 ||                  43       0|
00050           70       220        15400 ||                  44       0|
00060           60       265        18460 ||                  45       0|
00070           60       316        21520 ||                  46       0|
00080           60       367        24580 ||                  47       0|
00090           50       426        27620 ||                  48       0|
00100           40       500        30680 ||                  49       0|
001.000x                 500        30680 ||                  50      60|---------------
                                          ||                  51       0|
                                          ||                  52       0|
                                          ||                  53       0|
                                          ||                  54       0|
                                          ||                  55       0|
                                          ||                  56       0|
                                          ||                  57       0|
                                          ||                  58       0|
                                          ||                  59       0|
                                          ||                  60     120|------------------------------
                                          ||                  61       0|
                                          ||                  62       0|
                                          ||                  63       0|
                                          ||                  64       0|
                                          ||                  65       0|
                                          ||                  66       0|
                                          ||                  67       0|
                                          ||                  68       0|
                                          ||                  69       0|
                                          ||                  70     256|---------------------------------------------------------------
--------------------- --------------------- ----------------------------------------------------------------------------------------------
   mononucleotide         dinucleotide                                              trinucleotide
--------------------- --------------------- ----------------------------------------------------------------------------------------------
   9688 0.3158 A         3457 0.1145 AA       1263 0.0426 AAA     555 0.0187 AAC     621 0.0209 AAG     963 0.0324 AAT
   5563 0.1813 C         1578 0.0523 AC        538 0.0181 ACA     282 0.0095 ACC     317 0.0107 ACG     423 0.0143 ACT
   5396 0.1759 G         1570 0.0520 AG        547 0.0184 AGA     352 0.0119 AGC     260 0.0088 AGG     380 0.0128 AGT
  10033 0.3270 T         2928 0.0970 AT        812 0.0274 ATA     503 0.0169 ATC     563 0.0190 ATG     996 0.0336 ATT
                         1832 0.0607 CA        609 0.0205 CAA     294 0.0099 CAC     327 0.0110 CAG     569 0.0192 CAT
   --GC--  --AT--         964 0.0319 CC        344 0.0116 CCA     138 0.0046 CCC     167 0.0056 CCG     294 0.0099 CCT
   35.72%  64.28%         988 0.0327 CG        312 0.0105 CGA     189 0.0064 CGC     173 0.0058 CGG     304 0.0102 CGT
                         1694 0.0561 CT        398 0.0134 CTA     278 0.0094 CTC     307 0.0103 CTG     683 0.0230 CTT
                         1712 0.0567 GA        724 0.0244 GAA     196 0.0066 GAC     252 0.0085 GAG     505 0.0170 GAT
                         1115 0.0369 GC        365 0.0123 GCA     159 0.0054 GCC     192 0.0065 GCG     385 0.0130 GCT
                          942 0.0312 GG        336 0.0113 GGA     149 0.0050 GGC     156 0.0053 GGG     282 0.0095 GGT
                         1549 0.0513 GT        510 0.0172 GTA     193 0.0065 GTC     278 0.0094 GTG     540 0.0182 GTT
                         2526 0.0837 TA        815 0.0275 TAA     509 0.0171 TAC     345 0.0116 TAG     825 0.0278 TAT
                         1814 0.0601 TC        556 0.0187 TCA     368 0.0124 TCC     299 0.0101 TCG     559 0.0188 TCT
                         1804 0.0598 TG        492 0.0166 TGA     407 0.0137 TGC     332 0.0112 TGG     555 0.0187 TGT
                         3707 0.1228 TT        766 0.0258 TTA     813 0.0274 TTC     628 0.0212 TTG    1428 0.0481 TTT
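For anyone writing their own distribution file: the histogram above is consistent with each line of distrib being a read length followed by a relative weight. That reading is inferred from the output, not taken from the seqrequester documentation, so treat it as an assumption. A small Python sketch of weighted sampling under that assumption (all function names here are made up for illustration, and this is not how seqrequester itself is implemented):

```python
import random

def parse_distribution(text):
    """Parse 'length weight' lines into two parallel lists.

    Assumes column 1 is a read length and column 2 a relative weight,
    which is an interpretation of the example, not a documented format.
    """
    lengths, weights = [], []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        lengths.append(int(fields[0]))
        weights.append(float(fields[1]))
    return lengths, weights

def expected_counts(lengths, weights, nreads):
    """Expected number of reads per length if sampling is proportional to weight."""
    total = sum(weights)
    return {length: nreads * w / total for length, w in zip(lengths, weights)}

def sample_read_lengths(lengths, weights, nreads, seed=None):
    """Draw nreads lengths at random, proportional to weight."""
    rng = random.Random(seed)
    return rng.choices(lengths, weights=weights, k=nreads)

lengths, weights = parse_distribution("40 5\n50 5\n60 10\n70 20\n")
print(expected_counts(lengths, weights, 500))
```

With weights 5, 5, 10 and 20 and 500 reads, the expected counts are 62.5, 62.5, 125 and 250, which sit close to the 64, 60, 120 and 256 reads reported in the histogram.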
Works perfectly, thank you!