marbl/seqrequester

Unable to find read distribution

I am trying to use the "simulate" mode and keep getting the following error:

"ERROR: Don't know how long to make the reads. Set -distribution or -length"

However, I have set the -distribution parameter every way I know how, and I cannot seem to get the command to go through. This is my command:

seqrequester simulate -genome ./chr1_CPU_1.fasta -genomesize 260000000 -coverage 40 -distribution ./pacbio-hifi

I checked whether I was missing any dependencies, but I could not find any documentation listing them.

Ouch, a lot was broken in there. It should be fixed now. You might need to run git submodule update after git pull.

The -distribution option will look for the file in a bunch of places (the current directory, the location of the seqrequester binary, and the components of your PATH); it normally lives in build/share/seqrequester/. You can also give it an explicit path to a file of your own creation.
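If you want to check by hand where a named distribution file might be picked up, here is a rough sketch that walks the same places described above: the current directory, the directory holding the seqrequester binary, and each PATH entry. The function name find_distribution is made up for illustration, and this only approximates the search order as described here; it is not seqrequester's actual lookup code.

```shell
# Hypothetical helper (not part of seqrequester): print every candidate
# location where a distribution file with the given name exists.
find_distribution() {
  name=$1
  bin=$(command -v seqrequester 2>/dev/null)
  {
    # 1. the current directory
    echo .
    # 2. the directory of the seqrequester binary, if it is on PATH
    if [ -n "$bin" ]; then dirname "$bin"; fi
    # 3. each component of PATH
    printf '%s\n' "$PATH" | tr ':' '\n'
  } | while IFS= read -r dir; do
    if [ -n "$dir" ] && [ -f "$dir/$name" ]; then
      echo "$dir/$name"
    fi
  done
}

find_distribution pacbio-hifi
```

If this prints nothing, passing an explicit path to -distribution is probably the simplest option.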

> cat distrib 
40 5
50 5
60 10
70 20

> seqrequester simulate -genome AE017225.1.fasta -nreads 500 -distribution ./distrib | seqrequester summarize -
Loading sequences from 'AE017225.1.fasta'
Loaded 1 sequences.

G=30680                            sum of  ||               length     num
NG         length     index       lengths  ||                range    seqs
----- ------------ --------- ------------  ||  ------------------- -------
00010           70        44         3080  ||         40                64|----------------
00020           70        88         6160  ||         41                 0|
00030           70       132         9240  ||         42                 0|
00040           70       176        12320  ||         43                 0|
00050           70       220        15400  ||         44                 0|
00060           60       265        18460  ||         45                 0|
00070           60       316        21520  ||         46                 0|
00080           60       367        24580  ||         47                 0|
00090           50       426        27620  ||         48                 0|
00100           40       500        30680  ||         49                 0|
001.000x                 500        30680  ||         50                60|---------------
                                           ||         51                 0|
                                           ||         52                 0|
                                           ||         53                 0|
                                           ||         54                 0|
                                           ||         55                 0|
                                           ||         56                 0|
                                           ||         57                 0|
                                           ||         58                 0|
                                           ||         59                 0|
                                           ||         60               120|------------------------------
                                           ||         61                 0|
                                           ||         62                 0|
                                           ||         63                 0|
                                           ||         64                 0|
                                           ||         65                 0|
                                           ||         66                 0|
                                           ||         67                 0|
                                           ||         68                 0|
                                           ||         69                 0|
                                           ||         70               256|---------------------------------------------------------------

--------------------- --------------------- ----------------------------------------------------------------------------------------------
       mononucleotide          dinucleotide                                                                                  trinucleotide
--------------------- --------------------- ----------------------------------------------------------------------------------------------
        9688 0.3158 A        3457 0.1145 AA        1263 0.0426 AAA          555 0.0187 AAC          621 0.0209 AAG          963 0.0324 AAT
        5563 0.1813 C        1578 0.0523 AC         538 0.0181 ACA          282 0.0095 ACC          317 0.0107 ACG          423 0.0143 ACT
        5396 0.1759 G        1570 0.0520 AG         547 0.0184 AGA          352 0.0119 AGC          260 0.0088 AGG          380 0.0128 AGT
       10033 0.3270 T        2928 0.0970 AT         812 0.0274 ATA          503 0.0169 ATC          563 0.0190 ATG          996 0.0336 ATT
                             1832 0.0607 CA         609 0.0205 CAA          294 0.0099 CAC          327 0.0110 CAG          569 0.0192 CAT
      --GC--  --AT--          964 0.0319 CC         344 0.0116 CCA          138 0.0046 CCC          167 0.0056 CCG          294 0.0099 CCT
      35.72%  64.28%          988 0.0327 CG         312 0.0105 CGA          189 0.0064 CGC          173 0.0058 CGG          304 0.0102 CGT
                             1694 0.0561 CT         398 0.0134 CTA          278 0.0094 CTC          307 0.0103 CTG          683 0.0230 CTT
                             1712 0.0567 GA         724 0.0244 GAA          196 0.0066 GAC          252 0.0085 GAG          505 0.0170 GAT
                             1115 0.0369 GC         365 0.0123 GCA          159 0.0054 GCC          192 0.0065 GCG          385 0.0130 GCT
                              942 0.0312 GG         336 0.0113 GGA          149 0.0050 GGC          156 0.0053 GGG          282 0.0095 GGT
                             1549 0.0513 GT         510 0.0172 GTA          193 0.0065 GTC          278 0.0094 GTG          540 0.0182 GTT
                             2526 0.0837 TA         815 0.0275 TAA          509 0.0171 TAC          345 0.0116 TAG          825 0.0278 TAT
                             1814 0.0601 TC         556 0.0187 TCA          368 0.0124 TCC          299 0.0101 TCG          559 0.0188 TCT
                             1804 0.0598 TG         492 0.0166 TGA          407 0.0137 TGC          332 0.0112 TGG          555 0.0187 TGT
                             3707 0.1228 TT         766 0.0258 TTA          813 0.0274 TTC          628 0.0212 TTG         1428 0.0481 TTT
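As a sanity check on the length histogram above, the expected number of reads per length can be worked out from the distrib file. This sketch assumes the first column is a read length and the second is a relative weight; that reading is inferred from the output in this thread, not taken from the seqrequester documentation.

```shell
# Recreate the example distribution file from this thread.
cat > distrib <<'EOF'
40 5
50 5
60 10
70 20
EOF

# Assumption: column 1 is a read length, column 2 a relative weight.
# Expected count for each length = nreads * weight / sum(weights).
expected=$(awk -v nreads=500 '
  { len[NR] = $1; w[NR] = $2; total += $2 }
  END {
    for (i = 1; i <= NR; i++)
      printf "%d %.1f\n", len[i], nreads * w[i] / total
  }
' distrib)
echo "$expected"
```

With 500 reads this gives 62.5, 62.5, 125.0 and 250.0 for lengths 40, 50, 60 and 70, which is close to the observed 64, 60, 120 and 256; the small differences are what you would expect from random sampling.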

Works perfectly, thank you!