shenwei356/seqkit

seqkit locate: typo in error message

stas-malavin opened this issue · 6 comments

$ cat contig.fa | seqkit locate -p '"aaggttaa.{0,24}cagcacct"' -r -P -m 2
[ERRO] flag -r (--use-regexp) not allowed when giving flag -m (--use-regexp)

Should be -m (--max-mismatch)

Thank you Stas!

By the way, can I somehow easily delete the located sequences using seqkit locate?
I'm trimming adapters and barcodes out of a messy Nanopore assembly. There are various combinations of sequences that I need to locate and trim.
What I'm doing now is combining the BED outputs from several locate runs, then editing the resulting BED externally to merge regions from the same contigs and to add [start:end] coordinates for contigs that have no adapters, and finally using seqkit subseq --bed to get the trimmed assembly.
Would be extremely nice to do it all in one go…
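
For reference, the manual BED-editing step described above could be sketched roughly as below. This is only an illustration, not a seqkit feature: the function name keep_regions and the file names are hypothetical. It assumes the combined hits are tab-separated BED lines (0-based, half-open) and that a two-column contig/length table is available, however you produce it.

```shell
# keep_regions HITS_BED LENGTHS_TSV   (hypothetical helper, not part of seqkit)
#   HITS_BED    : adapter hits as BED lines; may be unsorted and overlapping
#   LENGTHS_TSV : contig<TAB>length, one line per contig in the assembly
# Prints the regions NOT covered by any hit, as three-column BED,
# including whole contigs that have no hits at all.
keep_regions() {
    hits_bed=$1
    lengths_tsv=$2
    # Sort hits by contig, then by numeric start, so one linear sweep is enough.
    sort -k1,1 -k2,2n "$hits_bed" | awk -F'\t' -v OFS='\t' '
        # Pass 1 (lengths table): record each contig length and input order.
        pass == 1 { len[$1] = $2 + 0; pos[$1] = 0; order[++n] = $1; next }
        # Pass 2 (sorted hits): ignore hits on contigs missing from the table.
        !($1 in len) { next }
        {
            start = $2 + 0; stop = $3 + 0
            # A gap before this hit is a region to keep.
            if (start > pos[$1]) kept[$1] = kept[$1] $1 OFS pos[$1] OFS start "\n"
            # Advance the covered position; overlapping hits merge implicitly.
            if (stop > pos[$1]) pos[$1] = stop
        }
        END {
            for (i = 1; i <= n; i++) {
                c = order[i]
                # Keep the tail after the last hit (or the whole contig).
                if (pos[c] < len[c]) kept[c] = kept[c] c OFS pos[c] OFS len[c] "\n"
                printf "%s", kept[c]
            }
        }
    ' pass=1 "$lengths_tsv" pass=2 -
}

# Hypothetical usage (file names are placeholders):
#   cat run1.bed run2.bed > hits.bed
#   keep_regions hits.bed lengths.tsv > keep.bed
#   seqkit subseq --bed keep.bed contigs.fa > trimmed.fa
```

Note that a hit in the middle of a contig yields two kept regions for that contig, so the output then contains two records for it. The sketch also assumes seqkit subseq --bed accepts three-column BED input.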

Yes, you can just use seqkit amplicon; see Example 5.

$ echo -ne ">s\nacggaaaaa\n" 
>s
acggaaaaa

$ echo -ne ">s\nacggaaaaa\n" \
    | seqkit amplicon -F actg -m 1 -f -r 1:99999999999
[INFO] 1 primer pair loaded
>s
aaaaa

There are various combinations of sequences that I need to locate and trim.

seqkit amplicon supports a list of primers with -p, --primer-file.

Gosh, I knew it's somewhere there…
Thanks so much!

Actually, I need a regular expression, AAGGTTAA.{0,30}CAGCACCT, which doesn't seem to be possible with amplicon.
But I can substitute each of the actual adapters for .{0,30}, put the resulting sequences in a file as you suggested, and allow some mismatches. Yeah, that should work.

Oh, yes. amplicon does not support regular expressions.
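
A minimal sketch of the workaround discussed above: expanding the regular expression into explicit sequences and writing them to a primer file. The helper name make_primer_file, the adapterN names, and the input file of middle sequences are all hypothetical. It also assumes that -p accepts a two-column tab-separated file (primer name, forward primer), which is worth checking against seqkit amplicon --help.

```shell
# make_primer_file LEFT RIGHT MIDDLES_FILE   (hypothetical helper)
#   LEFT, RIGHT  : the fixed flanks, e.g. AAGGTTAA and CAGCACCT
#   MIDDLES_FILE : one known adapter/barcode sequence per line, replacing .{0,30}
# Prints one "name<TAB>sequence" line per non-empty input line.
make_primer_file() {
    left=$1
    right=$2
    middles=$3
    awk -v left="$left" -v right="$right" -v OFS='\t' '
        # Skip blank lines; number the remaining entries adapter1, adapter2, ...
        NF { n++; print "adapter" n, left $1 right }
    ' "$middles"
}

# Hypothetical usage (file names are placeholders; flags as used earlier in this thread):
#   make_primer_file AAGGTTAA CAGCACCT middles.txt > primers.tsv
#   seqkit amplicon -p primers.tsv -m 2 -f -r 1:99999999999 contigs.fa
```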