support BGISEQ/DNBSEQ/MGISEQ?
Hello, does TrimGalore support automatic adapter filtering for BGISEQ/DNBSEQ/MGISEQ reads yet? The adapter sequences are listed at OpenGene/fastp#259.
Thanks~
Would you be able to send me a small-ish test dataset (e.g. 100K reads) so I can take a look?
Thanks~ I downloaded SRR28167102, which was sequenced on a DNBSEQ-G400, from NCBI SRA. A subset of it is attached as SRR28167102_part.DNBSEQ-G400.zip.
Hmm, I downloaded the entire dataset and added the following adapter sequences to the adapter file of FastQC:
AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA MGI/BGI forward
AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG MGI/BGI reverse
AAGTCGGA MGI/BGI universal
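For anyone wanting to repeat this without editing the bundled adapter list, here is a minimal sketch that writes the three sequences above to a separate file. It assumes FastQC's `name<TAB>sequence` adapter file layout and its `--adapters` option; the function name `write_mgi_adapter_file` and the output file name are made up for illustration.

```shell
# Hypothetical helper: write a custom adapter file in the name<TAB>sequence
# layout that FastQC expects for its adapter list.
write_mgi_adapter_file() {
    out=$1
    {
        printf '%s\t%s\n' 'MGI/BGI forward'   'AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA'
        printf '%s\t%s\n' 'MGI/BGI reverse'   'AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG'
        printf '%s\t%s\n' 'MGI/BGI universal' 'AAGTCGGA'
    } > "$out"
}

# Example use (needs FastQC installed; file names are placeholders):
#   write_mgi_adapter_file mgi_adapters.txt
#   fastqc --adapters mgi_adapters.txt SRR28167102.DNBSEQ-G400.r1.fq.gz
```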
It doesn't look like this is a good example of 'contamination' with the MGI/BGI adapter (the universal sequence is just about visible at the end but it also is only 8bp long...).
Can you include a -c to count how many times it is found in total? Maybe it is just so low that it doesn't show up in a % plot?
Update:
I've just done this myself: there are 162 instances of this adapter sequence, or 0.6% of total sequences, starting at different positions. That is indeed too few to accumulate visibly in a FastQC plot...
zgrep -c AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz
162
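If the percentage is wanted alongside the raw count, a rough sketch along these lines might help. It assumes strict 4-line FASTQ records (no wrapped sequences) and only inspects sequence lines; `count_adapter_reads` is a hypothetical helper name, not part of Trim Galore or FastQC.

```shell
# Hypothetical helper: report adapter hits, total reads and percentage
# (tab-separated) for a gzipped FASTQ file with 4 lines per record.
# Usage: count_adapter_reads ADAPTER FILE.fq.gz
count_adapter_reads() {
    adapter=$1
    fastq_gz=$2
    gzip -dc "$fastq_gz" | awk -v adapter="$adapter" '
        NR % 4 == 2 {                          # second line of each record = sequence
            total++
            if (index($0, adapter) > 0) hits++ # plain substring match, no regex
        }
        END {
            pct = (total > 0) ? 100 * hits / total : 0
            printf "%d\t%d\t%.2f\n", hits, total, pct
        }'
}
```

Called as `count_adapter_reads AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz`, it prints the three numbers on one line.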
Can you extract only those 162 read pairs into FASTQ files and run FastQC on them? If the FastQC result then shows ~100% adapter content, that would be as expected~
that works:
zgrep -A 2 -B 1 AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz | grep -v "^--$" > hits.fastq
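Context-based grep extraction relies on the adapter matching only sequence lines and on separator lines being filtered out cleanly. As a hedged alternative, the sketch below walks the file record by record instead, assuming strict 4-line FASTQ records; `extract_adapter_reads` is a hypothetical helper name. It handles one file only, so pulling the matching mates from the other read file is not covered here.

```shell
# Hypothetical helper: print complete FASTQ records whose sequence line
# contains the adapter. Assumes 4 lines per record in a gzipped file.
# Usage: extract_adapter_reads ADAPTER FILE.fq.gz > hits.fastq
extract_adapter_reads() {
    adapter=$1
    fastq_gz=$2
    gzip -dc "$fastq_gz" | awk -v adapter="$adapter" '
        NR % 4 == 1 { header = $0 }            # @read name
        NR % 4 == 2 { bases  = $0 }            # sequence
        NR % 4 == 3 { plus   = $0 }            # separator line
        NR % 4 == 0 {                          # quality line completes the record
            if (index(bases, adapter) > 0) {
                print header
                print bases
                print plus
                print $0
            }
        }'
}
```

Because records are emitted whole, quality strings that happen to start with `-` are kept intact.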
This is the plot for the Read 2 adapter (80 sequences):
I suppose adding a flag --bgiseq wouldn't be too difficult. If this type of sequencing becomes more common, we could also add it to the auto-detection.
--bgiseq is ok for now.
The option --bgiseq is now available from the dev branch. Can you let me know if it works as expected for you? If yes, it will be part of the next release.