FelixKrueger/TrimGalore

support BGISEQ/DNBSEQ/MGISEQ?

Closed this issue · 9 comments

Hello, does TrimGalore support automatic adapter filtering for BGISEQ/DNBSEQ/MGISEQ reads yet? The adapter sequences are listed at OpenGene/fastp#259

Thanks~
Si

Would you be able to send me a small-ish test dataset (e.g. 100K reads) so I can take a look?

Thanks~ I downloaded SRR28167102, which was sequenced on a DNBSEQ-G400, from NCBI SRA; a subset of it is in SRR28167102_part.DNBSEQ-G400.zip.

Hmm, I downloaded the entire dataset and added the following adapter sequences to the adapter file of FastQC:

AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA  MGI/BGI forward
AAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG   MGI/BGI reverse
AAGTCGGA    MGI/BGI universal
[Screenshot 2024-10-16 at 13 43 17: FastQC adapter content plot]

It doesn't look like this is a good example of 'contamination' with the MGI/BGI adapter (the universal sequence is just about visible at the end but it also is only 8bp long...).
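For anyone wanting to repeat this check without editing FastQC's bundled adapter file, a minimal sketch follows. It assumes FastQC's `--adapters` option, which takes a file of `name<TAB>sequence` entries; the function name `write_mgi_adapters` and the file name `mgi_adapters.txt` are hypothetical, and the sequences are the three quoted above.

```shell
# Write a custom FastQC adapter list: one "name<TAB>sequence" entry per line.
# Function name and output path are illustrative, not part of FastQC.
write_mgi_adapters() {
  out="$1"
  {
    printf 'MGI/BGI forward\tAAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA\n'
    printf 'MGI/BGI reverse\tAAGTCGGATCGTAGCCATGTCGTTCTGTGAGCCAAGGAGTTG\n'
    printf 'MGI/BGI universal\tAAGTCGGA\n'
  } > "$out"
}
```

Usage would then look something like `write_mgi_adapters mgi_adapters.txt` followed by `fastqc --adapters mgi_adapters.txt SRR28167102.DNBSEQ-G400.r1.fq.gz`.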

The FastQC result is weird, because when I checked the R1 FASTQ file by simply running zgrep with the forward adapter, I got matches like this:
[image: zgrep output showing reads that match the forward adapter]

Can you include a -c to count how many times it is found in total? Maybe it is just so low that it doesn't show up in a % plot?

Update:

I've just done this myself: there are 162 instances of this adapter sequence, or 0.6% of total sequences, starting at different positions. This is indeed not something you would see accumulating very clearly in a FastQC plot...

zgrep -c AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz
162
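The count, the percentage, and the "different starting positions" mentioned above can be derived in one pass. This is only a sketch: it assumes standard 4-line FASTQ records (no wrapped sequence lines), and the function names `adapter_stats` and `adapter_positions` are made up for illustration.

```shell
# Print "hits<TAB>total_reads<TAB>percent" for an adapter in a gzipped FASTQ.
# Only sequence lines (line 2 of each 4-line record) are inspected.
adapter_stats() {
  adapter="$1"; fastq_gz="$2"
  gzip -dc "$fastq_gz" | awk -v a="$adapter" '
    NR % 4 == 2 {
      total++
      if (index($0, a) > 0) hits++
    }
    END {
      pct = (total > 0) ? 100 * hits / total : 0
      printf "%d\t%d\t%.2f\n", hits + 0, total + 0, pct
    }'
}

# Print "start_position<TAB>count" (1-based position of the first match
# in each read), sorted by position.
adapter_positions() {
  adapter="$1"; fastq_gz="$2"
  gzip -dc "$fastq_gz" | awk -v a="$adapter" '
    NR % 4 == 2 {
      p = index($0, a)
      if (p > 0) count[p]++
    }
    END { for (p in count) print p "\t" count[p] }' | sort -n
}
```

For example, `adapter_stats AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz` would report the hit count next to the total number of reads, which makes a low percentage easy to spot.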

Can you extract only those 162 read pairs into FASTQ files and run FastQC on them? If the FastQC result then shows 100% adapter, that would be as expected~

that works:

zgrep -A 2 -B 1 AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz | grep -v "^--$" > hits.fastq
[Screenshot 2024-10-18 at 09 56 16: FastQC adapter content plot for the extracted Read 1 hits]

This is the plot for the Read 2 adapter (80 sequences):

[Screenshot 2024-10-18 at 09 58 56: FastQC adapter content plot for the extracted Read 2 hits]
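Extracting hits via grep context lines depends on filtering out grep's group separators and on the adapter only ever matching sequence lines. A record-aware alternative is sketched below; it assumes 4-line FASTQ records, and `extract_adapter_reads` is a hypothetical helper name.

```shell
# Emit complete FASTQ records whose sequence line contains the adapter.
# Records are buffered four lines at a time, so quality lines that start
# with "-" or "@" are preserved untouched.
extract_adapter_reads() {
  adapter="$1"; fastq_gz="$2"
  gzip -dc "$fastq_gz" | awk -v a="$adapter" '
    { rec[NR % 4] = $0 }          # lines 1-3 go to rec[1..3], line 4 to rec[0]
    NR % 4 == 0 {
      if (index(rec[2], a) > 0)
        printf "%s\n%s\n%s\n%s\n", rec[1], rec[2], rec[3], rec[0]
    }'
}
```

Something like `extract_adapter_reads AAGTCGGAGGCCAAGCGGTCTTAGGAAGACAA SRR28167102.DNBSEQ-G400.r1.fq.gz > hits.fastq` should then give a file that can be passed straight to FastQC.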

I suppose adding a flag --bgiseq wouldn't be too difficult. If this type of sequencing becomes more common, we could also add it to the auto-detection.

--bgiseq is ok for now.

The option --bgiseq is now available from the dev branch. Can you let me know if it works as expected for you? If yes, it will be part of the next release.
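To try the new option on a paired-end sample, something along these lines should do. It is a sketch that assumes a Trim Galore build from the dev branch (so that `--bgiseq` is recognised) and uses the existing `--paired` option; the wrapper name `trim_bgiseq_pair` is hypothetical.

```shell
# Run Trim Galore on a read pair with the BGISEQ/DNBSEQ/MGISEQ adapter preset.
# Requires a trim_galore version that provides --bgiseq (dev branch per this thread).
trim_bgiseq_pair() {
  r1="$1"; r2="$2"
  if ! command -v trim_galore >/dev/null 2>&1; then
    echo "trim_galore not found on PATH" >&2
    return 1
  fi
  trim_galore --bgiseq --paired "$r1" "$r2"
}
```

For instance: `trim_bgiseq_pair SRR28167102.DNBSEQ-G400.r1.fq.gz SRR28167102.DNBSEQ-G400.r2.fq.gz` (the r2 file name is assumed by analogy with the r1 name used above). On releases without the flag, passing the sequences explicitly via `-a` and `-a2` would be the manual fallback.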