StevenWingett/FastQ-Screen

Change single MT genome to multi-species mitochondria category

Closed this issue · 1 comment

We just encountered a case where RNA-seq libraries contained 50-70% mitochondrial reads, but FastQ Screen reported only 0.5-2%.

After some tests, we found that the mitochondrial sequence currently used by FastQ Screen is the human one (MT dna:chromosome chromosome:GRCh38:MT:1:16569:1 REF), and that the pig sequence (MT dna:chromosome chromosome:Sscrofa11.1:MT:1:16613:1 REF) is sufficiently different from it to explain this discrepancy.

A suggested change would be to combine the MT sequences of several known organisms into one mitochondrial database, so that 'MT contamination' can be spotted more easily regardless of species.

I wrote a script to generate a "combined genomes" mitochondrial FASTA file: the mitochondrial genome of each organism is written to one new file as a separate FASTA sequence. I've added the Bowtie2/Bismark index files to the Babraham cluster and to the FTP site that stores the --get_genomes genome files.
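For anyone who wants to build a similar file locally, the combining step described above could look roughly like the sketch below. This is not the script used for the FastQ Screen genomes. The function name `combine_fasta`, the species labels, and the file names are hypothetical, and prefixing each header with a label is an assumption made here so that every record in the combined file has a distinct name.

```python
def combine_fasta(inputs, out_path):
    """Write several FASTA files into one multi-record FASTA file.

    inputs   -- dict mapping a species label (hypothetical, chosen by the
                user) to the path of that species' mitochondrial FASTA file
    out_path -- path of the combined FASTA file to create

    Returns the number of FASTA records written.
    """
    n_records = 0
    with open(out_path, "w") as out:
        for label, path in inputs.items():
            with open(path) as fh:
                for line in fh:
                    line = line.rstrip()
                    if not line:
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # Assumption: prefix the header with the species
                        # label so that record names stay unique.
                        out.write(f">{label}_{line[1:]}\n")
                        n_records += 1
                    else:
                        # Sequence lines are copied unchanged.
                        out.write(line + "\n")
    return n_records
```

A call such as `combine_fasta({"Human": "human_MT.fa", "Pig": "pig_MT.fa"}, "combined_MT.fa")` (file names are placeholders) would produce a single FASTA file with one record per organism, which could then be indexed, for example with `bowtie2-build`.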

Please note that this change may lead to a greater proportion of reads being reported as multi-mapping to the mitochondrial genome in the FastQ Screen results. This is expected, because reads are now mapped against multiple different versions of the MT genome.