Mykrobe-tools/mykrobe

mykrobe calls the wrong amino acid change for first drug resistance SNP on a bottom strand gene

tw1nk0i opened this issue · 2 comments

I ran Mykrobe on the FASTQ data for BioProject PRJEB5280 Run ERR751367 from NCBI. Mykrobe identifies a SNP conferring isoniazid resistance, katG_S315G-GCT2155167GGT. The resulting amino acid, in this case the G in S315G, is incorrect. The last part of the SNP, GCT2155167GGT, identifies the codon that was changed and the position within the genome. These nucleotides are always the top strand nucleotides. Therefore, for genes on the bottom strand, we must take the reverse complement. GCT becomes AGC, which is serine, the S in S315G. So far so good. However, the reverse complement of GGT is ACC, which is threonine, but the output S315G says the resulting amino acid is glycine. I believe the code is neglecting to use the reverse complement, so it is incorrectly translating GGT to glycine.

I have also tried running Mykrobe on .bam files that I generated by running snippy (version 3.2) on the same FASTQ data downloaded from NCBI. I see the same error, and looking manually at the bam file in IGV confirms that S315G is incorrect and should be S315T.

I suspect that this bug only affects the first drug resistant SNP listed in the json output that is in a bottom strand gene. katG_S315G-GCT2155167GGT just happens to be an isoniazid SNP, so it is near the top of the file and thus often the SNP affected.

Identical cases for katG_S315G-GCT2155167GGT occur with the following NCBI BioProject and Run pairs:
PRJEB5280, ERR779909
PRJEB10385, ERR1034759
PRJEB11653, ERR1213902

I have ~200 more identical cases. I can provide them if they would be helpful for debugging.

Thanks for spotting this. Yes, it is indeed a bug. Will investigate/fix some time this week. I expect the bug applies only to the "any amino acid change" entries in the mutation catalog - eg S315X. The "X" is not getting converted to the correct amino acid, as you've described.

This is now fixed on the master branch. I tested on ERR751367 and got katG_S315T-GCT2155167GGT in the resistance call for Isoniazid.