jodyphelan/NTM-Profiler

Questions about the results text output.


Hi Jody,

   Output from NTM-Profiler version 0.2.1 is below:

A few questions:

  1) I noticed that the gene name is not listed in the "Resistance variants report" and "Other variants report" sections. Could the gene name be added to these sections?

  2) What is meant to be reported in the "Resistance genes report" section?

  3) Could you explain what the 149.500 means in the "Mean Kmer Coverage" section?

  4) Pipeline version is showing as 0.2.0. Should this be 0.2.1?

 Thanks very much for making this fantastic tool available to the community!

    Michael 

NTM-Profiler report

The following report has been generated by NTM-Profiler.

Summary

ID: SRR315352
Date: Wed Aug 17 16:10:18 2022

Species report

Species                                     Mean Kmer Coverage
Mycobacterium abscessus subsp. massiliense  149.500

Resistance report

Drug        Genotypic Resistance  Mutations
Macrolides  R                     rrl n.2270A>G (1.00)
Amikacin    R                     rrs n.1375A>G (1.00)

Resistance genes report

Locus Tag  Gene  Drug

Resistance variants report

Genome Position  Locus Tag  Variant Type                        Change     Estimated Fraction  Drug
1463772          MAB_r5051  non_coding_transcript_exon_variant  n.1375A>G  1.000               amikacin
1466477          MAB_r5052  non_coding_transcript_exon_variant  n.2270A>G  1.000               macrolides

Other variants report

Genome Position  Locus Tag  Variant Type                        Change           Estimated Fraction
1462247          MAB_r5051  upstream_gene_variant               n.-151A>G        1.000
1462267          MAB_r5051  upstream_gene_variant               n.-131_-130insA  1.000
1462275          MAB_r5051  upstream_gene_variant               n.-123T>C        1.000
1463374          MAB_r5051  non_coding_transcript_exon_variant  n.977C>T         1.000
1463968          MAB_r5052  upstream_gene_variant               n.-240A>G        1.000
1464005          MAB_r5052  upstream_gene_variant               n.-203_-202insC  1.000
1464183          MAB_r5052  upstream_gene_variant               n.-25G>A         1.000
1464841          MAB_r5052  non_coding_transcript_exon_variant  n.634C>T         1.000
1465924          MAB_r5052  non_coding_transcript_exon_variant  n.1717A>G        0.964
1467208          MAB_r5052  non_coding_transcript_exon_variant  n.3001T>C        1.000
2345756          MAB_2297   upstream_gene_variant               c.-199T>C        1.000
2345783          MAB_2297   upstream_gene_variant               c.-172C>T        0.992
2345889          MAB_2297   upstream_gene_variant               c.-66C>T         1.000
2345891          MAB_2297   upstream_gene_variant               c.-64A>G         1.000
2345896          MAB_2297   upstream_gene_variant               c.-59A>G         1.000
2345927          MAB_2297   upstream_gene_variant               c.-28A>G         1.000
2345951          MAB_2297   upstream_gene_variant               c.-4C>T          0.989
2345995          MAB_2297   missense_variant                    p.Pro14Gln       1.000
2346000          MAB_2297   missense_variant                    p.Thr16Ala       1.000
2346014          MAB_2297   frameshift_variant                  c.64_65delCG     0.855
2346039          MAB_2297   missense_variant                    p.Val29Phe       1.000
2346044          MAB_2297   synonymous_variant                  c.90C>T          1.000
2346063          MAB_2297   missense_variant                    p.Asp37Asn       1.000
2346077          MAB_2297   synonymous_variant                  c.123A>G         1.000
2346392          MAB_2297   synonymous_variant                  c.438A>C         0.970
2346420          MAB_2297   missense_variant                    p.Ala156Thr      0.990

Coverage report

Gene     Locus_Tag  Cutoff  Fraction
rrs      MAB_r5051  0       0.000
rrl      MAB_r5052  0       0.000
erm(41)  MAB_2297   0       0.287

Missing positions report

N/A

Analysis pipeline specifications

Pipeline version: 0.2.0
Species Database version: N/A
Resistance Database version: Mycobacterium_abscessus_subsp._massiliense_93c979b_Jody Phelan jody.phelan@lshtm.ac.uk_Fri Apr 29 17:33:42 2022 +0100

Analysis         Program
Kmer counting    kmc
Mapping          bwa
Variant calling  freebayes

Hi @harrismia

Thanks for using the tool!

  1. I noticed that the gene name is not listed in the "Resistance variants report" and "Other variants report" sections. Could the gene name be added to these sections?

Yes, that is definitely possible. I'll add it in the next release (which should be out within the next week or two).
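A minimal sketch of how a gene column could be derived, assuming each variant row carries its locus tag. The locus-tag-to-gene mapping below is copied from the coverage report in this issue; `LOCUS_TO_GENE`, `add_gene_names` and the row field names are hypothetical and are not NTM-Profiler's internal API.

```python
# Hypothetical lookup table: locus tag -> gene name, copied from the
# "Coverage report" section of the report in this issue.
LOCUS_TO_GENE = {
    "MAB_r5051": "rrs",
    "MAB_r5052": "rrl",
    "MAB_2297": "erm(41)",
}


def add_gene_names(rows, locus_to_gene=LOCUS_TO_GENE):
    """Return copies of the variant rows with a 'gene' field added.

    Rows are assumed (hypothetically) to be dicts with a 'locus_tag' key.
    Locus tags without a known gene name fall back to the locus tag itself.
    The input rows are not modified.
    """
    return [
        {**row, "gene": locus_to_gene.get(row["locus_tag"], row["locus_tag"])}
        for row in rows
    ]


# Two rows from the "Resistance variants report" above.
variants = [
    {"genome_pos": 1463772, "locus_tag": "MAB_r5051", "change": "n.1375A>G", "freq": 1.0},
    {"genome_pos": 1466477, "locus_tag": "MAB_r5052", "change": "n.2270A>G", "freq": 1.0},
]

for v in add_gene_names(variants):
    print(v["genome_pos"], v["locus_tag"], v["gene"], v["change"], f"{v['freq']:.3f}")
```

Falling back to the locus tag keeps rows readable for loci that have no common gene name.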

  2. What is meant to be reported in the "Resistance genes report" section?

There are some cases where a gene causes resistance to a particular drug. For example, the erm(41) gene in MAB subspecies confers inducible resistance to macrolides [1,2]. In this case, the presence of an intact gene leads to resistance. As many of the columns in the drug-resistance variants section, such as frequency, change and type, don't really fit a gene, we decided to make a new section specifically for resistance genes. Here is an example of a populated section:

Resistance genes report
-----------------
Locus Tag       Gene    Drug
MAB_2297        erm(41) macrolides
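To make the distinction concrete, here is a rough Python sketch of a gene-level report row and a presence check. It is not how NTM-Profiler decides that a gene is intact, which this thread doesn't describe: `GeneResistance`, `gene_is_intact`, `resistance_genes_report`, the set of disruptive variant types and the `max_missing` threshold are all assumptions for illustration.

```python
from dataclasses import dataclass

# Assumption for this sketch: variant types treated as disrupting a gene.
# NTM-Profiler's actual rules are not described in this thread.
DISRUPTIVE_TYPES = {"frameshift_variant", "stop_gained", "start_lost"}


@dataclass(frozen=True)
class GeneResistance:
    """One row of a resistance genes report (no frequency/change/type columns)."""
    locus_tag: str
    gene: str
    drug: str


def gene_is_intact(missing_fraction, variant_types, max_missing=0.1):
    """Hypothetical intactness check for a candidate resistance gene.

    missing_fraction: fraction of the gene's positions without read support.
    variant_types: variant types called within the gene.
    max_missing: hypothetical tolerance for uncovered positions.
    """
    if missing_fraction > max_missing:
        return False
    return not any(t in DISRUPTIVE_TYPES for t in variant_types)


def resistance_genes_report(candidates):
    """Keep only candidate genes judged intact.

    candidates: iterable of (GeneResistance, missing_fraction, variant_types).
    """
    return [
        gene
        for gene, missing_fraction, variant_types in candidates
        if gene_is_intact(missing_fraction, variant_types)
    ]


# Hypothetical inputs, for illustration only.
erm41 = GeneResistance("MAB_2297", "erm(41)", "macrolides")
for row in resistance_genes_report([(erm41, 0.0, ["missense_variant"])]):
    print(f"{row.locus_tag}\t{row.gene}\t{row.drug}")
```

Under these assumptions, passing the same gene with a `frameshift_variant` in its variant list would drop it, leaving the section empty.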
  3. Could you explain what the 149.500 means in the "Mean Kmer Coverage" section?

Species are predicted by detecting specific kmers that have been found to be exclusive to certain species/subspecies. In case some isolates carry mutations or deletions affecting individual kmers, we selected 20 kmers per species. The "mean kmer coverage" is the mean of the counts observed for those 20 kmers. The absolute value depends on sequencing depth, but in general I'd expect higher values to give more confidence in the prediction. It can also potentially give a rough idea of the proportions in mixed infections, although I haven't validated it for this use! Here is an example where two subspecies were detected:

Species report
-----------------
Species                                     Mean Kmer Coverage
Mycobacterium abscessus subsp. massiliense  36.050
Mycobacterium abscessus subsp. abscessus    21.900
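The averaging, and the rough mixed-sample idea, can be sketched as follows. This is a minimal illustration assuming per-kmer counts are already available; the function names are hypothetical, and the proportion estimate is the same unvalidated heuristic mentioned above, not an NTM-Profiler feature.

```python
def mean_kmer_coverage(kmer_counts):
    """Mean count across one species' marker kmers (20 per species, per the reply above)."""
    if not kmer_counts:
        raise ValueError("no kmer counts supplied")
    return sum(kmer_counts) / len(kmer_counts)


def rough_proportions(coverages):
    """Normalise per-species mean kmer coverage so the values sum to 1.

    Unvalidated heuristic for mixed samples; see the caveat above.
    """
    total = sum(coverages.values())
    if total == 0:
        raise ValueError("total coverage is zero")
    return {species: cov / total for species, cov in coverages.items()}


# Mean kmer coverage values from the example species report above.
species_report = {
    "Mycobacterium abscessus subsp. massiliense": 36.050,
    "Mycobacterium abscessus subsp. abscessus": 21.900,
}

for species, fraction in rough_proportions(species_report).items():
    print(f"{species}\t{fraction:.2f}")
```

For the example above this gives about 0.62 for massiliense and 0.38 for abscessus, which should be read only as a very rough indication given that the approach hasn't been validated.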
  4. Pipeline version is showing as 0.2.0. Should this be 0.2.1?

You're right, this should be showing v0.2.1. I'll update for the next release!

Thanks again for the feedback, and let me know if there are any more questions!

Thanks very much for the information and for developing this valuable tool!

No problem, glad it's useful!