SOLVED: Taxa not appearing in radar/composition plots, but listed in the ReadFateTable
Opened this issue · 3 comments
Hello community,
I recently performed a metagenomic analysis on sequencing data generated from a zymomock community by Nanopore Sequencing (MinION) and wanted to assess the sequencing bias by using this tool.
Unfortunately, the generated HTML report showed radar/composition plots without two of the ten analyzed taxa. In particular, the taxa Lactobacillus fermentum and Cryptococcus neoformans were not plotted, but instead listed in the ReadFateTable at the bottom of the report. L. fermentum makes up about 50 % of the total amount of reads, obviously indicating that the run was biased. Could that be the reason for the plotting behaviour?
The miqScoreShotgunPublic
tool was executed using the following command:
docker container run \
-v /path/to/miq_score_analysis:/data \
-e SAMPLENAME=zymomock_run_20210818_guppy_5_0_16_sup \
-e MODE=LONG \
miqscoreshotgun
After the analysis completed successfully, the following report was generated:
Since I was able to solve this issue by myself, I want to present my workaround to avoid the reported behaviour regarding the HTML report generation.
Unfortunately, when using it in LONG
mode, the tool automatically assumes that you have sequenced the ZymoBIOMICS HMW DNA Standard (Cat. # D6322), which was especially designed for long read sequencing. However, we sequenced a ZymoBIOMICS Microbial Community Standard (Cat. # D6300). The difference between these two standards is the overall composition: The HMW DNA Standard does not contain the two species Lactobacillus fermentum and Cryptococcus neoformans, whereas the Microbial Community Standard does contain these. Therefore, the tool only picks the rermaining 8 species, when running in LONG
mode, though it also detects the reads of the two species that would not be in a HMW standard. Furthermore, it calculates the MIQ Score by only considering the compositional properties of a HMW DNA Standard (gDNA abundance of 14 % for each bacterial species and 2 % for each yeast instead of 12 % for each bacterial species and 2 % for each yeast).
This is truly a limiting issue since you would not be able to use the LONG
mode for the Microbial Community Standard, even though you had sequenced it using long read sequencing. Not using the LONG
mode might be an option, if read mapping using BWA instead of minimap2 would be acceptable. In our case, we decided to modify the source code of the tool in order to enable the LONG
mode for the Microbial Community Standard.
This can be achieved by changing the function getApplicationParametersLong()
declared in the file analyzeStandardReads.py
(located directly in the miqScoreShotgunPublic/
repository) from:
def getApplicationParametersLong():
parameters = miqScoreShotgunPublicSupport.parameters.environmentParameterParser.EnvParameters()
parameters.addParameter("sampleName", str, required=True, externalValidation=True)
parameters.addParameter("maxReadCount", int, default=default.maxReadCount, lowerBound=0)
parameters.addParameter("workingFolder", str, default=default.workingFolder, expectedDirectory=True)
parameters.addParameter("reads", str, default = default.forwardReads, expectedFile=True)
parameters.addParameter("sequenceFolder", str, default.sequenceFolder, expectedDirectory=True)
parameters.addParameter("outputFolder", str, default=default.outputFolder, createdDirectory=True)
parameters.addParameter("referenceGenome", str, default=default.referenceGenome, expectedFile=True)
parameters.addParameter("fileNamingStandard", str, default="zymo", externalValidation=True)
parameters.addParameter("referenceDataFile", str, default=default.referenceDataFileHMW, expectedFile=True)
parameters.addParameter("bacteriaOnly", bool, required=False, default=False)
if parameters.bacteriaOnly.value:
print("ANALYZING BACTERIA ONLY")
defaultBadExample = default.badMiqExampleBacteriaOnlyHMW
defaultGoodExample = default.goodMiqExampleBacteriaOnlyHMW
else:
defaultBadExample = default.badMiqExampleHMW
defaultGoodExample = default.goodMiqExampleHMW
parameters.addParameter("goodMiqExample", str, default=defaultGoodExample, expectedFile=True)
parameters.addParameter("badMiqExample", str, default=defaultBadExample, expectedFile=True)
if not validSampleName(parameters.sampleName.value):
logger.error("Invalid sample name given: %s" %parameters.sampleName.value)
raise ValueError("Invalid sample name given: %s" %parameters.sampleName.value)
parameters.checkCreatedFileStructures()
return parameters
to
def getApplicationParametersLong():
parameters = miqScoreShotgunPublicSupport.parameters.environmentParameterParser.EnvParameters()
parameters.addParameter("sampleName", str, required=True, externalValidation=True)
parameters.addParameter("maxReadCount", int, default=default.maxReadCount, lowerBound=0)
parameters.addParameter("workingFolder", str, default=default.workingFolder, expectedDirectory=True)
parameters.addParameter("reads", str, default = default.forwardReads, expectedFile=True)
parameters.addParameter("sequenceFolder", str, default.sequenceFolder, expectedDirectory=True)
parameters.addParameter("outputFolder", str, default=default.outputFolder, createdDirectory=True)
parameters.addParameter("referenceGenome", str, default=default.referenceGenome, expectedFile=True)
parameters.addParameter("fileNamingStandard", str, default="zymo", externalValidation=True)
parameters.addParameter("referenceDataFile", str, default=default.referenceDataFile, expectedFile=True) # change 1
parameters.addParameter("bacteriaOnly", bool, required=False, default=False)
if parameters.bacteriaOnly.value:
print("ANALYZING BACTERIA ONLY")
defaultBadExample = default.badMiqExampleBacteriaOnly # change 2
defaultGoodExample = default.goodMiqExampleBacteriaOnly # change 3
else:
defaultBadExample = default.badMiqExample # change 4
defaultGoodExample = default.goodMiqExample # change 5
parameters.addParameter("goodMiqExample", str, default=defaultGoodExample, expectedFile=True)
parameters.addParameter("badMiqExample", str, default=defaultBadExample, expectedFile=True)
if not validSampleName(parameters.sampleName.value):
logger.error("Invalid sample name given: %s" %parameters.sampleName.value)
raise ValueError("Invalid sample name given: %s" %parameters.sampleName.value)
parameters.checkCreatedFileStructures()
return parameters
Note, that I have highlighted the modified code lines by adding # change 1-5
at the end of each line. In fact, you only need to remove the "HMW" suffix in the related commands.
After applying the above changes, the docker container (my prefered method of executing the tool) needs to be re-build:
cd miqScoreShotgunPublic/
docker build -t miqscoreshotgun .
Then, the analysis can be repeated and will result in a complete HTML report.
Even though the above solution worked in my case, I would like to keep this issue open and request a feature:
The implementation of a new command line option in order to choose the type of standard (especially when using the LONG
mode) would solve this issue completely.
Thanks @JulianMohr for the fix! I agree it would be great if there was a command line option to select the type of standard.