PM tutorial: Taxonomic classification results are the same before and after retraining the classifier

Question

misialq opened this issue 4 years ago · 2 comments

Bug Description
While working through the Parkinson’s Mouse Tutorial, we noticed that the species mentioned in the tutorial (B. ovatus) appears in equal counts in the results from the original classifier and in those from the classifier retrained with information about typical stool sample composition. In other words, with the data provided in the tutorial, retraining the classifier does not improve the classification results with regard to B. ovatus. It seems that the data used to train the first classifier has changed in the meantime, so both classifiers now give similar results. The tutorial question about the presence of B. ovatus in both results is therefore potentially outdated.

Steps to reproduce the behavior

Open the taxonomy.qzv and bespoke_taxonomy.qzv visualizations from the PM tutorial
Filter the taxon list for "ovatus"
Compare results obtained in both
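
The comparison in the steps above can also be done outside the viewer. A minimal sketch, assuming each taxonomy result has been exported to a tab-separated table with `Feature ID`, `Taxon` and `Confidence` columns (that layout is an assumption, and the two small tables below are made-up stand-ins, not data from the tutorial):

```python
import csv
import io


def count_matching_taxa(tsv_text, needle):
    """Count rows whose Taxon string contains `needle` (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return sum(1 for row in reader if needle.lower() in row["Taxon"].lower())


# Hypothetical stand-ins for the exported uniform and bespoke taxonomy tables.
uniform_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; g__Bacteroides; s__ovatus\t0.98\n"
    "f2\tk__Bacteria; g__Bacteroides; s__\t0.91\n"
)
bespoke_tsv = (
    "Feature ID\tTaxon\tConfidence\n"
    "f1\tk__Bacteria; g__Bacteroides; s__ovatus\t0.99\n"
    "f2\tk__Bacteria; g__Bacteroides; s__ovatus\t0.95\n"
)

uniform_count = count_matching_taxa(uniform_tsv, "ovatus")
bespoke_count = count_matching_taxa(bespoke_tsv, "ovatus")
print(f"uniform: {uniform_count}, bespoke: {bespoke_count}")
```

With real exports, equal counts from both tables would correspond to the behavior reported in this issue.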

Expected behavior
Not sure, but presumably the original taxonomy result should have fewer taxa identified as ovatus?

Actual behavior
Both results show the same number of ovatus taxa.

Screenshots
from taxonomy.qzv:

from bespoke_taxonomy.qzv:

Comments

This is under the assumption that retraining the classifier should improve identification results.

Answer 1 · 2020-09-24T16:15:38.000Z

@BenKaehler, did you write that part of the PD Mice tutorial? If so, care to comment?

Answer 2 · 2020-09-24T16:37:51.000Z

To clarify, what changed is that the new uniform (default) pre-trained classifier uses the RESCRIPt-processed greengenes database, while the bespoke classifier is trained on the old (raw) greengenes. So two things need to happen:
1. The bespoke classifier should be trained on the same data as the uniform classifier.
2. The question needs to be changed to find another taxon that is underclassified by the uniform classifier.
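
One way to look for a replacement taxon is to list features that the bespoke classifier resolves to species level while the uniform classifier does not. A rough sketch, assuming greengenes-style lineage strings with an `s__` species prefix; the function names and the example mappings are hypothetical and not taken from the tutorial:

```python
def species_level(taxon):
    """Return True if the lineage ends in a non-empty species rank (s__name)."""
    last = taxon.split(";")[-1].strip()
    return last.startswith("s__") and len(last) > len("s__")


def underclassified(uniform, bespoke):
    """Feature IDs resolved to species by the bespoke but not the uniform result."""
    return sorted(
        fid
        for fid, taxon in bespoke.items()
        if species_level(taxon) and not species_level(uniform.get(fid, ""))
    )


# Hypothetical feature-ID -> lineage mappings for the two classifiers.
uniform = {
    "f1": "k__Bacteria; g__Bacteroides; s__ovatus",
    "f2": "k__Bacteria; g__Bacteroides; s__",
    "f3": "k__Bacteria; g__Akkermansia",
}
bespoke = {
    "f1": "k__Bacteria; g__Bacteroides; s__ovatus",
    "f2": "k__Bacteria; g__Bacteroides; s__fragilis",
    "f3": "k__Bacteria; g__Akkermansia",
}

candidates = underclassified(uniform, bespoke)
print(candidates)
```

Any feature reported this way would still need checking by hand before it is used in the tutorial question.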

…

On Thu, Sep 24, 2020 at 6:15 PM Matthew Dillon wrote: @BenKaehler, did you write that part of the PD Mice tutorial? If so, care to comment?