qiime2/q2-fragment-insertion

ENH: "Preserve" original node names

Closed this issue · 3 comments

Improvement Description
"Preserve" original feature IDs by renaming nodes with the rename-json.py script that SEPP outputs.

Because SEPP renames nodes, the trees it produces don't play nice with downstream tools like Empress that can color trees using feature metadata.

Current Behavior
This tree cannot be easily colored by taxonomy, because the node IDs do not map to the original feature IDs.
image

Proposed Behavior
Use the rename-json.py script output by SEPP to "preserve" original feature IDs, probably by exposing a new parameter so as to not impact runtimes.

OK, the plot thickens. It looks like some of the nodes in this tree have preserved their original feature IDs. In this zoomed-in shot of the same tree, I've got a QIIME 2 feature-id, and the expected coloring and taxonomy information.

image

Not sure why the behavior is inconsistent, but I'll drop a note in here if I run into a good explanation.

Hi @ChrisKeefe, placed features should have preserved tip names. The output tree will also contain the tips from the reference tree; for Greengenes, this is another ~200k tips using GG identifiers. Would that explain what you're observing?

Absolutely does, thanks @wasade! So the majority of the feature IDs are coming straight out of Silva, and the coloring in Empress is just getting overwhelmed by uncolored reference features not found in our study data.

Setting Empress's --p-shear-to-feature-metadata flag to true (the default IIRC) trims out the non-study nodes, and takes care of the issue nicely.
image
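
For anyone who lands here later, here is a rough sketch of what shearing to feature metadata amounts to conceptually: keep only the tips whose IDs appear in the feature metadata and drop the reference-only tips. This is not Empress's actual implementation. The function name, the tip names, and the metadata IDs are all made up for illustration.

```python
def shear_to_features(tip_names, feature_ids):
    """Return the tips to keep: those whose IDs appear in the feature metadata."""
    wanted = set(feature_ids)
    # Preserve the original tip order; drop anything not in the metadata.
    return [tip for tip in tip_names if tip in wanted]


# Hypothetical tip names: two study features mixed in with three reference tips.
tips = ["feat-a", "ref-1", "ref-2", "feat-b", "ref-3"]

# Hypothetical feature metadata IDs (e.g. the index of a taxonomy table).
# "feat-c" has metadata but was never placed in the tree, so it is simply ignored.
metadata_ids = {"feat-a", "feat-b", "feat-c"}

kept = shear_to_features(tips, metadata_ids)
print(kept)
```

With the reference-only tips gone, every remaining tip has a matching feature metadata row, so the coloring is no longer swamped by uncolored nodes.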