IEDB/arborist

Fix problem with strains in current protein tree


Randi found that Influenza A (11320) has a bunch of "STRAIN protein" nodes, which should not be there. She also found that the proteins she did expect, e.g. Hemagglutinin, return far fewer epitopes than she expected. My guess is that the strain protein nodes are grouping sources that should be grouped under Hemagglutinin.

Example (inside the VPN): https://arborist-dev.lji.org/arborist/molecule_tree_old%20molecule_tree/iedb-protein:11320

I think I broke this, so I'll reassign it to myself.

The strain nodes in the tree should be fixed by 117316f, but that doesn't explain why Hemagglutinin has too few epitopes...