IEDB/arborist

Develop workflow for reviewing and updating NCBI Taxonomy

Just some rough notes for now:

When the NCBI Taxonomy updates, we worry that:

  1. the changes break our upper hierarchy
  2. an existing species is merged into another node
  3. an existing species is "demoted" to a rank below species (for example, subspecies)
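
A rough sketch of how these three checks might look, assuming we work from the `nodes.dmp` and `merged.dmp` files in the NCBI taxdump archive. The function names, the `tracked_ids` list, and the `expected_parents` mapping are hypothetical and not existing arborist code:

```python
from typing import Dict, Iterable, List, Optional, Tuple


def parse_dmp_line(line: str) -> List[str]:
    """Split one taxdump row: fields are separated by tab-pipe-tab
    and the row ends with tab-pipe."""
    line = line.rstrip("\r\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return [field.strip() for field in line.split("\t|\t")]


def load_nodes(lines: Iterable[str]) -> Dict[int, Tuple[int, str]]:
    """Map tax_id -> (parent tax_id, rank) from nodes.dmp rows."""
    nodes = {}
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def load_merged(lines: Iterable[str]) -> Dict[int, int]:
    """Map old tax_id -> new tax_id from merged.dmp rows."""
    merged = {}
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        merged[int(fields[0])] = int(fields[1])
    return merged


def review_species(tracked_ids, nodes, merged):
    """Classify each tracked species ID against the new taxonomy."""
    report: Dict[int, Tuple[str, Optional[object]]] = {}
    for tax_id in tracked_ids:
        if tax_id in merged:
            # Concern 2: the species was merged into another node.
            report[tax_id] = ("merged", merged[tax_id])
        elif tax_id not in nodes:
            # Not merged and not present: probably deleted upstream.
            report[tax_id] = ("missing", None)
        elif nodes[tax_id][1] != "species":
            # Concern 3: the node exists but is no longer ranked as a species.
            report[tax_id] = ("rank_changed", nodes[tax_id][1])
        else:
            report[tax_id] = ("ok", None)
    return report


def changed_parents(expected_parents, nodes):
    """Concern 1: report upper-hierarchy nodes whose parent differs from
    what we expect. Returns tax_id -> (expected parent, actual parent)."""
    changes = {}
    for tax_id, expected in expected_parents.items():
        actual = nodes[tax_id][0] if tax_id in nodes else None
        if actual != expected:
            changes[tax_id] = (expected, actual)
    return changes
```

With real data, `load_nodes(open("nodes.dmp"))` and `load_merged(open("merged.dmp"))` would feed `review_species`, and anything not marked `"ok"` would go to a human for review.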

Our list of proteomes must cover every species in our organism tree. If an update leaves us with new species, we need to assign proteomes to them. Proteome selection is somewhat expensive, but we are unlikely to add many species in any given month, and the new ones are more likely to have small proteomes than large ones.
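
The coverage check itself could be as simple as a set comparison. This is a minimal sketch, assuming the tree species and the proteome assignments are both keyed by NCBI taxon ID; `tree_species`, `proteome_table`, and the placeholder proteome labels are hypothetical:

```python
def species_needing_proteomes(tree_species, proteome_table):
    """Species in the organism tree that have no proteome assigned yet."""
    return sorted(set(tree_species) - set(proteome_table))


def stale_proteomes(tree_species, proteome_table):
    """Proteome entries whose species is no longer in the organism tree."""
    return sorted(set(proteome_table) - set(tree_species))


# Hypothetical example: taxon IDs in the tree, and taxon ID -> proteome label.
tree_species = [100, 200, 300]
proteome_table = {100: "proteome-a", 400: "proteome-b"}

to_assign = species_needing_proteomes(tree_species, proteome_table)
to_retire = stale_proteomes(tree_species, proteome_table)
```

Only the IDs in `to_assign` would need the expensive proteome selection step, which keeps the monthly cost proportional to the number of new species.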

I want to automate the review and proteome update process as much as possible.