TranslatorSRI/Babel

Some entities are showing up in both proteins and chemicals


In some cases the same entity ends up in two incomplete cliques, one typed as a chemical and one as a protein:

Note that this sometimes results in the same identifier showing up twice -- once as a chemical and once as a protein:

Given the structure of Babel and NodeNorm, the easiest fix would be to create a hidden conflation that combines chemicals and proteins we understand to mean the same thing into a hybrid Protein-Chemical clique. That reduces the problem to two questions:

  • Where can we store this hidden conflation in our databases? (We could create a new database, but that would be annoying.)
  • How do we identify that a particular CHEBI identifier and a particular protein identifier refer to the same thing?
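
To make the second question concrete, here is a rough sketch of one possible approach, not how Babel actually does it. It assumes each set of cliques is available as a plain dict mapping a clique leader to its member identifiers, and that a conflation is a list of clique leaders. The function names and the identifiers in the example are made-up placeholders, not real Babel code or real CHEBI/UniProtKB entries.

```python
from collections import defaultdict


def index_by_identifier(cliques):
    """Map each identifier to the leader of the clique that contains it."""
    return {ident: leader for leader, members in cliques.items() for ident in members}


def find_shared_identifiers(chemical_cliques, protein_cliques):
    """Return identifiers that appear in both a chemical and a protein clique."""
    return set(index_by_identifier(chemical_cliques)) & set(
        index_by_identifier(protein_cliques)
    )


def build_hidden_conflation(chemical_cliques, protein_cliques, same_as_pairs):
    """Group clique leaders into hybrid Protein-Chemical conflations.

    same_as_pairs is an iterable of (chemical_id, protein_id) tuples that we
    have decided mean the same thing. Where those pairs come from is the open
    question; this function only does the bookkeeping.
    """
    chem_index = index_by_identifier(chemical_cliques)
    prot_index = index_by_identifier(protein_cliques)

    groups = defaultdict(set)
    for chem_id, prot_id in same_as_pairs:
        chem_leader = chem_index.get(chem_id)
        prot_leader = prot_index.get(prot_id)
        if chem_leader is None or prot_leader is None:
            continue  # one side is not in any known clique; skip the pair
        groups[chem_leader].add(prot_leader)

    # Chemical leader first, then protein leaders in a stable order.
    return [[chem, *sorted(prots)] for chem, prots in sorted(groups.items())]


# Placeholder data: the identifiers below are invented for illustration.
chemical_cliques = {"CHEBI:0000001": ["CHEBI:0000001", "EXAMPLE:shared-1"]}
protein_cliques = {"UniProtKB:X00001": ["UniProtKB:X00001", "EXAMPLE:shared-1"]}

# Simplest possible evidence: the same identifier shows up on both sides.
shared = find_shared_identifiers(chemical_cliques, protein_cliques)
pairs = [(ident, ident) for ident in shared]
conflation = build_hidden_conflation(chemical_cliques, protein_cliques, pairs)
```

An identifier that appears on both sides is only the most obvious kind of evidence; any other source of chemical-protein pairs could be passed in as `same_as_pairs`. The sketch does not handle a protein that gets paired with more than one chemical, which would put the same protein leader into two groups.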