Some entities are showing up in both proteins and chemicals
gaurav commented
In some cases the same entity ends up in incomplete cliques, some typed as chemicals and some as proteins:
- #200
- #232
- NCATSTranslator/Feedback#150
- NCATSTranslator/Feedback#576
- NCATSTranslator/Feedback#613
- NCATSTranslator/Feedback#623
- NCATSTranslator/Feedback#743 (comment)
Note that this sometimes results in the same identifier showing up twice: once as a chemical and once as a protein.
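One way to look for such cases is to index every identifier by the types of the cliques it appears in. The following is only a minimal sketch: the record layout (`type`, `identifiers`), the function name, and the identifiers are all made up for illustration and are not Babel's actual output format.

```python
from collections import defaultdict

# Hypothetical clique records. The field names and identifiers are
# illustrative placeholders, not real Babel output.
cliques = [
    {"type": "biolink:SmallMolecule", "identifiers": ["CHEBI:0000001", "EXAMPLE:A"]},
    {"type": "biolink:Protein", "identifiers": ["UniProtKB:X00001", "EXAMPLE:A"]},
    {"type": "biolink:Protein", "identifiers": ["UniProtKB:X00002"]},
]


def find_cross_type_identifiers(cliques):
    """Return identifiers that occur in cliques of more than one type."""
    types_by_id = defaultdict(set)
    for clique in cliques:
        for identifier in clique["identifiers"]:
            types_by_id[identifier].add(clique["type"])
    # Keep only identifiers seen under two or more types.
    return {i: sorted(t) for i, t in types_by_id.items() if len(t) > 1}


duplicates = find_cross_type_identifiers(cliques)
```

Here `duplicates` would contain only `EXAMPLE:A`, since it is the one identifier listed under both a chemical clique and a protein clique.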
gaurav commented
Given the structure of Babel and NodeNorm, the easiest fix would be to create a hidden conflation that combines chemicals and proteins we understand to mean the same thing into a hybrid Protein-Chemical clique. That reduces this problem to two questions:
- Where can we store this hidden conflation in our databases? (We could create a new database, but that would be annoying.)
- How do we identify that a particular CHEBI identifier and a particular protein identifier refer to the same thing?
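To make the idea concrete, here is a rough sketch of how a hybrid conflation could be assembled once we have some source of chemical/protein equivalences. It assumes we can produce a list of (chemical ID, protein ID) pairs, which is exactly the open question above, and it assumes a conflation can be represented as a list of identifier groups. The function name and all identifiers are hypothetical, and nothing here reflects how Babel or NodeNorm actually store conflations.

```python
def build_conflation(pairs):
    """Group identifiers linked by equivalence pairs, using union-find."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for chemical_id, protein_id in pairs:
        root_c, root_p = find(chemical_id), find(protein_id)
        if root_c != root_p:
            parent[root_p] = root_c

    # Collect every identifier under its root to form the hybrid groups.
    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), []).append(identifier)
    return [sorted(group) for group in groups.values()]


# Placeholder pairs: assumed to come from some not-yet-decided mapping source.
example_pairs = [
    ("CHEBI:0000001", "UniProtKB:X00001"),
    ("CHEBI:0000002", "UniProtKB:X00001"),
    ("CHEBI:0000003", "UniProtKB:X00002"),
]

conflation = build_conflation(example_pairs)
```

Union-find is used so that chains of equivalences (two CHEBI IDs mapped to the same protein, say) collapse into a single hybrid group rather than producing overlapping ones.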