dmis-lab/BioSyn

Input/Output clarification

amirj opened this issue · 7 comments

amirj commented

If I understand correctly, we have a dictionary that maps aliases/synonyms to a list of corresponding CUIs.
The input to the algorithm is a string (the mention), and the output is a list of entries from the dictionary:

{
  "mention": "ataxia telangiectasia", 
  "predictions": [
    {"name": "ataxia telangiectasia", "id": "D001260|208900"}, 
    {"name": "ataxia telangiectasia syndrome", "id": "D001260|208900"}, 
    {"name": "ataxia telangiectasia variant", "id": "C566865"}, 
    {"name": "syndrome ataxia telangiectasia", "id": "D001260|208900"}, 
    {"name": "telangiectasia", "id": "D013684"}
  ]
}

Since the goal is to map the mention to a CUI, I'm wondering whether there is any functionality that maps the above output to a single CUI.

Hi amirj

The predictions are sorted by their final scores.
This means the first item in the predictions is the top-1 prediction, and its CUI is the final single CUI for the given input mention.
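As a rough illustration of the point above, here is a minimal Python sketch that reads the first prediction from an output shaped like the JSON in the question. The helper name `top1_id` is hypothetical and not part of BioSyn; the only assumption is that the output has the `mention`/`predictions` structure shown earlier.

```python
import json

# Output copied from the example in the question (shortened to three predictions).
raw_output = """
{
  "mention": "ataxia telangiectasia",
  "predictions": [
    {"name": "ataxia telangiectasia", "id": "D001260|208900"},
    {"name": "ataxia telangiectasia syndrome", "id": "D001260|208900"},
    {"name": "ataxia telangiectasia variant", "id": "C566865"}
  ]
}
"""


def top1_id(result):
    """Return the id of the first (highest-scoring) prediction, or None if empty.

    Hypothetical helper: it relies only on the predictions already being
    sorted by score, as described in the reply above.
    """
    predictions = result.get("predictions", [])
    return predictions[0]["id"] if predictions else None


result = json.loads(raw_output)
print(result["mention"], "->", top1_id(result))
```

Note that the returned id can still be a composite such as `D001260|208900`, so a caller may need one more step to reduce it to a single identifier.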

amirj commented

Thanks, Mujeen. What happens if the top prediction corresponds to multiple CUIs? What is the best practice for extracting the target CUI among them?

When a single prediction has multiple CUIs, the CUIs come from different KBs.
For example, in 'D001260|208900', 'D001260' is from MeSH and '208900' is from OMIM.

For evaluation, I count a prediction as correct when any of the predicted CUIs matches the gold answer.
In practice, however, you can choose whichever CUI belongs to the KB you are using.
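A minimal sketch of both ideas, assuming ids are joined with `|` as in the example. The function names are hypothetical, and the rule used to tell the KBs apart (all-digit ids are OMIM, everything else is MeSH) is inferred only from the ids shown in this thread; it is not something BioSyn provides and may not hold for other dictionaries.

```python
def split_cuis(composite_id):
    """Split a composite id such as 'D001260|208900' into its parts."""
    return composite_id.split("|")


def is_correct(predicted_id, gold_id):
    """Any-match evaluation: correct if any predicted CUI equals any gold CUI."""
    return bool(set(split_cuis(predicted_id)) & set(split_cuis(gold_id)))


def pick_cui(composite_id, kb):
    """Return the first CUI that looks like it belongs to `kb`, else None.

    ASSUMPTION: all-digit ids are treated as OMIM and all others as MeSH.
    This heuristic is based solely on the example ids in this thread.
    """
    for cui in split_cuis(composite_id):
        guessed_kb = "OMIM" if cui.isdigit() else "MeSH"
        if guessed_kb == kb:
            return cui
    return None


top_id = "D001260|208900"
print(pick_cui(top_id, "MeSH"))
print(pick_cui(top_id, "OMIM"))
print(is_correct(top_id, "208900"))
```

If the dictionary stores the source KB explicitly, looking it up there would be more reliable than guessing from the shape of the id.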

amirj commented

Thanks for your clarification. In my use case, I'm leveraging UMLS "aliases/synonyms" in my dictionary. As a result, an ambiguous synonym would be mapped to more than one CUI.
What's your suggestion in this situation?

I think it depends on how you use it.
Is there a reason that you want to extract just one CUI?

amirj commented

Yes, I want to map a mention directly to exactly one entity, i.e., entity linking.

Hi amirj

Thank you for your patience.

That's a good point. However, our work focuses on handling term variations of biomedical concepts rather than on disambiguating mentions.