Generate and rank
Authors | Paper | idea | Dataset | Eval | Code |
---|---|---|---|---|---|
Yan et al. (2020) | A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization | multi-implication, generating and ranking, constraint decoder | CHIP-2019 | accuracy,按beamsize统计recall | |
Xu et al. (2020) | A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization | generate and rank (generate = select) | AskAPatient /TwADR-L /SMM4H-17 /MCN | accuracy | https://github.com/dongfang91/Generate-and-Rank-ConNorm |
Supervised multi-class classifiers
Authors | Paper | idea | Dataset | Eval | Code |
---|---|---|---|---|---|
Belousov(2017) | Using an Ensemble of Generalised Linear and Deep Learning Models in the SMM4H 2017 Medical Concept Normalisation Task | logistic | medDRA | Acc | |
Tutubalina(2018) | Medical concept normalization in social media posts with recurrent neural networks | RNN/LSTM/GRU + Similarity feature(TF-IDF/w2v) | CADEC | Acc | |
Luo et al.(2019) | A Hybrid Normalization Method for Medical Concepts in Clinical Narrative using Semantic Matching | "Exact Match + Edit Distance Deep Learning(Embedding layer+bilstm+dense+softmax) | ShARe/CLEF 2013 | Acc+Macro+Micro | |
Niu et al.(2019) | Multi-task CharacterLevel Attentional Networks for Medical Concept Normalization | medical named entity recognition(CNN embedding+BiLSTM sequence labeling) and normalization | BC5CDR task corpus/ NCBI Disease corpus | F1 |
Rank
Method | Paper | Author | Disadvantage |
---|---|---|---|
Direct Rank(字典查找、字符串匹配) | 不能找到字面上不相似但语义相同的concept | ||
Direct Rank(Classfication) | 输出空间太大 | ||
Point-wise Learing to Rank | KB规模大时效率不佳 | ||
Pair-wise learning to rank | A combined recall and rank framework with online negative sampling for Chinese procedure terminology normalization(https://github.com/sxthunder/CMTN) | Liang et al.(2021 Oxford) | 结合了mention和concept的embedding相似度、诊断部位和诊断类型的attention相似度参与rank |
List-wise learining to rank | A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization | Yan et al. (2020) | multi-implication未解决, rank本质上还是按list送进bert+分类层? |
Dataset
Dataset | Paper | Mention | Standard entities | Source | Link |
---|---|---|---|---|---|
CHIP-2019 | A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization(EMNLP2020) | 4000 | 9867 | clinical procedures extracted from Chinese electronic medical records | |
Ask patient | Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation(ACL2016) | 17324 | 1036 concepts, 22 semantic types | blog post | https://github.com/dongfang91/Generate-and-Rank-ConNorm/tree/master/Generator/Multiclass/data/askapatient https://zenodo.org/record/55013#.YOVHKPkzZaQ |
TwADR-L | Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation(ACL2016) | 5074 | 2220 concepts, 18semantic types | social media | https://github.com/unt-iialab/medical-concept-normalization https://zenodo.org/record/55013#.YOVHKPkzZaQ |
SMM4H-17 | Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task(JAMIA 2018) | 9149 | 22500 concepts(513 in trainset), 61 semantic types | tweets | https://metatext.io/datasets/social-media-mining-for-health-(smm4h) |
MCN | Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task(JAMIA 2018) | 13609 | 434056 concepts(3792 in trainset), 125 semantic types | from the MIMIC II31 database(https://www.sciencedirect.com/science/article/pii/S1532046419300504) | |
ShARe/CLEF 2013 | 199 notes in the training set and 99 notes in the test set | https://sites.google.com/site/shareclefehealth/data | |||
BC5CDR task corpus | 1500 PubMed abstracts | MeSH/OMIM concepts(according to e Comparative Toxicogenomics Database (CTD) MEDIC disease vocabulary) 9700 concepts, 67000terms(including synonyms) | https://paperswithcode.com/dataset/bc5cdr https://www.researchgate.net/figure/The-overall-corpus-statistics_tbl2_302593945 | ||
NCBI Disease corpus | 793 PubMed abstracts | https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/ | |||
CADEC | mapped to SNOMED CT-AU | https://researchdata.edu.au/cadec/494923 |