Susiewest/Medical-Concept-Normalization-Survey

Medical-Concept-Normalization-Survey

Generate and rank

Authors	Paper	idea	Dataset	Eval	Code
Yan et al. (2020)	A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization	multi-implication, generating and ranking, constraint decoder	CHIP-2019	accuracy，按beamsize统计recall
Xu et al. (2020)	A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization	generate and rank (generate = select)	AskAPatient /TwADR-L /SMM4H-17 /MCN	accuracy	https://github.com/dongfang91/Generate-and-Rank-ConNorm

Supervised multi-class classifiers

Authors	Paper	idea	Dataset	Eval	Code
Belousov(2017)	Using an Ensemble of Generalised Linear and Deep Learning Models in the SMM4H 2017 Medical Concept Normalisation Task	logistic	medDRA	Acc
Tutubalina(2018)	Medical concept normalization in social media posts with recurrent neural networks	RNN/LSTM/GRU + Similarity feature（TF-IDF/w2v）	CADEC	Acc
Luo et al.(2019)	A Hybrid Normalization Method for Medical Concepts in Clinical Narrative using Semantic Matching	"Exact Match + Edit Distance Deep Learning（Embedding layer+bilstm+dense+softmax）	ShARe/CLEF 2013	Acc+Macro+Micro
Niu et al.(2019)	Multi-task CharacterLevel Attentional Networks for Medical Concept Normalization	medical named entity recognition(CNN embedding+BiLSTM sequence labeling) and normalization	BC5CDR task corpus/ NCBI Disease corpus	F1

Rank

Method	Paper	Author	Disadvantage
Direct Rank（字典查找、字符串匹配）			不能找到字面上不相似但语义相同的concept
Direct Rank（Classfication）			输出空间太大
Point-wise Learing to Rank			KB规模大时效率不佳
Pair-wise learning to rank	A combined recall and rank framework with online negative sampling for Chinese procedure terminology normalization（https://github.com/sxthunder/CMTN）	Liang et al.(2021 Oxford)	结合了mention和concept的embedding相似度、诊断部位和诊断类型的attention相似度参与rank
List-wise learining to rank	A Generate-and-Rank Framework with Semantic Type Regularization for Biomedical Concept Normalization	Yan et al. (2020)	multi-implication未解决, rank本质上还是按list送进bert+分类层？

Dataset

Dataset	Paper	Mention	Standard entities	Source	Link
CHIP-2019	A Knowledge-driven Generative Model for Multi-implication Chinese Medical Procedure Entity Normalization（EMNLP2020）	4000	9867	clinical procedures extracted from Chinese electronic medical records
Ask patient	Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation（ACL2016）	17324	1036 concepts, 22 semantic types	blog post	https://github.com/dongfang91/Generate-and-Rank-ConNorm/tree/master/Generator/Multiclass/data/askapatient https://zenodo.org/record/55013#.YOVHKPkzZaQ
TwADR-L	Normalising Medical Concepts in Social Media Texts by Learning Semantic Representation（ACL2016）	5074	2220 concepts, 18semantic types	social media	https://github.com/unt-iialab/medical-concept-normalization https://zenodo.org/record/55013#.YOVHKPkzZaQ
SMM4H-17	Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task（JAMIA 2018）	9149	22500 concepts(513 in trainset), 61 semantic types	tweets	https://metatext.io/datasets/social-media-mining-for-health-(smm4h)
MCN	Data and systems for medication-related text classification and concept normalization from Twitter: insights from the Social Media Mining for Health (SMM4H)-2017 shared task（JAMIA 2018）	13609	434056 concepts(3792 in trainset), 125 semantic types	from the MIMIC II31 database（https://www.sciencedirect.com/science/article/pii/S1532046419300504）
ShARe/CLEF 2013		199 notes in the training set and 99 notes in the test set			https://sites.google.com/site/shareclefehealth/data
BC5CDR task corpus		1500 PubMed abstracts	MeSH/OMIM concepts(according to e Comparative Toxicogenomics Database (CTD) MEDIC disease vocabulary) 9700 concepts, 67000terms(including synonyms)		https://paperswithcode.com/dataset/bc5cdr https://www.researchgate.net/figure/The-overall-corpus-statistics_tbl2_302593945
NCBI Disease corpus		793 PubMed abstracts			https://www.ncbi.nlm.nih.gov/CBBresearch/Dogan/DISEASE/
CADEC			mapped to SNOMED CT-AU		https://researchdata.edu.au/cadec/494923