Retrofitting Word Vectors of MeSH Terms to Improve Semantic Similarity Measures

Zhiguo Yu, MS, Trevor Cohen, MBChB, MD, PhD, Todd R. Johnson, PhD, Elmer Bernstam, MD, MSE

The University of Texas School of Biomedical Informatics at Houston, Houston, TX

Byron C. Wallace

University of Texas at Austin School of Information, Austin, TX

Estimation of the semantic relatedness between biomedical concepts has utility for many informatics applications. Automated methods fall into two broad categories: methods based on distributional statistics drawn from text corpora, and methods based on the structure of existing knowledge resources. In the former case, taxonomic structure is disregarded. In the latter, semantically relevant empirical information is not considered.

In this work, we present a method that retrofits the context vector representations of MeSH terms using additional linkage information from the UMLS/MeSH hierarchy, so that linked concepts have similar vector representations. We evaluated the method against previously published physician and coder ratings on sets of MeSH terms. Our experimental results demonstrate that measures based on the retrofitted word vectors obtain a higher correlation with physician judgments, and show a clear improvement in correlation with experts' ratings over the same vector representations without retrofitting.
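The retrofitting idea can be illustrated with a minimal sketch. The abstract does not state the update rule, so the code below assumes the iterative neighbor-averaging update commonly used in graph-based retrofitting: each term vector is repeatedly replaced by a weighted average of its original corpus-derived vector and the current vectors of its linked terms. The function names, the weights (`alpha`, and `beta` set to one over the number of neighbors), the toy vectors, and the toy links are all illustrative assumptions, not values taken from this work or from MeSH.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def retrofit(vectors, links, iterations=10, alpha=1.0):
    """Pull each vector toward its linked neighbors (assumed update rule).

    vectors: dict mapping term -> original vector (list of floats)
    links:   dict mapping term -> list of linked terms
    alpha:   weight on the original vector (assumption: 1.0)
    """
    new = {term: list(vec) for term, vec in vectors.items()}
    for _ in range(iterations):
        for term, original in vectors.items():
            neighbors = [n for n in links.get(term, ()) if n in new]
            if not neighbors:
                continue  # terms without links keep their original vector
            beta = 1.0 / len(neighbors)  # assumption: uniform neighbor weights
            total = alpha + beta * len(neighbors)
            new[term] = [
                (alpha * original[d] + beta * sum(new[n][d] for n in neighbors))
                / total
                for d in range(len(original))
            ]
    return new


# Hypothetical toy data: 2-dimensional vectors and a single made-up link.
vectors = {
    "Neoplasms": [1.0, 0.0],
    "Carcinoma": [0.0, 1.0],
    "Aspirin": [-1.0, 0.0],
}
links = {"Neoplasms": ["Carcinoma"], "Carcinoma": ["Neoplasms"]}

retrofitted = retrofit(vectors, links)
before = cosine(vectors["Neoplasms"], vectors["Carcinoma"])
after = cosine(retrofitted["Neoplasms"], retrofitted["Carcinoma"])
print(f"similarity before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the two linked terms move closer together while the unlinked term is left unchanged, which is the property the abstract describes: linked concepts end up with similar vector representations, and the original distributional information is retained through the `alpha` term.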