Thesis document for automatic prediction of biomedical metadata using scientific publications.
Although open biomedical data are abundant, the lack of high-quality metadata makes it hard for others to find relevant datasets and reuse them for other purposes. Metadata are particularly useful for understanding the nature and provenance of the data. A common approach to improving metadata quality relies on human curation, which is expensive, time-consuming, and prone to error. To improve the quality of metadata, we use scientific publications to automatically predict metadata key: value pairs. For prediction, we use a Convolutional Neural Network (CNN) and a Bidirectional Long Short-Term Memory network (BiLSTM). We also compare these models with multi-label classification methods. The CNN gives the best results. We focus our attention on the NCBI Disease Corpus.
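As a rough illustration of the multi-label framing mentioned above, the sketch below turns metadata key: value pairs into multi-hot target vectors, the kind of target a multi-label classifier could be trained against. It is a minimal sketch under stated assumptions, not the method used in the thesis: the helper names `build_label_index` and `to_multi_hot`, and the example keys and values, are hypothetical and do not come from the NCBI Disease Corpus.

```python
from typing import Dict, List


def build_label_index(records: List[Dict[str, str]]) -> List[str]:
    """Collect every distinct 'key: value' pair seen in the records, sorted."""
    labels = {f"{key}: {value}" for rec in records for key, value in rec.items()}
    return sorted(labels)


def to_multi_hot(record: Dict[str, str], label_index: List[str]) -> List[int]:
    """Encode one record as a 0/1 vector with one position per known label."""
    present = {f"{key}: {value}" for key, value in record.items()}
    return [1 if label in present else 0 for label in label_index]


# Hypothetical metadata attached to two publications (illustrative values only).
records = [
    {"organism": "Homo sapiens", "tissue": "liver"},
    {"organism": "Mus musculus", "tissue": "liver"},
]

label_index = build_label_index(records)
targets = [to_multi_hot(rec, label_index) for rec in records]

for rec, target in zip(records, targets):
    print(rec, "->", target)
```

Each publication can carry several labels at once, which is what separates this setting from ordinary single-label classification. A text model would take the publication text as input and predict one such vector per document.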