The CORD-19 dataset is a large collection of scholarly literature on the novel coronavirus. Text and data mining approaches can be applied to it to answer questions from the literature in support of the ongoing worldwide COVID-19 response. The questions addressed here cover the following topics:
- Smoking, pre-existing pulmonary disease
- Co-infections (determine whether co-existing respiratory/viral infections make the virus more transmissible or virulent) and other co-morbidities
- Neonates and pregnant women
- Socio-economic and behavioral factors to understand the economic impact of the virus and whether there were differences.
- Transmission dynamics of the virus, including the basic reproductive number, incubation period, serial interval, modes of transmission and environmental factors
- Severity of disease, including risk of fatality among symptomatic hospitalized patients, and high-risk patient groups
- Susceptibility of populations
- Public health mitigation measures that could be effective for control
First, the relevant COVID-19 documents are retrieved with a BM25 search engine. Then, to answer the questions above, two methods are used to find sentences in those papers that discuss each topic.
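The source does not say which BM25 implementation or parameters were used. As a minimal sketch, assuming the Okapi BM25 formula with a non-negative IDF variant, `k1 = 1.2`, `b = 0.75`, and a simple lowercase alphanumeric tokenizer (all assumptions, as are the class and method names), the retrieval step could look like this:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Hypothetical minimal Okapi BM25 ranker over a list of documents."""

    def __init__(self, documents, k1=1.2, b=0.75):
        self.docs = [tokenize(d) for d in documents]
        self.k1, self.b = k1, b
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        self.tfs = [Counter(d) for d in self.docs]
        df = Counter()
        for d in self.docs:
            df.update(set(d))  # document frequency: count each term once per doc
        n = len(self.docs)
        # The +1 inside the log keeps IDF non-negative even for very common terms.
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query, i):
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for t in tokenize(query):
            if t not in tf:
                continue
            f = tf[t]
            # Term frequency saturates via k1; b controls document-length normalization.
            norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[t] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=10):
        scores = [(self.score(query, i), i) for i in range(len(self.docs))]
        return sorted(scores, key=lambda p: p[0], reverse=True)[:k]


docs = [
    "coronavirus incubation period estimates",
    "influenza vaccine trial",
    "covid-19 coronavirus transmission in households",
]
engine = BM25(docs)
print(engine.top_k("coronavirus incubation", k=2))  # (score, document index) pairs
```

A production pipeline would more likely use an existing search library; the sketch only shows how term frequency, document length and IDF combine into a single relevance score.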
Method 1:
- Compute TF-IDF vectors for every sentence in the retrieved papers.
- For a given search query, compute its TF-IDF vector with the same vocabulary and IDF weights.
- Rank the sentences by cosine similarity to the query and return those with the highest scores.
- Pros: Fast and accurate.
- Cons: Cannot capture semantic relationships between words (for example, synonyms that share no surface form).
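The three steps of Method 1 can be sketched with the standard library alone. The tokenizer, the function names and the smoothed IDF formula (the same one scikit-learn's `TfidfVectorizer` applies by default) are assumptions for illustration, not the original code:

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase alphanumeric runs only.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(sentences):
    n = len(sentences)
    df = Counter()
    for s in sentences:
        df.update(set(tokenize(s)))
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    return {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}


def tfidf_vector(text, idf):
    # Sparse vector as {term: weight}; terms unseen in the corpus are ignored.
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    # Both vectors are L2-normalized, so the dot product equals cosine similarity.
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def build_index(sentences):
    idf = fit_idf(sentences)
    return idf, [tfidf_vector(s, idf) for s in sentences]


def search(query, sentences, idf, vectors, top_k=5):
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, v), s) for v, s in zip(vectors, sentences)]
    return sorted(scored, key=lambda p: p[0], reverse=True)[:top_k]


sentences = [
    "The incubation period is about 5 days.",
    "Smoking is a risk factor.",
    "Masks reduce transmission.",
]
idf, vectors = build_index(sentences)
print(search("incubation period", sentences, idf, vectors, top_k=1))
```

The index is built once and reused for every query, which is what keeps this method fast: each query costs one sparse dot product per sentence.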
Method 2:
- Train word embeddings (Word2Vec) on the text of the papers.
- For a given search query, look up the word vector for each of its words.
- Rank the sentences by Word Mover's Distance (WMD) to the query and return those with the lowest distances.
- Pros: Able to capture semantic relationships between words.
- Cons: Distance calculations are slow.
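Word Mover's Distance treats each text as a normalized bag of words and measures the minimum total cost of moving one text's word weights onto the other's, where moving weight between two words costs the Euclidean distance between their vectors. Libraries such as gensim expose this as `KeyedVectors.wmdistance`. The sketch below is an illustration rather than the original code: it solves the underlying transport problem directly as a min-cost flow so it needs only the standard library. The 2-D `TOY_VECTORS` are hypothetical stand-ins for Word2Vec vectors trained on the papers, and whitespace tokenization is an assumption.

```python
import math
from collections import Counter


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def word_movers_distance(doc1, doc2, vectors):
    """Exact WMD between two token lists via successive-shortest-path min-cost flow."""
    c1 = Counter(t for t in doc1 if t in vectors)  # out-of-vocabulary tokens are dropped
    c2 = Counter(t for t in doc2 if t in vectors)
    if not c1 or not c2:
        return math.inf
    w1, w2 = list(c1), list(c2)
    n1, n2 = sum(c1.values()), sum(c2.values())
    total = n1 * n2  # integer scaling: weight count/n1 becomes count*n2 units of flow
    src, sink = 0, len(w1) + len(w2) + 1
    graph = [[] for _ in range(sink + 1)]  # edge: [to, residual capacity, cost, reverse index]

    def add_edge(u, v, cap, cost):
        graph[u].append([v, cap, cost, len(graph[v])])
        graph[v].append([u, 0, -cost, len(graph[u]) - 1])

    for i, a in enumerate(w1):
        add_edge(src, 1 + i, c1[a] * n2, 0.0)
        for j, b in enumerate(w2):
            add_edge(1 + i, 1 + len(w1) + j, total, euclidean(vectors[a], vectors[b]))
    for j, b in enumerate(w2):
        add_edge(1 + len(w1) + j, sink, c2[b] * n1, 0.0)

    flow, cost = 0, 0.0
    while flow < total:
        # Bellman-Ford shortest path; residual (reverse) edges may have negative cost.
        dist = [math.inf] * len(graph)
        prev = [None] * len(graph)
        dist[src] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u, edges in enumerate(graph):
                if dist[u] == math.inf:
                    continue
                for k, (v, cap, c, _) in enumerate(edges):
                    if cap > 0 and dist[u] + c < dist[v] - 1e-12:
                        dist[v], prev[v] = dist[u] + c, (u, k)
                        changed = True
            if not changed:
                break
        # Push as much flow as the shortest augmenting path allows.
        push, v = total - flow, sink
        while v != src:
            u, k = prev[v]
            push = min(push, graph[u][k][1])
            v = u
        v = sink
        while v != src:
            u, k = prev[v]
            edge = graph[u][k]
            edge[1] -= push
            graph[v][edge[3]][1] += push
            v = u
        flow += push
        cost += push * dist[sink]
    return cost / total


def rank_by_wmd(query, sentences, vectors):
    """Return (distance, sentence) pairs, closest (lowest WMD) first."""
    q = query.lower().split()
    scored = [(word_movers_distance(q, s.lower().split(), vectors), s) for s in sentences]
    return sorted(scored, key=lambda p: p[0])


# Hypothetical 2-D vectors: synonyms sit close together, unrelated words far apart.
TOY_VECTORS = {"fever": (0.0, 0.0), "pyrexia": (0.0, 1.0),
               "cough": (10.0, 0.0), "coughing": (10.0, 1.0)}
print(rank_by_wmd("fever", ["cough", "pyrexia"], TOY_VECTORS))
```

Here "fever" ranks "pyrexia" first even though the two share no characters, which is the semantic matching TF-IDF cannot do. Each distance, however, needs repeated shortest-path searches over a graph with one edge per pair of distinct words, so scoring every sentence is far slower than a single sparse dot product.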
Question: Incubation Period - TF-IDF
Question: Incubation Period - WMD
Question: Co-morbidities - TF-IDF
Question: Co-morbidities - WMD
Question: High Risk Group - TF-IDF
Question: High Risk Group - WMD
Question: Reproductive Number - TF-IDF
Question: Reproductive Number - WMD
Question: Pregnant Women - TF-IDF
Question: Pregnant Women - WMD
Question: Neonates of Mothers with COVID-19 - TF-IDF
Question: Neonates of Mothers with COVID-19 - WMD