kaggle-covid19-literature

Requirements

pip install tqdm boto3 requests regex sentencepiece sacremoses
pip install transformers
pip install -U git+https://github.com/dgunning/cord19.git
pip install langdetect
pip install pandas
pip install snorkel

Overview

  1. Raw data
./raw-data # literature files provided by Kaggle
  2. Pseudo-labelling with keywords and Snorkel (Kanglin, Shana, Qian)
./snorkel-pseudo-label/Snorkel_pseudo_label.ipynb # code for pseudo-labelling with Snorkel
  3. Retrieve relevant sentences (Kanglin, Yejin)
./sentence-classification/bert_classification.ipynb # code for sentence classification, used to retrieve sentences relevant to each question
./sentence-classification/evaluation_BERT.ipynb # code for re-ranking sentences by keywords and similarity to a reference sentence
  4. Question answering (Kejing, Tongtong, Yan)
./question-answering/*code.ipynb* # code for question answering

./question-answering/*sentence.pkl* # retrieved sentences for each question, stored as a pd.DataFrame with columns=['qid', 'sentence_sha', 'rank']
  5. Visualization (Yimeng, Kendall)
./human-correction/*code.ipynb* # qgrid visualization code

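The keyword-based pseudo-labelling step in the overview can be illustrated with a minimal plain-Python sketch. It is not the notebook's code and does not use the Snorkel API: Snorkel combines labeling functions with a learned label model, whereas this sketch swaps in a simple majority vote so that it runs without extra dependencies. The label values, keyword lists, and function names are all hypothetical.

```python
from collections import Counter

# Hypothetical label values, mirroring the common -1 / 0 / 1 convention.
ABSTAIN, IRRELEVANT, RELEVANT = -1, 0, 1

# Hypothetical keyword lists for one example question (incubation period).
RELEVANT_KEYWORDS = ("incubation period", "incubation time")
IRRELEVANT_KEYWORDS = ("funding", "conflict of interest")


def lf_relevant_keywords(sentence):
    """Vote RELEVANT when a question keyword appears in the sentence."""
    text = sentence.lower()
    return RELEVANT if any(k in text for k in RELEVANT_KEYWORDS) else ABSTAIN


def lf_irrelevant_keywords(sentence):
    """Vote IRRELEVANT for boilerplate such as funding statements."""
    text = sentence.lower()
    return IRRELEVANT if any(k in text for k in IRRELEVANT_KEYWORDS) else ABSTAIN


def lf_too_short(sentence):
    """Vote IRRELEVANT for fragments shorter than four words."""
    return IRRELEVANT if len(sentence.split()) < 4 else ABSTAIN


LABELING_FUNCTIONS = (lf_relevant_keywords, lf_irrelevant_keywords, lf_too_short)


def pseudo_label(sentence):
    """Majority vote over the non-abstaining labeling functions; ties abstain."""
    votes = Counter(lf(sentence) for lf in LABELING_FUNCTIONS)
    votes.pop(ABSTAIN, None)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for s in (
        "The median incubation period was estimated at five days.",
        "Funding: none.",
        "Patients were recruited from three hospitals.",
    ):
        print(pseudo_label(s), s)
```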
The 10 questions are listed here

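The retrieved-sentence file described in the overview is a pandas DataFrame with the columns 'qid', 'sentence_sha', and 'rank'. The sketch below shows one way to pull the best-ranked sentences for a single question. It runs on a toy frame with made-up values. The helper name `top_sentences`, the integer question ids, and the assumption that a lower rank means a better match are all hypothetical. The real file would be loaded with `pd.read_pickle` on its actual path.

```python
import pandas as pd


def top_sentences(sentences, qid, k=3):
    """Return the k best-ranked sentence hashes for one question."""
    rows = sentences[sentences["qid"] == qid]
    # Assumption: rank 1 is the best match, so sort in ascending order.
    return rows.sort_values("rank").head(k)["sentence_sha"].tolist()


# Toy stand-in with the documented columns; the values are invented.
toy = pd.DataFrame(
    {
        "qid": [1, 1, 1, 2],
        "sentence_sha": ["a1", "b2", "c3", "d4"],
        "rank": [2, 1, 3, 1],
    }
)

if __name__ == "__main__":
    print(top_sentences(toy, qid=1, k=2))  # ['b2', 'a1']
```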
The shared file folder is here