First, install the required packages:
pip install -r requirements.txt
Then download the word2vec embeddings pretrained on Wikipedia, PubMed, and PMC:
./download_embeddings.sh
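If you want to sanity-check the download, the embeddings are a standard binary word2vec file. The snippet below is only a sketch: it assumes gensim is available in your environment and that the .bin file sits in your working directory under the name used elsewhere in this README; adjust the path as needed.

# optional: check the downloaded embeddings load correctly
# (assumes gensim is installed; adjust the path to wherever download_embeddings.sh saved the file)
from gensim.models import KeyedVectors
vectors = KeyedVectors.load_word2vec_format('wikipedia-pubmed-and-PMC-w2v.bin', binary=True)
print(vectors.vector_size)                      # dimensionality of the word vectors
print(vectors.most_similar('protein', topn=3))  # a few nearest neighbours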
Then, to pretrain on the pubmed20k dataset, run the following (see pubmed_text_classification/scripts/train_and_evaluate.py for the full list of command-line arguments; an example using the --pretrained_embeddings flag follows these commands):
cd pubmed_text_classification/scripts
python train_and_evaluate.py
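For example, to use the embeddings downloaded above, pass their location via the --pretrained_embeddings flag (the same flag used in the Colab snippet below); the path here is only illustrative:

# example only -- replace the path with the actual location of the downloaded .bin file
python train_and_evaluate.py --pretrained_embeddings path/to/wikipedia-pubmed-and-PMC-w2v.bin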
To run on Google Colab instead, first download the embeddings and upload them to a folder called 'pretrained_embeddings' in your Google Drive. Then the following code should work:
# mount google drive
from google.colab import drive
drive.mount('/content/gdrive')
# clone the repo
!git clone https://github.com/cjs220/pubmed_text_classification.git
# change into the scripts directory of the cloned repo
import os
os.chdir('pubmed_text_classification/pubmed_text_classification/scripts')
# pretrain on pubmed20k, using the embeddings stored in Google Drive
!python train_and_evaluate.py --pretrained_embeddings '/content/gdrive/My Drive/pretrained_embeddings/wikipedia-pubmed-and-PMC-w2v.bin'