SE-T8: A Python repository from googolplexus

Overview

Coming Soon

arxiv link - coming soon.

Dependencies and set-up

Dependency	Version	Installation Command
Python	3.7	`conda create --name measeval python=3.7` and `conda activate measeval`
PyTorch, cudatoolkit	1.4.0, 10.1	`conda install pytorch==1.4.0 cudatoolkit=10.1 -c pytorch`
Transformers (Huggingface)	4.2.2	`pip install transformers==4.2.2`
Scikit-learn	0.23.2	`pip install scikit-learn==0.23.2`
scipy	1.5.0	`pip install scipy==1.5.0`
nltk	0.5.1	`pip install nltk`
pandas	1.2.2	`pip install pandas`
pandasql	0.7.3	`pip install pandasql`
vladiate	0.0.23	`pip install vladiate`
numpy	1.18.1	-

Instructions

Setting up the Code-base

download this repository and name the folder as 'measeval'
cd measeval
Clone the dataset and other utils - git clone https://github.com/harperco/MeasEval
Set up the dependencies:
- conda create --name measeval python=3.8
- conda activate measeval
- conda install pytorch==1.4.0 cudatoolkit=10.1 -c pytorch
- pip install transformers==4.2.2
- pip install scikit-learn==0.23.2
- pip install scipy==1.5.0
- pip install nltk
- pip install pandas
- pip install pandasql
- pip install vladiate
Set the paths: change the lines 10, 328 in task2/label.py, 15, 217 in task1/train.py, 16, 287 in task3/train.py and 6 in combine_3.py to set your basepaths.

Part1:

python3 task1/train.py --batch=16 --lr=3e-5 --n_epochs=5 --device=cuda --bert="bert-base-cased"

You may change the following parameters:

Batch-size: --batch
Number of Epochs: --n_epochs
Learning rate: --lr
Bert-type: --bert
Device: --device cuda or cpu

Output Folder: task1/output/$(bert-type)

model.pt: Model weights
params.json: Hyperparams
test_spans.json: Test set quantity span predicts
train_spans.json: Train set quantity span predicts
trial_spans.json: Trial set quantity span predicts

Part2:

python3 task2/label.py --quant_path="task1/output/bert-base-cased/"

Input: output from part 1

You may change the following parameters:

--quant_path: path to your output folder from part1

Output Folder: task2/output/$(bert-type)

test_labels.json: Test set quantity span, unit and modifier predictions
train_labels.json: Train set quantity span, unit and modifier predictions
trial_labels.json: Trial set quantity span, unit and modifier predictions

Part3:

python3 task3/train.py --batch=16 --lr=3e-5 --n_epochs=10 --device=cuda --bert="bert-base-cased" --me --mp --qual --test_label_folder="task2/output/bert-base-cased"

Input: output from part 2

You may change the following parameters:

Batch-size: --batch
Number of Epochs: --n_epochs
Learning rate: --lr
Bert-type: --bert
Device: --device cuda or cpu
Multi-task learning: if you want to do ME, then include --me flag; if you want to do MP, then include --mp flag; if you want to do Qual, then include --qual flag. You can use any combination of these with at least one of these three flags provided.
--test_label_folder: path to your output folder from part2

Output Folder: task3/output/$(bert-type)

model.pt: Model weights
params.json: Hyperparams
test_spans_me.json: Test set quantity span predicts me
train_spans_me.json: Train set quantity span predicts me
trial_spans_me.json: Trial set quantity span predicts me
test_spans_mp.json: Test set quantity span predicts mp
train_spans_mp.json: Train set quantity span predicts mp
trial_spans_mp.json: Trial set quantity span predicts mp
test_spans_qual.json: Test set quantity span predicts qual
train_spans_qual.json: Train set quantity span predicts qual
trial_spans_qual.json: Trial set quantity span predicts me

Combine and Bring to desired format:

Combine: python3 combine_3.py --me task3/output/bert-base-cased/test_spans_me.json --mp task3/output/bert-base-cased/test_spans_mp.json --qual task3/output/bert-base-cased/test_spans_qual.json --t2_path task2/output/bert-base-cased/test_labels.json

You may change the following parameters:

ME testset prediction path: --me
MP testset prediction path: --mp
Qual testset prediction path: --qual
Task2 testset prediction path: --t2

Output - final_labels.json

Convert json to tsv: python3 json_to_tsv.py

Output - predictions inside tsv folder

Run their evaluation script: python measeval-eval.py -i ./ -s tsv/ -g MeasEval/data/eval/tsv/ -m class

python measeval-eval.py -i ./ -s tsv/ -g MeasEval/data/eval/tsv/

Output isn't stored, only printed.

Trained Models

Model	Part1	Part3
Biomed Roberta	link	link
Bert large	link	link
Bert base	link	link
SciBert	link	link
BioBert	link	link

Performance

Model	Quant	ME	MP	Qual	Unit	Mod	HQ	HP	Overall
BERT-base	0.828	0.338	0.277	0.072	0.765	0.465	0.310	0.174	0.402
BERT-large	0.705	0.343	0.296	0.081	0.755	0.442	0.325	0.207	0.392
RoBERTa-BioMed	0.812	0.384	0.365	0.104	0.804	0.434	0.383	0.238	0.440
SciBERT	0.809	0.382	0.324	0.072	0.811	0.435	0.354	0.230	0.433
BioBERT	0.844	0.407	0.365	0.111	0.796	0.465	0.400	0.269	0.456

Miscellanous

You may contact us by opening an issue on this repo. Please allow 2-3 days of time to address the issue.
The evaluation script was borrowed from this baseline here
License: MIT

googolplexus/SE-T8

Overview

Dependencies and set-up

Instructions

Setting up the Code-base

Part1:

Part2:

Part3:

Combine and Bring to desired format:

Trained Models

Performance

Miscellanous