This is the code repository accompanying our paper *Goal-Oriented Script Construction*, to appear at INLG 2021.
- `environment.yml` specifies the conda environment needed to run the code. You can create the environment from it according to this guideline.
- When installing `transformers`, make sure you install it from source and put it under the root directory of this repo, because we need the scripts under `transformers/examples/`. Also, please replace the `run_glue.py` file in `transformers/examples/` with our `source/run_glue.py`; we modified it to allow the output of prediction probability scores.
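After installation, a quick sanity check could look like the following (a minimal sketch, not part of the repo; it only confirms that `transformers` is importable and that our `run_glue.py` is in place under the cloned source tree):

```python
# Minimal sanity check for the setup described above (a sketch, not part of the repo).
from pathlib import Path

import transformers

print("transformers version:", transformers.__version__)

# The transformers source tree should sit in the repo root, with our modified run_glue.py.
assert Path("transformers/examples/run_glue.py").exists(), \
    "run_glue.py not found under transformers/examples/"
print("Setup looks OK.")
```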
The entire corpus is available here.
To run the experiments following the instructions in the repo, you can use the sample data here and put it under `data_dir/`.
The pretrained models can be downloaded from here and put under `output_dir/`.
- `data_dir/`: You should download `data.zip` and put its subfolders here.
  - `script_splits/`: The `script_en.json` file contains a sample of the English wikiHow script corpus, split into train and test. The file is a json consisting of two keys, `"train"` and `"test"`. Each split is a list of articles. See more details in the accompanying README, and the loading sketch after this list.
  - `subtasks/`: This is the sample data for the two subtasks of the retrieval-based pipeline. `train.tsv` contains the training data, and `dev.tsv` contains the evaluation data. Note that this is for demonstration purposes only. If you want to reproduce the results in our paper, you need to download the full corpus above ("Multilingual wikiHow Script Corpus"). If you only want to construct custom scripts using our pretrained models, you can refer to the dev files and format your data accordingly. The dev files are for one example target script, "Dress Effectively". Example rows in both formats are sketched after this list.
    - `step_en/`: This is the data for the Step Inference task. The data format is `[Index]\t[Goal]\t[Candidate step]\t[Label]`. Label=1 means the candidate step is a step of the given goal, and 0 means otherwise.
    - `order_en/`: This is the data for the Step Ordering task. The data format is `[Index]\t[Goal]? [Step A]\t[Goal]? [Step B]\t[Label]` (empirically the best design choice). Label=0 means Step A precedes Step B, and 1 means otherwise. Note that the step candidates in our sample `dev.tsv` are gold steps, so that the Step Ordering module can be evaluated independently. If you want to run the entire retrieval-based pipeline, you should take the output of the Step Inference task and format the top L (= script length) retrieved step candidates in this way, as input to the Ordering module.
- `output_dir/`: This is the output directory where models and predictions are stored. You should place the subfolders (but not the `models/` folder itself) of the downloaded `model.zip` under it. It should look like this:
  - `step_en_mbert/`
  - `order_en_mbert/`
- `source/`: The source code.
  - `finetune.py`: The code to finetune and evaluate a model on one subtask.
  - `eval_construction.py`: The code to construct final scripts from the predictions of the two subtasks, and evaluate the entire pipeline on the GOSC task.
  - `run_glue.py`: The script that should be placed under your installed `transformers/examples/` directory.
- `transformers/`: The transformers package you are going to install from source.
- `environment.yml`: The conda environment config file.
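As referenced under `script_splits/` above, here is a minimal sketch of loading the sample corpus file. The path follows the layout described in this list; the per-article fields are not repeated here, so consult the accompanying README of `data.zip` for the article schema.

```python
import json

# Minimal sketch: load the sample corpus split described under script_splits/ above.
with open("data_dir/script_splits/script_en.json", encoding="utf-8") as f:
    corpus = json.load(f)

# The file has two keys, "train" and "test"; each split is a list of articles.
print(len(corpus["train"]), "training articles")
print(len(corpus["test"]), "test articles")

# Inspect one article; see the accompanying README for the exact article fields.
print(corpus["train"][0])
```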
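Similarly, for the `subtasks/` formats above, here is a sketch of how custom evaluation files could be written. The goal "Dress Effectively" is the sample target script, while the step texts are hypothetical; note that these paths would overwrite the provided samples, and the sample `dev.tsv` files remain the authority on details such as headers.

```python
import os

# Sketch: write hypothetical dev examples in the two subtask formats described above.
# The step texts are made up for illustration only.
os.makedirs("data_dir/step_en", exist_ok=True)
os.makedirs("data_dir/order_en", exist_ok=True)

# Step Inference format: [Index]\t[Goal]\t[Candidate step]\t[Label]
step_rows = [
    (0, "Dress Effectively", "Pick clothes that fit the occasion.", 1),
    (1, "Dress Effectively", "Add two cups of flour to the bowl.", 0),
]
with open("data_dir/step_en/dev.tsv", "w", encoding="utf-8") as f:
    for idx, goal, step, label in step_rows:
        f.write(f"{idx}\t{goal}\t{step}\t{label}\n")

# Step Ordering format: [Index]\t[Goal]? [Step A]\t[Goal]? [Step B]\t[Label]
# Label=0 means Step A precedes Step B.
order_rows = [
    (0, "Dress Effectively", "Pick clothes that fit the occasion.", "Check yourself in the mirror.", 0),
]
with open("data_dir/order_en/dev.tsv", "w", encoding="utf-8") as f:
    for idx, goal, step_a, step_b, label in order_rows:
        f.write(f"{idx}\t{goal}? {step_a}\t{goal}? {step_b}\t{label}\n")
```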
Please follow the instructions in this Colab notebook.
If you want to finetune the pipeline yourself, please start from step A. If you want to directly do inference with our pretrained pipeline, please start from step B.
- Prepare your data according to the sample format (see Repo Structure and File Format -> `subtasks/`). Put the `train.tsv` and `dev.tsv` files under `data_dir/{subtask_name}/`.
- Specify your own paths at the beginning of `finetune.py`.
- Go to `source/`, and run
python finetune.py --mode train_eval --model [model_name] --max_seq_length [max_seq_length] --target [subtask_name] --t_bsize [t_bsize] --e_bsize [e_bsize] --lr [lr] --epochs [epochs] --logstep [logstep] --save_steps [savestep] --cuda [cuda]
Example:
python finetune.py --mode train_eval --model mbert --max_seq_length 64 --target step_en --t_bsize 32 --e_bsize 128 --lr 1e-5 --epochs 10 --logstep 40000 --save_steps 40000 --cuda 6
Details on the arguments are in `finetune.py`.
If you'd like to finetune a model from scratch, set the `--model` argument as `mbert`, `xlm-roberta`, etc.
If you'd like to finetune pretrained models, set it as the name of the model directory under `output_dir`, e.g. `step_en_mbert`. Note that you shouldn't include `output_dir/` in the argument.
- The model output will be in `output_dir/{subtask_name}_{model_name}`, e.g. `output_dir/step_en_mbert`. It will contain the trained model (`pytorch_model.bin`) and its predictions on the dev set (`model_pred.csv`).
If you only want to evaluate models on the two subtasks (Step Inference, Step Ordering) independently, then you can do the following steps for both in parallel. If you want to use the entire retrieval-based pipeline to construct scripts, then you should do the following steps for Step Inference first, and then use its output as the input to the Step Ordering subtask.
- If you haven't done A, prepare your evaluation data according to the sample format (see Repo Structure and File Format -> `subtasks/`). Put the `dev.tsv` file under `data_dir/{subtask_name}/`.
- Put the model you want to evaluate under `output_dir/{model_name}`. If you started from A, the models should already be there. Otherwise, you should download our pretrained models under Get started, and put them under `output_dir`.
- Specify your own paths at the beginning of `finetune.py`.
- Go to `source/`, and run
python finetune.py --mode eval --model [model_name] --max_seq_length [max_seq_length] --target [subtask_name] --e_bsize [e_bsize] --cuda [cuda]
Example:
python finetune.py --mode eval --model step_en_mbert --max_seq_length 64 --target step_en --e_bsize 128 --cuda 6
- The model output will be in `output_dir/{model_name}`, i.e. the model directory you had initially, e.g. `output_dir/step_en_mbert`. It will contain the model's predictions on the dev set (`model_pred.csv`) and the evaluation results (`eval_results.txt`; note that these aren't the final evaluation results on the GOSC task).
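If you are running the entire retrieval-based pipeline rather than evaluating the subtasks independently, the Step Inference output has to be reformatted as Step Ordering input before moving on, as described under `order_en/` above. Below is a rough sketch of that hand-off. The exact layout of `model_pred.csv` is not documented here, so the score parsing, the alignment with the `dev.tsv` rows, and the pairing scheme are all assumptions; adapt them to the actual files.

```python
import csv
from itertools import combinations

# Rough sketch of turning Step Inference predictions into Step Ordering input.
# ASSUMPTIONS: each line of model_pred.csv aligns with a line of the step_en dev.tsv
# and ends with a probability score for label 1 (run_glue.py was modified to output
# prediction probability scores). Adjust the parsing to the real file format.
GOAL = "Dress Effectively"   # example target script
L = 5                        # desired script length

# Read the step_en candidates: [Index]\t[Goal]\t[Candidate step]\t[Label]
with open("data_dir/step_en/dev.tsv", encoding="utf-8") as f:
    candidates = [row[2] for row in csv.reader(f, delimiter="\t")]

# Read one assumed probability score per candidate.
with open("output_dir/step_en_mbert/model_pred.csv", encoding="utf-8") as f:
    scores = [float(line.strip().split(",")[-1]) for line in f]

# Keep the top L retrieved steps.
top_steps = [s for s, _ in sorted(zip(candidates, scores), key=lambda x: -x[1])[:L]]

# Write candidate pairs in the order_en format:
# [Index]\t[Goal]? [Step A]\t[Goal]? [Step B]\t[Label]
# Here every pair of retrieved steps is written with a placeholder label, since the
# true order is unknown at inference time; the pipeline's exact pairing may differ.
with open("data_dir/order_en/dev.tsv", "w", encoding="utf-8") as f:
    for i, (a, b) in enumerate(combinations(top_steps, 2)):
        f.write(f"{i}\t{GOAL}? {a}\t{GOAL}? {b}\t0\n")
```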
Using the model output from B, evaluation and generation can be done with
python eval_construction.py --lang [language] --model [model_name] --task [step|order|combined|everything] [--print]
If `--task` is set to `step`, recall and gain are measured; if `order`, Kendall's Tau of ordering the gold script is measured; if `combined`, recall, gain and Tau of the generated script are measured; if `everything`, all of the above are measured. If the optional `--print` flag is specified, the constructed script will be directed to standard output.
Example:
python eval_construction.py --lang en --model mbert --task combined --print
Distributed under the MIT License.