
Code and datasets for our EMNLP 2021 (main conference) long paper titled "PASTE: A Tagging-Free Decoding Framework Using Pointer Networks for Aspect Sentiment Triplet Extraction"


PASTE - Pointer Network-based decoding framework for Aspect Sentiment Triplet Extraction

  • To report the results of our model variants in the paper, we select the models with the best F1 score on the development data and evaluate them on the test data.
  • We run each model five times and report the median scores.
  • All our experiments are run on a Tesla P100-PCIE (16 GB) GPU.

Code to convert the ASTE-Data-V2 data from the format defined by the authors of JET into our format:

python create_data.py
  • Pre-processed data in our format are already provided under the respective dataset folders.
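For readers unfamiliar with the source format, the sketch below parses one ASTE-Data-V2 line (`sentence####[([aspect word indices], [opinion word indices], 'POS'), ...]`) and keeps only the span boundaries, giving a position-based 5-tuple. It is an illustration only: the exact files written by create_data.py are not described here and may differ, and `parse_jet_line` is a hypothetical name.

```python
import ast
from typing import List, Tuple

# (aspect_start, aspect_end, opinion_start, opinion_end, sentiment); word-level, inclusive.
Triplet = Tuple[int, int, int, int, str]


def parse_jet_line(line: str) -> Tuple[List[str], List[Triplet]]:
    """Parse one ASTE-Data-V2 line into tokens and boundary-based triplets."""
    sentence, raw_triplets = line.strip().split("####")
    tokens = sentence.split()
    triplets = []
    # The triplet list is a Python literal, so literal_eval parses it safely.
    for aspect_idx, opinion_idx, sentiment in ast.literal_eval(raw_triplets):
        # Each span is stored as the list of all its word indices; keep only the boundaries.
        triplets.append((min(aspect_idx), max(aspect_idx),
                         min(opinion_idx), max(opinion_idx), sentiment))
    return tokens, triplets
```

Usage: `parse_jet_line("The battery life is great .####[([1, 2], [4], 'POS')]")` yields the six tokens and the single triplet `(1, 2, 4, 4, 'POS')`.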

Code to prepare BERT-specific data in accordance with our position-based triplet representation scheme:

python prep_BERTData.py
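Because BERT splits words into wordpieces, word-level positions must be shifted to wordpiece positions. The sketch below shows one way to remap a position-based triplet, assuming the wordpieces of each word are already known and that a single token such as [CLS] precedes the sentence (`offset=1`). Both assumptions and the function names are hypothetical; prep_BERTData.py may, for example, point only to first subwords.

```python
from typing import List, Tuple

Triplet = Tuple[int, int, int, int, str]  # (aspect_start, aspect_end, opinion_start, opinion_end, sentiment)


def word_to_piece_spans(pieces_per_word: List[List[str]], offset: int = 1) -> List[Tuple[int, int]]:
    """Map each word to the (first, last) index of its wordpieces.

    Assumes every word yields at least one wordpiece; `offset` counts tokens
    placed before the sentence (e.g. [CLS]).
    """
    spans = []
    cursor = offset
    for pieces in pieces_per_word:
        spans.append((cursor, cursor + len(pieces) - 1))
        cursor += len(pieces)
    return spans


def remap_triplet(triplet: Triplet, spans: List[Tuple[int, int]]) -> Triplet:
    """Start positions move to a word's first piece, end positions to its last piece."""
    a_s, a_e, o_s, o_e, sentiment = triplet
    return (spans[a_s][0], spans[a_e][1], spans[o_s][0], spans[o_e][1], sentiment)
```

For the words `the / batt ##ery / life / is / great`, the aspect "battery life" at word positions 1..2 becomes wordpiece positions 2..4.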

Code to align POS and DEP tags with the wordpiece tokens generated by the BERT tokenizer:

python prep_POS_DEP_forBERT.py
  • Pre-processed BERT-specific data files are already provided under the respective dataset folders.
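Tag alignment turns one POS or DEP tag per word into one tag per wordpiece. The sketch shows two common strategies: repeating the word's tag on every piece, or tagging only the first piece and padding the rest with a placeholder. Which strategy prep_POS_DEP_forBERT.py uses is not stated here; the `"X"` placeholder and the function name are assumptions.

```python
from typing import List


def align_tags(word_tags: List[str], pieces_per_word: List[List[str]],
               strategy: str = "repeat", pad_tag: str = "X") -> List[str]:
    """Expand word-level tags (POS or DEP) to one tag per wordpiece."""
    if len(word_tags) != len(pieces_per_word):
        raise ValueError("need exactly one tag per word")
    aligned: List[str] = []
    for tag, pieces in zip(word_tags, pieces_per_word):
        if not pieces:  # a word that produced no wordpieces gets no tags
            continue
        if strategy == "repeat":
            aligned.extend([tag] * len(pieces))
        elif strategy == "first":
            aligned.extend([tag] + [pad_tag] * (len(pieces) - 1))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return aligned
```

Either way the output has exactly one tag per wordpiece, so it can be embedded and concatenated with the BERT token representations position by position.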

PASTE - Without BERT

Code: PASTE.py

Sample command to run PASTE-AF

python3 PASTE.py 0 1023 lap14/ lap14/PASTE train 10 100 100 AF AP

Sample command to run PASTE-OF

python3 PASTE.py 0 1023 lap14/ lap14/PASTE train 10 100 100 OF OP
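The AF and OF variants differ in whether the decoder points to the aspect span or the opinion span first. The sketch below illustrates only that ordering for one position-based triplet. It assumes the sentiment is predicted after both spans, and it does not model the final `AP`/`OP` argument, whose meaning is not documented in this README; `decoding_sequence` is a hypothetical name.

```python
from typing import List, Tuple, Union

Triplet = Tuple[int, int, int, int, str]  # (aspect_start, aspect_end, opinion_start, opinion_end, sentiment)


def decoding_sequence(triplet: Triplet, direction: str) -> List[Union[int, str]]:
    """Order in which one triplet's elements would be emitted.

    'AF' (aspect first): aspect span, then opinion span, then sentiment.
    'OF' (opinion first): opinion span, then aspect span, then sentiment.
    """
    a_s, a_e, o_s, o_e, sentiment = triplet
    if direction == "AF":
        return [a_s, a_e, o_s, o_e, sentiment]
    if direction == "OF":
        return [o_s, o_e, a_s, a_e, sentiment]
    raise ValueError("direction must be 'AF' or 'OF'")
```

The ordering matters because the span decoded second can condition on the span decoded first.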

Script used to run PASTE-AF on all datasets with 5 different seed values (a similar script was used for PASTE-OF):

sh PASTE_script.sh
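If you prefer to generate the per-seed runs programmatically, the sketch below builds one PASTE.py command per (dataset, seed) pair. It copies the argument layout of the sample commands, but treating the second positional argument (1023 in the samples) as the random seed is an ASSUMPTION, as are the seed list and the per-seed target-folder naming (used so runs do not overwrite each other). It prints nothing and runs nothing by itself.

```python
import shlex
from typing import List


def build_commands(datasets: List[str], seeds: List[int], direction: str = "AF") -> List[str]:
    """Build one PASTE.py command line per (dataset, seed) pair (hypothetical helper)."""
    # Final argument paired with the direction exactly as in the sample commands.
    last_arg = {"AF": "AP", "OF": "OP"}[direction]
    commands = []
    for dataset in datasets:
        for seed in seeds:
            args = ["python3", "PASTE.py", "0", str(seed), f"{dataset}/",
                    f"{dataset}/PASTE_seed{seed}",  # hypothetical per-seed output folder
                    "train", "10", "100", "100", direction, last_arg]
            commands.append(shlex.join(args))
    return commands
```

Usage: `for cmd in build_commands(["lap14"], [1023, 1024]): print(cmd)` prints two command lines that can be pasted into a shell script.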

Hyper-parameter Configurations:

Optimizer: Adam
Learning rate: 0.001
Coefficient of L2 Regularization: 0.00001
Dropout: 0.5
No. of epochs: 100
Batch Size: 10
No. of trainable parameters: ~6 million
Training time per epoch: 12 seconds
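The L2 coefficient above is combined with Adam. The sketch below writes out one Adam step with coupled L2 regularization, i.e. `l2 * w` is added to the gradient, which is the gradient of `(l2 / 2) * w**2`. This matches how `torch.optim.Adam(..., weight_decay=...)` behaves in PyTorch; whether PASTE applies L2 this way or as an explicit loss term is an assumption, as is the plain-list representation of the parameters.

```python
import math
from typing import List, Tuple


def adam_step_with_l2(params: List[float], grads: List[float], m: List[float], v: List[float],
                      t: int, lr: float = 0.001, l2: float = 1e-5,
                      beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8
                      ) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update (step t >= 1) with the L2 penalty folded into the gradient."""
    new_params, new_m, new_v = [], [], []
    for w, g, m_i, v_i in zip(params, grads, m, v):
        g = g + l2 * w                              # coupled L2 term
        m_i = beta1 * m_i + (1 - beta1) * g         # first-moment (mean) estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g     # second-moment estimate
        m_hat = m_i / (1 - beta1 ** t)              # bias corrections
        v_hat = v_i / (1 - beta2 ** t)
        new_params.append(w - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

Note that even with a zero loss gradient the L2 term still shrinks a non-zero weight, and on the first step the update magnitude is close to the learning rate (0.001) regardless of the gradient's scale.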

PASTE - With BERT

Code: PASTE_BERT.py

Sample command to run PASTE-AF (BERT)

python3 PASTE_BERT.py --gpu_id 0 --src_folder lap14/ --trg_folder lap14/PASTE_BERT --bert_mode gen --gen_direct af --l2 y

Sample command to run PASTE-OF (BERT)

python3 PASTE_BERT.py --gpu_id 0 --src_folder lap14/ --trg_folder lap14/PASTE_BERT --bert_mode gen --gen_direct of --l2 y
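The sketch below is a hypothetical `argparse` parser that accepts the flags used in the two sample commands. Flag names are copied from the commands; the types, defaults, and accepted values (for example, `--l2` as a y/n switch and `--bert_mode` accepting only what is shown) are assumptions, so consult PASTE_BERT.py for the real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags in the sample PASTE_BERT.py commands."""
    parser = argparse.ArgumentParser(description="PASTE-BERT (sketch)")
    parser.add_argument("--gpu_id", type=int, default=0)
    parser.add_argument("--src_folder", required=True, help="dataset folder, e.g. lap14/")
    parser.add_argument("--trg_folder", required=True, help="output folder, e.g. lap14/PASTE_BERT")
    parser.add_argument("--bert_mode", default="gen")  # other accepted values are unknown
    parser.add_argument("--gen_direct", choices=["af", "of"], default="af",
                        help="af: aspect first, of: opinion first")
    parser.add_argument("--l2", choices=["y", "n"], default="y")  # assumed on/off switch
    return parser
```

Usage: `build_parser().parse_args("--src_folder lap14/ --trg_folder lap14/PASTE_BERT --gen_direct of".split())` returns a namespace with `gen_direct == "of"` and the remaining flags at their defaults.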

Script used to run PASTE-AF (BERT) on 14Lap (similar scripts are provided for each dataset):

sh run_script_lap.sh

Hyper-parameter Configurations:

Optimizer: Adam
Learning rate: 0.001
Coefficient of L2 Regularization: 0.0001 and 0.00001 (we tried both values)
Dropout: 0.5
No. of epochs: 30 for individual datasets, 50 for resall.
Batch Size: 16
No. of trainable parameters: ~128 million
Training time per epoch: 40 seconds
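To keep the two hyper-parameter lists side by side, the sketch below records them in a small dataclass and derives a rough wall-clock estimate (epochs x seconds per epoch x runs). The field and function names are hypothetical. The per-epoch times are the approximate figures given above, so the estimate is only indicative; in particular, epochs on resall are likely longer than the general 40-second figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyper-parameters listed in this README (field names are hypothetical)."""
    lr: float
    l2: float
    dropout: float
    epochs: int
    batch_size: int
    seconds_per_epoch: float  # approximate, as reported above

    def approx_hours(self, runs: int = 1) -> float:
        """Rough wall-clock estimate in hours for `runs` independent runs."""
        return self.epochs * self.seconds_per_epoch * runs / 3600


PASTE_CONFIG = TrainConfig(lr=0.001, l2=1e-5, dropout=0.5, epochs=100,
                           batch_size=10, seconds_per_epoch=12)


def paste_bert_config(dataset: str, l2: float = 1e-5) -> TrainConfig:
    """PASTE-BERT: 30 epochs per individual dataset, 50 for resall; l2 is 1e-4 or 1e-5."""
    if l2 not in (1e-4, 1e-5):
        raise ValueError("only 1e-4 and 1e-5 were tried")
    epochs = 50 if dataset == "resall" else 30
    return TrainConfig(lr=0.001, l2=l2, dropout=0.5, epochs=epochs,
                       batch_size=16, seconds_per_epoch=40)
```

For example, five PASTE runs take about 100 epochs x 12 s x 5 = 6000 s (roughly 1.7 hours), the same budget as five PASTE-BERT runs of 30 epochs x 40 s on an individual dataset.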

Code to obtain the prediction performance for each element of an opinion triplet:

python error_analysis.py
  • This script requires the file 'test.out', which is generated in the target folder when PASTE.py or PASTE_BERT.py is run.
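Element-level scores compare only part of each triplet, for example just the aspect span or the aspect span with its sentiment. The sketch below computes micro-averaged precision, recall, and F1 per element on in-memory lists of position-based triplets. The layout of 'test.out' is not described here, so reading it is left out, and the element grouping is an assumption that may not match exactly what error_analysis.py reports.

```python
from typing import Callable, Dict, List, Set, Tuple

Triplet = Tuple[int, int, int, int, str]  # (aspect_start, aspect_end, opinion_start, opinion_end, sentiment)

# Which part of a triplet each element-level score compares (hypothetical grouping).
ELEMENTS: Dict[str, Callable[[Triplet], tuple]] = {
    "aspect": lambda t: (t[0], t[1]),
    "opinion": lambda t: (t[2], t[3]),
    "aspect+sentiment": lambda t: (t[0], t[1], t[4]),
    "aspect+opinion": lambda t: t[:4],
    "triplet": lambda t: tuple(t),
}


def prf(gold: Set, pred: Set) -> Tuple[float, float, float]:
    """Precision, recall, and F1 from exact-match set overlap."""
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def element_scores(gold: List[List[Triplet]], pred: List[List[Triplet]]
                   ) -> Dict[str, Tuple[float, float, float]]:
    """Micro-averaged P/R/F1 per element; items are unique per sentence index."""
    scores = {}
    for name, project in ELEMENTS.items():
        gold_items = {(i, project(t)) for i, ts in enumerate(gold) for t in ts}
        pred_items = {(i, project(t)) for i, ts in enumerate(pred) for t in ts}
        scores[name] = prf(gold_items, pred_items)
    return scores
```

A prediction with the right spans but the wrong sentiment still counts for "aspect", "opinion", and "aspect+opinion", which is what makes this breakdown useful for locating errors.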