E2E_ABSA

BERT, ELMo, and GloVe for E2E ABSA (aspect term extraction + aspect term polarity classification), implemented in PyTorch.


E2E ABSA

End-to-end ABSA on SemEval 2014 Task 4 and SemEval 2016 Task 5.

.
├── checkout
│   ├── data_processing_log.txt
│   ├── state_dict  //saved model
│   ├── test_log.txt
│   └── training_log.txt
├── config
│   └── config.py
├── data
│   ├── elmo  //elmo pretrained models
│   │   ├── elmo_2x4096_512_2048cnn_2xhighway_options.json
│   │   └── elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5
│   ├── glove  //glove pretrained embeddings
│   ├── Semeval2014
│   │   ├── processed  //processed files
│   │   │   ├── Restaurants_dev_v2.csv
│   │   │   ├── Restaurants_test_v2.csv
│   │   │   └── Restaurants_Train_v2.csv
│   │   └── raw  //raw SemEval xml data files
│   │       ├── Laptops_Train.xml
│   │       ├── Laptop_Train_v2.xml
│   │       ├── Restaurants_Train_v2.xml
│   │       └── Restaurants_Train.xml
│   ├── Semeval2016
│   │   ├── processed
│   │   └── raw
│   └── stopwords.txt
├── models
│   ├── downstream.py  //Linear, LSTM, Self-Attention, CRF
│   └── pretrain_model.py  
├── README.md
├── requirements.txt
├── results    
│   ├── total_train_log.csv  // 194 training records
│   └── readme.md  // more results
├── test.py
├── train.py
├── train.sh
└── utils
    ├── data_utils.py
    ├── metrics.py
    ├── processer.py
    └── result_helper.py

Experiment

Main Results:

CE (Co-Extraction) F1: macro F1 over 4 classes at test time (not-aspect, aspect-pos, aspect-neg, aspect-neu).

AE (Aspect Extraction) F1: macro F1 over 2 classes at test time (not aspect term, aspect term).

PC (Polarity Classification) F1: macro F1 over 3 classes at test time (aspect-pos, aspect-neg, aspect-neu).

BP (Broken Prediction): number of predicted aspect terms whose polarity tags are inconsistent within the term, e.g. B-neg, I-pos, E-pos.
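To make the BP metric concrete, below is a minimal sketch that counts broken predictions over a BIOES-style tag sequence with polarity suffixes, as in the example above. The function name and the simplified span handling are illustrative assumptions, not the code in utils/metrics.py.

```python
def count_broken_predictions(tags):
    """Count aspect spans whose polarity suffixes disagree.

    `tags` is a flat list such as ["O", "B-neg", "I-pos", "E-pos", "O"];
    here the span B-neg / I-pos / E-pos mixes polarities, so it is broken.
    Simplified: a span is closed by an "O" tag or by the next "B-" tag.
    """
    broken, polarities = 0, []
    for tag in tags + ["O"]:                       # trailing sentinel flushes the last span
        if tag == "O" or tag.startswith("B-"):
            if len(set(polarities)) > 1:           # closed span with mixed polarity
                broken += 1
            polarities = []
        if tag != "O":
            polarities.append(tag.split("-")[-1])  # keep only the polarity suffix
    return broken


print(count_broken_predictions(["O", "B-neg", "I-pos", "E-pos", "O"]))  # -> 1
```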

Results for BERT

Each model below was trained and tested on each dataset with five split seeds (6, 7, 8, 66, 77), one run per seed. The reported score is the mean macro F1 over the 5 runs.

| model | lap14 AE | lap14 PC | lap14 CE | res16 AE | res16 PC | res16 CE | res14 AE | res14 PC | res14 CE |
|-------------|-------|-------|-------|-------|-------|-------|-------|-------|-------|
| bert-linear | 87.60 | 70.14 | 64.80 | 85.30 | 67.11 | 62.34 | 89.49 | 72.04 | 68.13 |
| bert-lstm   | 87.07 | 71.31 | 65.01 | 85.57 | 70.83 | 64.93 | 90.23 | 72.20 | 68.87 |
| bert-san    | 87.08 | 69.57 | 63.94 | 85.09 | 67.15 | 61.88 | 90.01 | 74.46 | 70.12 |
| bert-crf    | 87.80 | 69.97 | 65.07 | 85.73 | 69.12 | 64.20 | 89.97 | 72.82 | 68.72 |

For more results and details, please refer to the results folder.

To Run

Step 1: Process raw data

cd utils
python processer.py --model_name "bert" --seed 6 --max_seq_len 128

--model_name : "bert", "elmo" or "glove"

--seed : random seed for the train/dev/test split.

--split_ratio 0.8 0.1 0.1 : split ratio for the train, dev, and test sets.
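For a sense of what this step does, here is a minimal sketch that flattens a SemEval-2014 XML file into (sentence, term, polarity) rows, assuming it is run from the utils folder as above. The output columns are illustrative only and need not match the CSVs that processer.py actually writes.

```python
# Minimal sketch: flatten SemEval-2014 XML into simple rows.
# The column layout is illustrative; processer.py's real output may differ.
import csv
import xml.etree.ElementTree as ET


def xml_to_rows(xml_path):
    rows = []
    for sentence in ET.parse(xml_path).getroot().iter("sentence"):
        text = sentence.findtext("text")
        for term in sentence.iter("aspectTerm"):
            rows.append((text, term.get("term"), term.get("polarity"),
                         term.get("from"), term.get("to")))
    return rows


rows = xml_to_rows("../data/Semeval2014/raw/Restaurants_Train_v2.xml")
with open("restaurants_rows.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)
```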

Step 2: Train the model (from the E2E_ABSA root folder)

python train.py --mode "res14" --downstream "san" --model_name "bert" --seed 6

--mode : res14, res16 or lap14. The SemEval dataset to train on.

--downstream : linear, lstm, crf, lstm-crf or san. The downstream model (see the sketch after this list).

--model_name : "bert", "elmo" or "glove"; same as Step 1.

--seed : seed recorded with the training log; same as Step 1.
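As referenced above, the sketch below shows roughly what the "bert" encoder plus "linear" downstream head looks like as a token-level tagger, built on the Hugging Face transformers API. It is an independent illustration under assumed defaults (bert-base-uncased, a tag set of size 7), not the code in models/pretrain_model.py or models/downstream.py.

```python
# Minimal sketch of a BERT encoder with a linear downstream tagging head
# (the "bert" + "linear" combination). Built on Hugging Face transformers;
# not the repo's own model code, and num_tags=7 is an assumption.
from torch import nn
from transformers import BertModel, BertTokenizerFast


class BertLinearTagger(nn.Module):
    def __init__(self, num_tags, model_name="bert-base-uncased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden)          # (batch, seq_len, num_tags) tag logits


tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
batch = tokenizer(["The fish was great but the service was slow."],
                  return_tensors="pt", padding=True)
logits = BertLinearTagger(num_tags=7)(batch["input_ids"], batch["attention_mask"])
print(logits.shape)                       # torch.Size([1, seq_len, 7])
```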

Some other default settings (the focal loss is sketched below):

--lr 5e-5 --batch_size 32 --loss "focal" --gamma 2 --alpha 0.75 --max_seq_len 128 --optimizer "adamw" --warmup_steps 300 --max_steps 3000

training log path: ./checkout/training_log.txt
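Since the default loss is focal with gamma 2 and alpha 0.75, here is a minimal sketch of a standard multi-class focal loss in PyTorch; the repo's own implementation may differ in detail.

```python
# Minimal sketch of a multi-class focal loss matching the defaults
# --loss "focal" --gamma 2 --alpha 0.75; the repo's implementation may differ.
import torch
import torch.nn.functional as F


def focal_loss(logits, targets, gamma=2.0, alpha=0.75):
    """logits: (N, num_classes) scores; targets: (N,) class indices."""
    ce = F.cross_entropy(logits, targets, reduction="none")  # per-example cross entropy
    pt = torch.exp(-ce)                                       # probability of the true class
    return (alpha * (1.0 - pt) ** gamma * ce).mean()          # down-weight easy examples


loss = focal_loss(torch.randn(8, 7), torch.randint(0, 7, (8,)))
print(loss.item())
```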

Step 3: Generate test results

python test.py --mode "res14" --downstream "san" --model_name "bert" --seed 6

testing log path: ./checkout/test_log.txt

About the Colab demo for this repository (in Chinese)

Colab link

Reference

SemEval official:

SemEval-2016 Task 5

SemEval-2014 Task 4

Pretrained ELMo files:

weights

options