Sentiment Analysis Toolkit that uses BERT and has a few configurable options. Parts of the code are borrowed from Gaurish. Using the Latvian BERT model trained by Rinalds, fine-tuned on the Latvian Twitter Eater Corpus (LTEC), we were able to train a model with 74.33% precision on the LTEC evaluation set and 77.60% on the evaluation set from the Latvian Tweet Corpus.
- CSV format with a label that you want to predict and the text.
- For example, to classify 3 sentiment classes you could use 0 - neutral, 1 - positive, 2 - negative, as shown in this example:
```
label,text
1,"@maljorka Hehe, man tad labāk garšo bez nekā, nevis šādi. :D Ai, gaumes ir tik atšķirīgas."
0,@IngaStirna Ābolu šarlote.
2,Mēs ar viņu varējām sarunāties tikai caur logu un pusdienu vietā apēdām bulciņas mašīnā. Nav ok.
```
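If you prepare your own data, a quick way to check that it matches this format is to load it with pandas. This is a minimal sketch, assuming pandas is available and the file is named `train.csv`; neither is part of the toolkit itself:

```python
# Sanity-check a training CSV against the two-column layout shown above:
# a numeric "label" column and a "text" column.
import pandas as pd

df = pd.read_csv("train.csv")  # hypothetical file name

assert list(df.columns) == ["label", "text"], "expected header: label,text"
assert df["label"].isin([0, 1, 2]).all(), "labels should be 0 (neutral), 1 (positive) or 2 (negative)"
print(df["label"].value_counts())
```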
To train and use a sentiment classification model:

1. Fill in configuration details like training/development/evaluation files, paths to the BERT model, and where you want to save sentiment classification models in `config.py` (a sketch of a possible configuration follows this list).
2. Run `train.py`.
3. Run `predict.py --input input-file.csv --output output-file.csv`.
   - The input file should contain only texts - one per row. The output will be label, text.
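A minimal sketch of what the entries in `config.py` might look like. Only EVAL_PROC and MODEL_PATH appear elsewhere in this README (as defaults in the parameter tables below); the other variable names and all of the values are assumptions for illustration, so check the actual file for the exact names:

```python
# config.py - illustrative values only; variable names other than
# EVAL_PROC and MODEL_PATH are assumptions made for this sketch.
TRAIN_PROC = "data/train.csv"            # training file (label,text)
DEV_PROC = "data/dev.csv"                # development file
EVAL_PROC = "data/eval.csv"              # evaluation file (default --input for predict.py)
BERT_MODEL_PATH = "models/latvian-bert"  # path to the pre-trained BERT model
MODEL_PATH = "models/best_model.bin"     # where the sentiment model is saved (default --model_path)
```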
To adapt the toolkit to a different domain or language:

1. Run `run_mlm.py` from this repo on your own data. It may also be useful to find a pre-trained BERT model in your language of choice and fine-tune that.
2. Change the training data set in `config.py` to either only your in-domain data or perhaps a 1:1 mix of in-domain and out-of-domain data, then run `train.py --tune`.
3. You may want to lower the learning rate, change dropout or play with other parameters - use `grid_search.sh` to go over the combinations (a sketch of such a search follows this list).
4. Iterate over several combinations of hyperparameters.
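For reference, the kind of search that `grid_search.sh` runs can be sketched in Python as follows. This is an illustration rather than the actual script; it only relies on the `--tune`, `--lr` and `--drop` flags documented below, and the value grids are assumptions:

```python
# Illustrative grid search over learning rate and dropout,
# calling train.py with the flags documented in the table below.
import itertools
import subprocess

learning_rates = [0.00001, 0.00005, 0.0001]  # example grid
dropouts = [0.1, 0.3, 0.5]                   # example grid

for lr, drop in itertools.product(learning_rates, dropouts):
    print(f"Training with lr={lr}, drop={drop}")
    subprocess.run(
        ["python", "train.py", "--tune", "--lr", str(lr), "--drop", str(drop)],
        check=True,
    )
```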
The following are parameters for `train.py`:
Parameter | Description | Example Value | Default Value |
---|---|---|---|
--tune | Loads the model in MODEL_PATH for fine-tuning. | | |
--lr | Learning rate. | 0.00005 | 0.00001 |
--drop | Dropout. | 0.1 | 0.3 |
--save | Save and evaluate after X examples. | 3000 | 15000 |
--estop | Stop training after model has not improved X times. | 10 | 5 |
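For example, to fine-tune the model in MODEL_PATH with a lower learning rate and more frequent evaluation, you could run `python train.py --tune --lr 0.00005 --save 3000`.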
The following are parameters for `predict.py`:
Parameter | Description | Example Value | Default Value |
---|---|---|---|
--input | Input file for prediction - one text per line. | 'in.csv' | EVAL_PROC |
--output | Output file with predicted label and text. | 'out.csv' | 'predictions.csv' |
--model_path | Model to use for prediction. | 'best_model.bin' | MODEL_PATH |
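For example: `python predict.py --input in.csv --output out.csv --model_path best_model.bin`.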
If you use this tool, please cite the following paper:
Maija Kāle and Matīss Rikters (2021). "Fragmented and Valuable: Following Sentiment Changes in Food Tweets." In Proceedings of the Smell, Taste, and Temperature Interfaces Workshop, Yokohama, Japan.
```bibtex
@inproceedings{Kale-Rikters-2021STT,
    author = {Kāle, Maija and Rikters, Matīss},
    booktitle = {Proceedings of Smell, Taste, and Temperature Interfaces Workshop},
    title = {{Fragmented and Valuable: Following Sentiment Changes in Food Tweets}},
    address = {Yokohama, Japan},
    year = {2021}
}
```