SentimentMT

Summary

This is the repo associated with the paper Sentiment-based Candidate Selection for NMT, co-written by me (Alex Jones) and my supervisor Derry Wijaya. The paper describes a decoder-side approach for selecting the translation candidate that best preserves the automatically scored sentiment of the source text. To this end, we train three distinct sentiment classifiers: an English BERT model, a Spanish XLM-RoBERTa model, and an XLM-RoBERTa model fine-tuned on English but used for sentiment classification in other languages, such as French, Finnish, and Indonesian. We compute a softmax over the logits returned by these classifiers to obtain the probability of a text x being in the positive class, and call this number the "sentiment score" SS(x):

SS(x) = exp(z_pos) / (exp(z_pos) + exp(z_neg))

where z_pos and z_neg are the classifier's logits for the positive and negative classes.
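
As a rough illustration (not the exact code from our notebooks), the sentiment score can be computed with a fine-tuned Hugging Face classifier as follows; the model path and the index of the positive label are placeholders:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CLASSIFIER_PATH = "path/to/fine-tuned-sentiment-classifier"  # placeholder for one of the models below
POSITIVE_INDEX = 1                                           # assumes label index 1 is the positive class

tokenizer = AutoTokenizer.from_pretrained(CLASSIFIER_PATH)
model = AutoModelForSequenceClassification.from_pretrained(CLASSIFIER_PATH)
model.eval()

def sentiment_score(text: str) -> float:
    """Return SS(text): the softmax probability of the positive class."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, num_labels)
    return torch.softmax(logits, dim=-1)[0, POSITIVE_INDEX].item()
```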

We then generate translation candidates t_1, ..., t_n using beam search and select the candidate t* whose sentiment score differs least from that of the source text s:

t* = argmin_t |SS(s) - SS(t)|, with t ranging over the beam-search candidates.
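
A minimal sketch of the selection step, reusing the hypothetical sentiment_score function above and assuming the candidate list has already been generated:

```python
def select_candidate(source: str, candidates: list[str]) -> str:
    """Pick the candidate whose sentiment score is closest to the source's."""
    source_score = sentiment_score(source)
    return min(candidates, key=lambda c: abs(sentiment_score(c) - source_score))
```

In the cross-lingual setting, the source and the candidates are scored by different classifiers (e.g., the English BERT model for the source and the Spanish XLM-RoBERTa model for the candidates); the MT section below shows how this fits into decoding.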

We conduct human evaluations of English-Spanish and English-Indonesian translations with proficient bilingual speakers and report the results in our paper. We also provide examples of tweets translated with this method in the Discussion section and the Appendix of the paper.

Dependencies

PyTorch
Transformers
scikit-learn
SciPy
BeautifulSoup (for text preprocessing)
NumPy
pandas

Sentiment Classification

We construct sentiment classifiers by fine-tuning pretrained models on labeled sentiment data in English and Spanish separately. The English-only sentiment classifier is built on BERT; the training notebook is available here and is based on the BERT fine-tuning tutorial by Chris McCormick and Nick Ryan (as are all the notebooks we used to train our sentiment classifiers; citations are provided in the notebooks). We also fine-tune XLM-RoBERTa on the annotated Spanish data and, separately, on the English sentiment data (the latter model is used for sentiment classification in other languages). The sentiment models themselves (the PyTorch files containing the parameters) are available here, and the annotated sentiment data is available at the following links:
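
For orientation, here is a condensed sketch of the kind of fine-tuning the notebooks perform; the model name, toy data, and hyperparameters are placeholders, and the real notebooks follow the McCormick/Ryan tutorial on the full annotated datasets:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"               # or "xlm-roberta-base" for the Spanish / cross-lingual models
texts = ["I love this!", "This is terrible."]  # placeholder texts; the real data is the annotated sets above
labels = torch.tensor([1, 0])                  # 1 = positive, 0 = negative

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
loader = DataLoader(TensorDataset(enc["input_ids"], enc["attention_mask"], labels),
                    batch_size=16, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(4):  # a small number of epochs, as in typical BERT fine-tuning
    for input_ids, attention_mask, batch_labels in loader:
        optimizer.zero_grad()
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=batch_labels)
        out.loss.backward()
        optimizer.step()

model.save_pretrained("sentiment-classifier")  # writes the PyTorch parameter files
```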

MT

We perform machine translation using the open-source Helsinki-NLP/OPUS-MT models, which offer pretrained models for easy use here. We opted for this system because it made it easy to generate n-best lists and incorporate sentiment-based selection into the decoding step. Because we use pretrained models, we do not perform any MT training of our own, but these notebooks show how we integrate sentiment scoring into the translation selection process. Another advantage of the Helsinki-NLP models is the wide variety of supported languages, which let us try our approach on many different languages (see the Appendix of our paper for concrete examples).
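
The sketch below shows one way to generate an n-best list with a pretrained OPUS-MT model and rescore it with sentiment classifiers. The language pair, beam settings, and the src_score_fn/tgt_score_fn arguments are illustrative; the scorer functions stand in for the English and target-language classifiers described above:

```python
from transformers import MarianMTModel, MarianTokenizer

MT_NAME = "Helsinki-NLP/opus-mt-en-es"  # example language pair
mt_tokenizer = MarianTokenizer.from_pretrained(MT_NAME)
mt_model = MarianMTModel.from_pretrained(MT_NAME)

def translate_with_sentiment(source, src_score_fn, tgt_score_fn, n=10):
    """Generate an n-best list with beam search and return the candidate
    whose sentiment score is closest to the source's."""
    batch = mt_tokenizer([source], return_tensors="pt")
    outputs = mt_model.generate(
        **batch,
        num_beams=n,
        num_return_sequences=n,  # keep the whole n-best list
    )
    candidates = mt_tokenizer.batch_decode(outputs, skip_special_tokens=True)
    src_score = src_score_fn(source)
    return min(candidates, key=lambda c: abs(tgt_score_fn(c) - src_score))
```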

Experimental Materials

In human evaluations of the translations, we asked participants to grade translations based on both their accuracy (broadly speaking) and their level of sentiment divergence, and also asked them to provide reasons why they thought the sentiment of the source text differed from that of the translation, if applicable. We performed both an English-Spanish and English-Indonesian evaluation. See the following files for reference:

License

BSD 3-Clause.

Citation

Please cite our paper if you use any of the resources in this repo for your research:

@inproceedings{jones-wijaya-2021-sentiment,
    title = "Sentiment-based Candidate Selection for {NMT}",
    author = "Jones, Alexander G  and
      Wijaya, Derry",
    booktitle = "Proceedings of the 18th Biennial Machine Translation Summit (Volume 1: Research Track)",
    month = aug,
    year = "2021",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://aclanthology.org/2021.mtsummit-research.16",
    pages = "188--201"}