EmoMap

Code and data associated with our LREC 2018 and COLING 2018 papers on converting between emotion formats

Primary language: Python. License: MIT.

Emotion Representation Mapping

This repository comprises code, experimental results and language resources associated with our LREC 2018 and COLING 2018 papers on converting between different emotion representation formats.

Introduction

Emotion lexicons (data sets which describe the emotions associated with individual words) are an important resource in sentiment analysis. However, there are many different ways to describe affective states, for example in terms of Basic Emotions or Valence-Arousal-Dominance. Having many of these so-called emotion representation formats causes problems regarding the comparability and interoperability of different kinds of language resources (data sets as well as software tools). To address these problems, we propose a simple yet effective technique to convert between different emotion formats so that, for example, an emotion lexicon which uses Basic Emotions can be translated into a Valence-Arousal-Dominance encoding, and the other way round. We call this task emotion representation mapping. We evaluate our approach on a highly multilingual collection of data sets and find that it is about as reliable as human annotation. Based on these results, we automatically create new emotion lexicons for a wide range of languages.
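To make the task concrete, the following is a minimal sketch of emotion representation mapping framed as a regression problem: given words rated in both formats, predict the target-format ratings of a new word from its source-format ratings. It uses a plain k-nearest-neighbour average purely for illustration; this is an assumption and not necessarily the model used in the papers. The function name knn_map, the 1-to-9 Valence-Arousal-Dominance scale and all ratings in TOY_TRAIN are made up for the example and are not taken from the repository.

```python
import math

# Names of the five basic emotion categories used as the target format.
BASIC_EMOTIONS = ["joy", "anger", "sadness", "fear", "disgust"]


def knn_map(source_vec, train_pairs, k=3):
    """Map a source-format rating to the target format.

    Hypothetical helper: averages the target ratings of the k training
    words whose source ratings are closest (Euclidean distance).
    """
    nearest = sorted(
        train_pairs, key=lambda pair: math.dist(source_vec, pair[0])
    )[:k]
    n_dims = len(nearest[0][1])
    return [
        sum(target[d] for _, target in nearest) / len(nearest)
        for d in range(n_dims)
    ]


# Made-up toy data, NOT from the repository's lexicons.
# Each pair: (valence, arousal, dominance) -> (joy, anger, sadness, fear, disgust)
TOY_TRAIN = [
    ((8.0, 6.5, 7.0), (4.5, 1.0, 1.0, 1.0, 1.0)),
    ((7.5, 5.0, 6.5), (4.0, 1.0, 1.2, 1.0, 1.0)),
    ((2.0, 7.0, 4.0), (1.0, 4.0, 2.0, 3.0, 3.0)),
    ((2.5, 3.0, 3.0), (1.0, 1.5, 4.0, 2.0, 1.5)),
]

# Predict basic emotion ratings for an unseen word rated only in VAD.
predicted = knn_map((7.8, 6.0, 6.8), TOY_TRAIN, k=2)
print(dict(zip(BASIC_EMOTIONS, predicted)))
```

Mapping in the opposite direction (Basic Emotions to Valence-Arousal-Dominance) would only require swapping the two elements of each training pair.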

Emotion Lexicons

The latest version of our automatically generated emotion lexicons covers a total of 13 languages. Most of them describe the words in terms of five basic emotion categories (joy, anger, sadness, fear and disgust) on a numerical 5-point scale. They complement existing emotion lexicons which only describe the respective words according to another emotion representation format (Valence-Arousal or Valence-Arousal-Dominance). The size of the generated emotion lexicons ranges up to 13k words. Details of our acquisition methodology are given in our COLING 2018 paper. Our results show that these data sets, although automatically constructed, are virtually as reliable as manually annotated data. The individual lexicons are listed below:

Citation

If you use our emotion lexicons or our code base, please cite our papers:

  • Sven Buechel and Udo Hahn. 2018. Representation Mapping: A Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons. In LREC 2018 — Proceedings of the 11th International Conference on Language Resources and Evaluation, pages 184 – 191, Miyazaki, Japan, May 7 – 12, 2018.
@inproceedings{Buechel18lrec,
address = {Miyazaki, Japan, May 7--12, 2018},
title = {Representation Mapping: {A} Novel Approach to Generate High-Quality Multi-Lingual Emotion Lexicons},
booktitle = {{LREC} 2018 --- {Proceedings} of the 11th {International} {Conference} on {Language} {Resources} and {Evaluation}},
author = {Buechel, Sven and Hahn, Udo},
year = {2018},
pages = {184--191},
}
  • Sven Buechel and Udo Hahn. 2018. Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level. Accepted for COLING 2018.
@inproceedings{Buechel18coling,
author={Buechel, Sven and Hahn, Udo},
year={2018},
title={Emotion Representation Mapping for Automatic Lexicon Construction (Mostly) Performs on Human Level},
booktitle={COLING 2018 --- Proceedings of the 27th International Conference on Computational Linguistics},
volume={1, technical papers},
address={Santa Fe, New Mexico, USA, August 20--26, 2018}
}

Folder Structure

The two subfolders lrec18 and coling18 contain code, experimental results and our generated emotion lexicons for the respective paper. Both directories hold self-contained codebases which come with their own set-up instructions and environments.

Contact

I am happy to give additional information or get feedback about our work via email: sven.buechel@uni-jena.de