Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain

Overview

We release large-scale datasets of user comments in two languages, English and Korean, for aspect-level sentiment analysis in the automotive domain. The datasets contain more than 58,000 comment-aspect pairs, making them the largest among existing datasets for this task. In addition, this work extends aspect-level sentiment analysis to a new language, Korean, alongside English. We build the datasets from the automotive domain so that users (e.g., marketers at automotive companies) can analyze the voice of the customer on automobiles.

To support future work, we also provide baseline results obtained by evaluating recent models on the released datasets.
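Aspect-level sentiment classifiers are commonly compared by accuracy and macro-averaged F1 over the polarity labels. The following is a minimal sketch of both metrics, assuming string labels such as "positive", "negative", and "neutral". It is an illustration only, not necessarily the exact evaluation code behind our reported baselines.

```python
from collections import Counter
from typing import List


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Fraction of comment-aspect pairs whose predicted polarity matches the gold label."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-label F1 over every label seen in gold or pred."""
    if not gold or len(gold) != len(pred):
        raise ValueError("gold and pred must be non-empty and of equal length")
    labels = sorted(set(gold) | set(pred))
    true_pos = Counter(g for g, p in zip(gold, pred) if g == p)
    gold_count = Counter(gold)
    pred_count = Counter(pred)
    scores = []
    for label in labels:
        # Undefined precision/recall (zero denominator) is treated as 0.
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / gold_count[label] if gold_count[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)
```

For example, with gold labels `["positive", "negative", "neutral", "positive"]` and predictions `["positive", "negative", "positive", "positive"]`, accuracy is 0.75 and macro-F1 is 0.6. Macro-F1 weights each polarity equally, so it exposes errors on rare labels (here, "neutral") that accuracy hides.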

Data

We provide the data for research purposes only, and redistribution of the data is prohibited. Please contact us if you agree to the terms of use.

 Contact information: dm.hyun@postech.ac.kr

Pretrained Word Vectors

We also provide word vectors trained with Word2Vec for each language.
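The file format of the released vectors is not stated here, so the following is only a sketch under one assumption: that the vectors are in the standard Word2Vec text format (a header line `<vocab_size> <dim>`, then one `word v1 ... v_dim` line per word). The helper `load_word2vec_text` is hypothetical; if the files are in that format, gensim's `KeyedVectors.load_word2vec_format(path, binary=False)` is a library alternative. Open the Korean file with UTF-8 encoding.

```python
import io
from typing import Dict, List, TextIO


def load_word2vec_text(stream: TextIO) -> Dict[str, List[float]]:
    """Parse vectors in the Word2Vec text format (assumed, not confirmed for the release)."""
    header = stream.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line_no, line in enumerate(stream, start=2):
        parts = line.rstrip().split(" ")
        if parts == [""]:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"line {line_no}: expected {dim} values, got {len(parts) - 1}")
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError(f"header says {vocab_size} words, found {len(vectors)}")
    return vectors


# Usage with a real file (path is hypothetical):
# with open("korean_vectors.txt", encoding="utf-8") as f:
#     vectors = load_word2vec_text(f)
sample = io.StringIO("2 3\nengine 0.1 0.2 0.3\n엔진 0.4 0.5 0.6\n")
vectors = load_word2vec_text(sample)
```

Reading from a text stream rather than a path keeps the parser easy to test and lets the caller choose the encoding.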

English: Google Drive link

Korean: Google Drive link

Aspect-level sentiment classifiers

Refer to the PyTorch-based repository here. To evaluate its classifiers on our datasets, simply replace the data in that repository with ours.
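Replacing the data usually means converting our comment-aspect pairs into the input format the classifier code expects. The sketch below assumes a hypothetical three-lines-per-example text layout (comment, aspect, integer polarity) and a hypothetical label encoding of -1/0/1. The class `CommentAspectPair` and the function `to_three_line_format` are illustrative names, not part of our release or of the linked repository. Check that repository's own data loader before relying on this layout.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Hypothetical polarity encoding; confirm what the target repository expects.
POLARITY_TO_INT = {"negative": -1, "neutral": 0, "positive": 1}


@dataclass(frozen=True)
class CommentAspectPair:
    comment: str
    aspect: str
    polarity: str  # one of the keys of POLARITY_TO_INT


def to_three_line_format(pairs: Iterable[CommentAspectPair]) -> str:
    """Serialize pairs as: comment line, aspect line, polarity line (assumed layout)."""
    lines: List[str] = []
    for pair in pairs:
        if pair.polarity not in POLARITY_TO_INT:
            raise ValueError(f"unknown polarity: {pair.polarity!r}")
        # Collapse internal whitespace and newlines so each field fits on one line.
        lines.append(" ".join(pair.comment.split()))
        lines.append(" ".join(pair.aspect.split()))
        lines.append(str(POLARITY_TO_INT[pair.polarity]))
    return "\n".join(lines) + ("\n" if lines else "")


pairs = [
    CommentAspectPair("The engine is quiet.\nGreat car.", "engine", "positive"),
    CommentAspectPair("연비가 나빠요", "연비", "negative"),
]
converted = to_three_line_format(pairs)
```

Validating polarity labels during conversion catches label mismatches early, before they silently skew training.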

Citation

If you use this repository for your work, please consider citing our paper:

 @inproceedings{hyun2020building,
  title={Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain},
  author={Hyun, Dongmin and Cho, Junsu and Yu, Hwanjo},
  booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
  pages={961--966},
  year={2020}
}