Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain
Official data for the COLING'20 paper
We release large-scale datasets of users' comments in two languages, English and Korean, for aspect-level sentiment analysis in the automotive domain. The datasets consist of 58,000+ comment-aspect pairs, making them larger than existing datasets for this task. In addition, this work covers a new language (i.e., Korean) along with English for aspect-level sentiment analysis. We build the datasets from the automotive domain to enable users (e.g., marketers in automotive companies) to analyze the voice of customers on automobiles.
We also provide baseline performances for future work by evaluating recent models on the released datasets.
We provide the data for research purposes only, and redistribution of the data is prohibited. Please contact us if you agree to these terms of use.
Contact information: dm.hyun@postech.ac.kr
We also provide the word vectors trained with Word2Vec for each language.
English: Google Drive link
Korean: Google Drive link
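The file format of the released word vectors is not stated here. As a minimal sketch, assuming they are stored in the plain-text word2vec format (a header line with the vocabulary size and dimension, then one word and its values per line), they could be read as follows. The function name `load_text_vectors` and the sample words are hypothetical and used only for illustration.

```python
import io


def load_text_vectors(handle):
    """Parse vectors in the (assumed) word2vec text format.

    The first line is 'vocab_size dim'; every following line is
    'word v1 v2 ... v_dim', separated by single spaces.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors, dim


# Tiny in-memory stand-in for a real vector file (hypothetical content).
sample = io.StringIO("2 3\nengine 0.1 0.2 0.3\nbrake -0.4 0.5 0.6\n")
vectors, dim = load_text_vectors(sample)
print(dim, sorted(vectors))
```

For a real file, pass an opened file object instead of the in-memory sample, for example `open(path, encoding="utf-8")`, where `path` points to the downloaded vectors. If the files turn out to be in the binary word2vec format, this parser does not apply.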
To reproduce the baseline results, refer to the repository here, which is based on PyTorch. Replace the data in that repository with ours to check the performance.
If you use this repository for your work, please consider citing our paper:
@inproceedings{hyun2020building,
title={Building Large-Scale English and Korean Datasets for Aspect-Level Sentiment Analysis in Automotive Domain},
author={Hyun, Dongmin and Cho, Junsu and Yu, Hwanjo},
booktitle={Proceedings of the 28th International Conference on Computational Linguistics},
pages={961--966},
year={2020}
}