
Deep Learning Adaptation with Word Embeddings for Sentiment Analysis on Online Course Reviews


This repository contains the resources produced as the outcome of the work "Deep Learning Adaptation with Word Embeddings for Sentiment Analysis on Online Course Reviews".

The code allows you to build a Deep Learning model that, starting from Word Embedding representations, measures the sentiment polarity of textual reviews posted by learners after attending online courses.

Installation

Install Python (>=3.5):

$ sudo apt-get update
$ sudo apt-get install python3.5

Clone this repository:

$ git clone https://github.com/mirkomarras/dl-sentiment-coco.git

Install the requirements:

$ pip install -r dl-sentiment-coco/requirements.txt

Usage

Prepare the data for embedding generation and for sentiment prediction training and testing.

The entire_file should be a comma-separated CSV file including a column score_field that lists the scores associated with the comments. The script creates two files: (i) traintest_file for sentiment model training and testing, and (ii) embs_file for embedding generation. For sentiment prediction testing, samples_per_class samples per class are selected.
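For reference, a minimal entire_file could look like the following (the column names match those used in the sample commands in this README; the comments and rating values are purely illustrative):

learner_comment,learner_rating
"The instructor explains every concept clearly and the pace is great",5
"The lectures were too fast and hard to follow",1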

Below you can find a sample splitting command:

$ python ./dl-sentiment-coco/code/comment_splitter.py \
--entire_file "./dl-sentiment-coco/data/entire_course_comments.csv" \
--score_field "learner_rating" \
--traintest_file "./dl-sentiment-coco/data/traintest_course_comments.csv" \
--embs_file "./dl-sentiment-coco/data/embs_course_comments.csv" \
--samples_per_class 6500
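
The script takes care of the split; as a rough sketch of the behaviour described above (this is not the actual comment_splitter.py implementation, and the column names are those from the sample command), the logic amounts to:

import pandas as pd

# Read the full review dataset.
df = pd.read_csv("./dl-sentiment-coco/data/entire_course_comments.csv")

# Select samples_per_class reviews per score value for sentiment training/testing...
samples_per_class = 6500
traintest = df.groupby("learner_rating", group_keys=False).apply(
    lambda g: g.sample(n=min(samples_per_class, len(g)), random_state=42))

# ...and keep the remaining reviews for embedding generation.
embs = df.drop(traintest.index)

traintest.to_csv("./dl-sentiment-coco/data/traintest_course_comments.csv", index=False)
embs.to_csv("./dl-sentiment-coco/data/embs_course_comments.csv", index=False)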

Create a folder data and copy into it the online course review dataset, together with its split files, available at this link.

Create context-specific embeddings from the embedding generation file.

You can create your own embeddings from your own data using the official toolkits:

GloVe: download the sources from here and edit the demo.sh file to fit your settings.

FastText: download the sources from here. Details about how to use the library are provided at the same link.

Word2Vec: we use the gensim Python library to build word2vec embeddings.
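
For example, a minimal gensim-based sketch could look as follows (parameter names follow gensim >= 4.0, the tokenization is deliberately naive, and the hyperparameters simply mirror the defaults of the script below):

import pandas as pd
from gensim.models import Word2Vec

# Tokenize the reviews reserved for embedding generation
# (column name as in the sample commands in this README).
df = pd.read_csv("./dl-sentiment-coco/data/embs_course_comments.csv")
sentences = [str(text).lower().split() for text in df["learner_comment"]]

# Train Word2Vec and save the vectors in the standard word2vec text format.
model = Word2Vec(sentences, vector_size=100, window=5, min_count=5, workers=4, epochs=5)
model.wv.save_word2vec_format("word2vec_course_reviews.vec")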

As an alternative, you can download this zip archive and run the following. It will create a subdirectory ./embeddings where the chosen type of embeddings will be generated.

First, install the requirements with:

$ pip install -r requirements.txt

Then, use the following command:

$ python generate_embeddings.py \
--input-file "myfile.txt" \
--emb-size 100 \
--iter 10 \
--workers 2 \
--min-count 5 \
--type "glove"

The defaults are: --input-file reviews.txt, --emb-size 100, --iter 5, --workers 4, --min-count 5, --type word2vec.

If you want to use our pre-trained embeddings, create the nested folder embeddings/specific and copy into it the context-specific embeddings available at this link.
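
To sanity-check an embedding file before training, you can load it into a {word: vector} dictionary, which is how the trainer below consumes embs_dir. This sketch assumes the common plain-text format (one word followed by its vector per line); the file name is only an example and the actual files may be stored differently:

import numpy as np

def load_embeddings(path):
    # Build a {word: vector} dictionary from a plain-text embedding file.
    embeddings = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            if len(parts) <= 2:
                continue  # skip empty lines and word2vec-style headers
            embeddings[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return embeddings

vectors = load_embeddings("embeddings/specific/fasttext/reviews.vec")  # example path
print(len(vectors), "words loaded")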

Train and test your model from the context-specific embeddings and the train/test comments file.

The traintest_file should be a comma-separated CSV file including two columns: (i) comment_field, which lists the comments, and (ii) score_field, which lists the scores associated with the comments. The script creates models, subsequently instantiated with the embedding dictionaries from embs_dir, that assign one of n_classes classes to a comment of at most max_len words. Each model is trained for n_epochs on batches of size batch_size and tested through stratified n_fold cross-validation.

Below you can find a sample train/test command:

$ python ./dl-sentiment-coco/code/score_trainer_tester.py \
--traintest_file "data/traintest_course_comments.csv" \
--comment_field "learner_comment" \
--score_field "learner_rating" \
--max_len 500 \
--n_classes 2 \
--embs_dir "embeddings/specific/fasttext" \
--n_epochs 20 \
--batch_size 512 \
--n_fold 5
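
As a rough picture of the evaluation protocol described above, the following sketch uses scikit-learn's StratifiedKFold to reproduce the fold splitting; the real score_trainer_tester.py builds, trains, and evaluates the deep models at the marked step:

import pandas as pd
from sklearn.model_selection import StratifiedKFold

# Load the train/test file and the two columns used in the sample command.
df = pd.read_csv("data/traintest_course_comments.csv")
comments = df["learner_comment"].astype(str)
scores = df["learner_rating"]

# Stratified n_fold cross-validation (n_fold = 5, as in the sample command).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(comments, scores)):
    x_train, y_train = comments.iloc[train_idx], scores.iloc[train_idx]
    x_test, y_test = comments.iloc[test_idx], scores.iloc[test_idx]
    # Here the actual script tokenizes and pads each comment to max_len words,
    # builds the network on top of the embeddings in embs_dir, trains it for
    # n_epochs with batches of size batch_size, and evaluates it on the fold.
    print(f"Fold {fold}: {len(x_train)} training / {len(x_test)} test comments")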

Create the two nested folders models/class2 and results/class2 and copy into them the models and results available at this link.

Contributing

We welcome contributions. Feel free to file issues and pull requests on the repo, and we will address them as best we can.

For questions or feedback, contact us at {danilo_dessi, fenu, mirko.marras, diego.reforgiato}@unica.it.

Citations

If you use this source code in your research, please cite the following entries.

Dessì, D., Dragoni, M., Fenu, G., Marras, M., Reforgiato, D. (2019). 
Deep Learning Adaptation with Word Embeddings for Sentiment Analysis on Online Course Reviews. 
In: Deep Learning based Approaches for Sentiment Analysis, Springer.
Dessì, D., Dragoni, M., Fenu, G., Marras, M., & Recupero, D. R. (2019). 
Evaluating Neural Word Embeddings Created from Online Course Reviews for Sentiment Analysis. 
In: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, 2124-2127, ACM.

Credits and License

Copyright (C) 2019 by the Department of Mathematics and Computer Science at University of Cagliari.

This source code is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for details.

You should have received a copy of the GNU General Public License along with this source code. If not, go to the following link: http://www.gnu.org/licenses/.