/Senti4SD

An emotion-polarity classifier specifically trained on developers' communication channels

Primary LanguageRMIT LicenseMIT

Senti4SD

Senti4SD is an emotion polarity classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and evaluated on a gold standard of over 4K posts extracted from Stack Overflow. It is part of the Collab Emotion Mining Toolkit, (EMTk).

Fair Use Policy

Please, cite the following paper if you intend to use our tool for your own research:

Calefato, F., Lanubile, F., Maiorano, F., Novielli N. (2018) "Sentiment Polarity Detection for Software Development," Empirical Software Engineering, 23(3), pp:1352-1382, doi: https://doi.org/10.1007/s10664-017-9546-9. (BibTeX)

NOTE: You will need to install Git LFS extension to check out this project. Once installed and initialized, simply run:

$ git lfs clone https://github.com/collab-uniba/Senti4SD.git

How do I get set up?

To set up the tool, simply run the following script from the command line:

$ sh requirements.sh

To run the script you need:

  • Java 8
  • R

The script will also install, if not already present, three R packages:

Running

To classify your data using Senti4SD, execute the following instruction from the command line:

$ cd ClassificationTask
$ sh classificationTask.sh inputCorpus.csv outputPredictions.csv

where inputCorpus.csv is a file containing the data you want to classify, considering a document for each line, and outputPredictions.csv is where the predictions will be saved. This last parameter is optional, if not present the output of the classification will be saved in a file called predictions.csv.

To see how the tool works, you can execute the following example:

$ cd ClassificationTask
$ sh classificationTask.sh Sample.csv

This will produce as output a csv file called predictions.csv.

Who do I talk to?