Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

SentiCR

SentiCR is an automated sentiment analysis tool for code review comments. It uses supervised learning algorithms to train models on 1600 manually labeled code review comments (https://github.com/senticr/SentiCR/blob/master/SentiCR/oracle.xlsx). Features of SentiCR include:

  • Special preprocessing steps to exclude URLs and code snippets
  • Special preprocessing for emoticons
  • Preprocessing steps for contractions
  • Special handling of negation phrases through precise identification
  • Optimized for the software engineering (SE) domain
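
The preprocessing features listed above can be illustrated with a minimal sketch. This is not SentiCR's own code: the lookup tables, the token names (negative_emoticon, the not_ prefix), and the function preprocess are all hypothetical, and the negation rule here marks only the single word after a negation, which is much simpler than the precise identification the tool performs.

```python
import re

# Hypothetical lookup tables for illustration only; SentiCR's real lists are not reproduced here.
CONTRACTIONS = {"isn't": "is not", "don't": "do not", "can't": "cannot", "won't": "will not"}
EMOTICONS = {":)": "positive_emoticon", ":-)": "positive_emoticon",
             ":(": "negative_emoticon", ":-(": "negative_emoticon"}
NEGATIONS = {"not", "no", "never", "cannot"}

URL_RE = re.compile(r"https?://\S+")
INLINE_CODE_RE = re.compile(r"`[^`]*`")  # assumes code snippets are wrapped in backticks


def preprocess(comment):
    """Return a list of normalized tokens for one review comment."""
    # 1. Drop URLs and inline code so they cannot be mistaken for sentiment words.
    text = URL_RE.sub(" ", comment)
    text = INLINE_CODE_RE.sub(" ", text)

    # 2. Replace emoticons with word-like tokens before punctuation is discarded.
    for emoticon, token in EMOTICONS.items():
        text = text.replace(emoticon, f" {token} ")

    # 3. Lowercase, tokenize, and expand contractions ("isn't" becomes "is", "not").
    words = []
    for raw in re.findall(r"[a-z_']+", text.lower()):
        words.extend(CONTRACTIONS.get(raw, raw).split())

    # 4. Simplified negation handling: prefix the word right after a negation.
    tokens = []
    negate_next = False
    for word in words:
        if word in NEGATIONS:
            tokens.append(word)
            negate_next = True
        elif negate_next:
            tokens.append("not_" + word)
            negate_next = False
        else:
            tokens.append(word)
    return tokens


print(preprocess("This isn't good :( see http://example.com and `foo()`"))
```

Marking the negated word as a separate token (not_good rather than good) lets a bag-of-words classifier treat negated and plain uses of the same word as different features.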

Performance

Across 100 runs of ten-fold cross-validation, SentiCR achieved 83.03% accuracy (i.e., human-level accuracy), 67.84% precision, 58.35% recall, and an F-score of 0.62 with a Gradient Boosting Tree based model. Detailed cross-validation results are available here: https://github.com/senticr/SentiCR/tree/master/cross-validation-results
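
As a quick sanity check on how these figures relate, the F-score is the harmonic mean of precision and recall. The sketch below assumes the standard F1 definition; the helper name f_score is hypothetical. Applied to the averaged precision and recall it gives roughly 0.63, close to the reported 0.62. The small gap is plausibly because the reported value is averaged per fold rather than computed from the averaged precision and recall, though that is an assumption.

```python
def f_score(precision, recall):
    """Harmonic mean of precision and recall (the standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, expressed as fractions.
f = f_score(0.6784, 0.5835)
print(round(f, 2))
```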

Cite

Ahmed, T., Bosu, A., Iqbal, A. and Rahimi, S., "SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions", In Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (NIER track), 2017.

@INPROCEEDINGS{Ahmed-et-al-SentiCR,
  author = {Ahmed, Toufique and Bosu, Amiangshu and Iqbal, Anindya and Rahimi, Shahram},
  title = {{SentiCR: A Customized Sentiment Analysis Tool for Code Review Interactions}},
  year = {2017},
  series = {ASE '17},
  booktitle = {32nd IEEE/ACM International Conference on Automated Software Engineering (NIER track)}
}