Must-read-sentiment Analysis

A list of resources for sentiment analysis in general, including resources for the Arabic language. The list is continuously updated.

Papers:

Sentiment Classification :

Turney 2002 : Thumbs up or thumbs down?

The earliest notable work in sentiment classification.
Turney uses two seed words, "excellent" and "poor", and pointwise mutual information (PMI) to do unsupervised sentiment classification.
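The PMI-based scoring described above can be sketched as follows. This is a minimal illustration, not Turney's original code: the function names, the co-occurrence counts, and the default smoothing constant are assumptions made for the example. The semantic orientation of a phrase is PMI(phrase, "excellent") minus PMI(phrase, "poor"), which reduces to a single log ratio of counts.

```python
import math

def semantic_orientation(near_pos, near_neg, hits_pos, hits_neg, smoothing=0.01):
    """SO(phrase) = PMI(phrase, pos_word) - PMI(phrase, neg_word), in bits.

    near_pos / near_neg: co-occurrence counts of the phrase with each seed word.
    hits_pos / hits_neg: total counts of each seed word on its own.
    smoothing: small constant (an assumption here) that avoids division by zero.
    """
    return math.log2(
        ((near_pos + smoothing) * hits_neg) / ((near_neg + smoothing) * hits_pos)
    )

def classify_review(phrase_counts, hits_pos, hits_neg):
    """Average SO over a review's phrases; a positive average means 'recommended'."""
    scores = [
        semantic_orientation(near_pos, near_neg, hits_pos, hits_neg)
        for near_pos, near_neg in phrase_counts
    ]
    average = sum(scores) / len(scores)
    return "recommended" if average > 0 else "not recommended"

# Hypothetical counts per phrase:
# (co-occurrences with "excellent", co-occurrences with "poor")
review = [(120, 15), (40, 35), (8, 60)]
print(classify_review(review, hits_pos=1000, hits_neg=1000))
```

No labeled training data is needed: only counts of how often each phrase appears near the two seed words.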

Pang, Lee 2002 : Thumbs up?: sentiment classification using machine learning techniques

Published a movie review dataset that is still widely used
Evaluated machine learning classifiers with 3-fold cross-validation
Features were basic bag of words (unigrams and/or bigrams), using word presence or word frequency (no TF-IDF)
Additional features: top unigrams, adjectives, position
Compared results against a baseline built from features manually selected by two human annotators
Accuracy of the baseline (manually selected features): ~60-70%
Accuracy of ML: ~80-83%
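The setup summarized above (unigram presence or frequency features, a standard classifier, 3-fold cross-validation) can be sketched in plain Python. This is an illustrative toy, not the paper's code: the helper names and the six example reviews are made up, Naive Bayes stands in for the classifiers compared in the paper, and the round-robin fold assignment is a simplification.

```python
import math
from collections import Counter, defaultdict

def features(text, presence=True):
    """Unigram bag of words: presence (0/1) or raw frequency."""
    counts = Counter(text.lower().split())
    return {w: 1 for w in counts} if presence else dict(counts)

def train_nb(docs, labels):
    """Collect per-class word counts for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    vocab = set()
    for feats, y in zip(docs, labels):
        word_counts[y].update(feats)
        vocab.update(feats)
    return word_counts, class_counts, vocab

def predict_nb(model, feats):
    """Return the class with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y in sorted(class_counts):
        denom = sum(word_counts[y].values()) + len(vocab)
        score = math.log(class_counts[y] / total_docs)
        for w, c in feats.items():
            if w in vocab:  # ignore words never seen in training
                score += c * math.log((word_counts[y][w] + 1) / denom)
        if score > best_score:
            best, best_score = y, score
    return best

def cross_validate(texts, labels, k=3, presence=True):
    """k-fold cross-validation accuracy with round-robin fold assignment."""
    docs = [features(t, presence) for t in texts]
    correct = 0
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train = [(d, y) for i, (d, y) in enumerate(zip(docs, labels))
                 if i not in test_idx]
        model = train_nb([d for d, _ in train], [y for _, y in train])
        correct += sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
    return correct / len(docs)

# Made-up toy reviews, for illustration only
texts = ["great fun film", "boring dull film", "great acting great plot",
         "dull plot boring acting", "fun plot", "boring film"]
labels = ["pos", "neg", "pos", "neg", "pos", "neg"]
print(cross_validate(texts, labels, k=3, presence=True))
print(cross_validate(texts, labels, k=3, presence=False))
```

Switching `presence` between `True` and `False` compares the two feature representations mentioned in the notes on the same folds.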

Sentiment Analysis in Arabic Language:

Abbasi et al. Sentiment Analysis in Multiple Languages: Feature Selection for Opinion Classification in Web Forums

A very good categorized literature review of the common features, techniques, and application domains in sentiment analysis
Uses an Entropy Weighted Genetic Algorithm (EWGA) to do feature selection for each of the previous techniques

El-Beltagy, Samhaa R., and Ahmed Ali. "Open issues in the sentiment analysis of arabic social media: A case study." Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 2013.

Overview of the main issues and obstacles in Arabic social media sentiment analysis
Semi-automatic generation, using conjunctions, of an Egyptian-dialect sentiment lexicon with ~4k entries (link available in the paper)
Evaluation of the generated lexicon

Abdul-Mageed & Diab. "AWATIF: A Multi-Genre Corpus for Modern Standard Arabic Subjectivity and Sentiment Analysis." LREC. 2012.

Main usefulness: good guidelines for annotating sentiment datasets
Multi-genre annotated corpus of Modern Standard Arabic for SA
Built from different resources, including the Penn Arabic Treebank, Wikipedia Talk Pages, and web forums
Manually annotated:
- with guidelines or simple guidelines
- by trained annotators or through crowdsourcing
- elaborates on the importance of guidelines and annotator training for producing dependable annotations
- dataset not publicly available

Books:

Bing Liu: Sentiment Analysis and Opinion Mining. The book is a thorough literature review of various issues in sentiment analysis. It is a good way to get the big picture of the field and to find related work on any specific issue.

Courses:

[NLP - Stanford, Dan Jurafsky & Christopher Manning](https://www.coursera.org/course/nlp)

Datasets:

English Datasets :

Pang & Lee movie reviews:
- Size : 2K reviews, 1,000 positive and 1,000 negative
- homepage
- Paper
SNAP - Web data: Amazon reviews
- Size : 34,686,770 reviews
- homepage
- Direct links
- Paper

Arabic Datasets & Lexicons :

LABR: Large-scale Arabic Book Reviews dataset
- Size : 36K
- homepage
- Paper
Large Arabic Resources for Arabic Sentiment Analysis :
- Datasets of 33K reviews in the Movies, Hotels, Restaurants, and Products domains
- Generated lexicon of 2K entries in the Books, Movies, Hotels, Restaurants, and Products domains
- Benchmarking of standard machine learning techniques and feature representation methods over the datasets
- Code for the experiments is available as a starter kit for baseline classifiers
- homepage
- Paper : ElSahar, Hady, and Samhaa R. El-Beltagy. "Building Large Arabic Multi-Domain Resources For Sentiment Analysis." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2015.
Unweighted Opinion Mining Lexicon
- An Arabic sentiment lexicon of 4392 entries, mostly in Egyptian dialect (the file is .csv, Unicode). Compound entries (idioms and expressions) are unstemmed. Other entries are prefix stemmed but postfix unstemmed.
- Download
- Paper : El-Beltagy, Samhaa R., and Ahmed Ali. "Open issues in the sentiment analysis of arabic social media: A case study." Innovations in Information Technology (IIT), 2013 9th International Conference on. IEEE, 2013.
Arabic Slang Lexicon for Twitter Sentiment Analysis :
- Lexicon of ~400 terms built automatically by matching tweets to lexico-syntactic patterns
- Paper : ElSahar, Hady, and Samhaa R. El-Beltagy. "A fully automated approach for arabic slang lexicon extraction from microblogs." Computational Linguistics and Intelligent Text Processing. Springer Berlin Heidelberg, 2014. 79-91.
- Download

People :

Glossary :

The Natural Language Processing Dictionary : A glossary with definitions of a wide range of terms used in natural language processing, very useful when reading papers.

Miscellaneous:

Chris Manning : Deep Learning without Magic, part 1. Main points of interest :

Representing each word by a feature vector built from the words in its context
Using deep neural networks to adapt those feature weights
Using the adapted feature vectors in multiple NLP classification problems
The old way : A. L. Maas et al. : Learning Word Vectors for Sentiment Analysis

Contributing:

Feel free to send a pull request with any updates you think would be good to add.

hadyelsahar/must-read-sentimentAnalysis