Twitter data mining with Python

Sentiment classification of tweets using a Support Vector Machine (SVM) and Term Frequency–Inverse Document Frequency (TF-IDF), in three steps:

Collect many tweets from Twitter
Label some tweets as positive, negative or neutral
Predict the labels of the other tweets
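
To make the TF-IDF part of these steps concrete, here is a minimal plain-Python sketch of the weighting. It is an illustration only, not this project's implementation: the function name tf_idf, the whitespace tokenizer and the sample tweets are all hypothetical, and it uses the unsmoothed textbook formula, whereas libraries usually add smoothing and normalization. The SVM would then be trained on vectors like these, built from the labelled tweets.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / tokens in the document
    idf = log(N / df), with N documents and df documents containing the term
    """
    # Naive tokenizer (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Made-up sample tweets, for illustration only.
tweets = [
    "good morning twitter",
    "bad morning traffic",
    "good coffee good day",
]
weights = tf_idf(tweets)
```

A term that appears in every document gets weight 0, while a term that is frequent in one tweet but rare elsewhere gets a high weight, which is what makes the vectors useful as classifier input.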

System dependencies

sudo apt-get install build-essential python-dev python-setuptools \
                     python-numpy python-scipy libblas-dev gfortran \
                     libatlas-dev libatlas3gf-base liblapack-dev \
                     libatlas-base-dev

If you use Python 3

sudo apt-get install python3-minimal

Install Packages

Use pip inside a virtualenv:

pip install -r requirements.txt

Configuration

The Natural Language Toolkit (NLTK) provides human language data (over 50 corpora and lexical resources) in different languages and formats, such as Twitter samples, the RSLP Stemmer (Removedor de Sufixos da Lingua Portuguesa), the complete works of Machado de Assis for Brazilian Portuguese, and much more.

To download all corpora:

python -m nltk.downloader all

Or download the corpora of your choice from the Python interpreter:

>>> import nltk
>>> nltk.download()

A new window should open, showing the NLTK Downloader.
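
To skip the interactive downloader, packages can also be fetched by id with nltk.download. The sketch below assumes the three resources mentioned above are the ones wanted (ids twitter_samples, rslp and machado). The helper download_corpora and its download parameter are hypothetical, added only so the function can be exercised without network access.

```python
# NLTK package ids for the resources mentioned above (assumed selection).
PACKAGE_IDS = ("twitter_samples", "rslp", "machado")


def download_corpora(package_ids=PACKAGE_IDS, download=None):
    """Download each package and return {package_id: success flag}.

    `download` defaults to nltk.download; passing another callable is a
    hypothetical hook for testing.
    """
    if download is None:
        import nltk  # imported lazily so the module loads without NLTK
        download = nltk.download
    return {package_id: bool(download(package_id)) for package_id in package_ids}


# Typical use (needs NLTK installed and network access):
#   results = download_corpora()
```

The returned dict makes it easy to see which packages failed to download.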

Credentials

Set your Twitter credentials, obtained from the Twitter Application Manager, in the variables CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET.
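
One way to keep these secrets out of the source tree is to read them from environment variables. This is a minimal sketch under that assumption; the project may instead expect them in a settings file. The helper load_twitter_credentials is hypothetical.

```python
import os

REQUIRED_KEYS = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing.

    `env` defaults to os.environ; any mapping can be passed for testing.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values alike as missing.
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Failing early with the list of missing names is easier to diagnose than an authentication error from the API later on.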