Classifies tweets by sentiment using a Support Vector Machine (SVM) and Term Frequency–Inverse Document Frequency (TF-IDF), in three steps:
- Collect many tweets from Twitter
- Label some of them as positive, negative or neutral
- Predict the sentiment of the remaining tweets
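Steps 2 and 3 can be sketched in plain Python to show how TF-IDF and an SVM fit together. This is a minimal, dependency-free illustration, not this project's own code: it starts from tweets that have already been collected, builds L2-normalised TF-IDF vectors with a smoothed idf, and trains one linear SVM per class (one-vs-rest) by subgradient descent on the hinge loss. The names `Tfidf`, `train_svm`, `fit_one_vs_rest` and `predict`, the hyperparameters (`lam=0.01`, `lr=0.5`, `epochs=100`) and the sample tweets are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase/whitespace tokenizer; a real pipeline would probably use
    # NLTK tokenizers and stemmers (e.g. RSLP for Portuguese) instead.
    return text.lower().split()


def dot(w, x):
    # Sparse dot product between two {term: weight} dictionaries.
    return sum(w.get(t, 0.0) * v for t, v in x.items())


class Tfidf:
    """TF-IDF with raw term counts, smoothed idf and L2-normalised vectors."""

    def fit(self, docs):
        n = len(docs)
        df = Counter(t for doc in docs for t in set(tokenize(doc)))
        # Smoothed inverse document frequency: log((1 + n) / (1 + df)) + 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        return self

    def transform(self, docs):
        vectors = []
        for doc in docs:
            # Terms never seen during fit() are ignored.
            counts = Counter(t for t in tokenize(doc) if t in self.idf)
            vec = {t: c * self.idf[t] for t, c in counts.items()}
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            vectors.append({t: v / norm for t, v in vec.items()})
        return vectors


def train_svm(vectors, labels, target, lam=0.01, lr=0.5, epochs=100):
    """Binary linear SVM (no bias) for `target` vs. the rest, trained by
    full-batch subgradient descent on lam/2 * ||w||^2 + mean(hinge loss)."""
    ys = [1.0 if label == target else -1.0 for label in labels]
    n = len(vectors)
    w = {}
    for _ in range(epochs):
        grad = {t: lam * v for t, v in w.items()}  # gradient of the L2 penalty
        for x, y in zip(vectors, ys):
            if y * dot(w, x) < 1:  # margin violated: hinge term is active
                for t, v in x.items():
                    grad[t] = grad.get(t, 0.0) - y * v / n
        for t, g in grad.items():
            w[t] = w.get(t, 0.0) - lr * g
    return w


def fit_one_vs_rest(vectors, labels):
    # One binary SVM per sentiment class.
    return {c: train_svm(vectors, labels, c) for c in sorted(set(labels))}


def predict(models, x):
    # Pick the class whose classifier gives the highest score.
    return max(models, key=lambda c: dot(models[c], x))


# Step 2: a tiny hand-labelled sample (hypothetical data).
train = [
    ("I love this phone", "positive"),
    ("what a great day", "positive"),
    ("I hate this phone", "negative"),
    ("what a terrible day", "negative"),
    ("the phone arrived today", "neutral"),
    ("the meeting is today", "neutral"),
]
texts, labels = zip(*train)
tfidf = Tfidf().fit(texts)
models = fit_one_vs_rest(tfidf.transform(texts), labels)

# Step 3: predict the sentiment of unlabelled tweets.
unlabelled = ["love it great", "terrible hate", "meeting today"]
for tweet, x in zip(unlabelled, tfidf.transform(unlabelled)):
    print(tweet, "->", predict(models, x))
# love it great -> positive
# terrible hate -> negative
# meeting today -> neutral
```

One-vs-rest turns the three-class problem into three binary SVMs, and the bias term is left out only to keep the sketch short; a library implementation would normally include it.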
Install the system dependencies (Debian/Ubuntu):
sudo apt-get install build-essential python-dev python-setuptools \
python-numpy python-scipy libblas-dev gfortran \
libatlas-dev libatlas3gf-base liblapack-dev \
libatlas-base-dev
If you use Python 3, also install:
sudo apt-get install python3-minimal
Install the Python dependencies with pip, preferably inside a virtualenv:
pip install -r requirements.txt
The Natural Language Toolkit (NLTK) provides human language data (over 50 corpora and lexical resources) in different languages and formats, such as Twitter samples, the RSLP Stemmer (Removedor de Sufixos da Língua Portuguesa, a Portuguese suffix remover), the complete works of Machado de Assis in Brazilian Portuguese, and much more.
To download all corpora:
python -m nltk.downloader all
Or download only the corpora you need (for example twitter_samples, rslp or machado) from the Python interpreter:
>>> import nltk
>>> nltk.download()
A new window should open, showing the NLTK Downloader.
Set the variables CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN and ACCESS_TOKEN_SECRET to your Twitter credentials, which you can obtain from the Twitter Application Manager.
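Where these variables are defined is not specified here. Assuming, for illustration only, that they are exported as environment variables with the same names, a small hypothetical helper such as `load_twitter_credentials` can check that all four are present before any request is sent to the Twitter API:

```python
import os

# Hypothetical: assumes the credentials are exported as environment variables
# with these exact names; the project may read them from somewhere else.
CREDENTIAL_NAMES = ("CONSUMER_KEY", "CONSUMER_SECRET",
                    "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def load_twitter_credentials(environ=os.environ):
    """Return the four Twitter credentials, failing early if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing Twitter credentials: " + ", ".join(missing))
    return {name: environ[name] for name in CREDENTIAL_NAMES}


# Example with a fake environment (placeholder values, not real keys).
fake_env = {name: "xxx" for name in CREDENTIAL_NAMES}
print(sorted(load_twitter_credentials(fake_env)))
```

Failing early with the names of the missing variables is usually easier to debug than an authentication error returned later by the API.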
Run the tests:
python -m unittest discover
Run the Human-Machine Interface:
python hmi.py