This is a sentiment analysis project written in Python using the NLTK library.

NLTK
Pipeline
1. Data Cleaning
2. Perform Analysis
Libraries

NLP

Natural Language Processing, or NLP for short, is broadly defined as the automatic manipulation of natural language, such as speech and text, by software.

Pipeline

A pipeline is just a way to design a program where the output of one step is the input of the next step.

Text Document
Data Cleaning
Perform Analysis
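
As a minimal sketch of this idea, the stages can be written as plain functions and chained so that each one receives the previous stage's output. The names `clean`, `analyse`, and `run_pipeline` are hypothetical and only stand in for the project's real steps.

```python
from functools import reduce


def clean(text):
    # Hypothetical cleaning stage: lower-case the text and split it into words.
    return text.lower().split()


def analyse(words):
    # Hypothetical analysis stage: count how many words were produced.
    return len(words)


def run_pipeline(data, stages):
    # Feed the output of each stage into the next one, in order.
    return reduce(lambda value, stage: stage(value), stages, data)


result = run_pipeline("A Simple Text Document", [clean, analyse])
print(result)  # 4
```

Keeping the stages in a list makes it easy to add, remove, or reorder steps without touching the others.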

Data Cleaning

Convert the raw text into a list of clean words. This is a very important step, because every later stage works on these words.

Data Cleaning (pre-processing)
1. Convert to Lower Case
2. Remove Punctuation and Special Characters
3. Tokenization
4. Remove empty lines
5. Stopwords Removal
6. Lemmatization
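
Steps 1 to 5 can be sketched with the standard library alone, which keeps the example runnable without any downloaded NLTK data. This is an illustration under assumptions, not the project's actual code: `clean_text` and `STOP_WORDS` are hypothetical names, the stop-word list is deliberately tiny, and lemmatization (step 6) is left out.

```python
import string

# Hypothetical, deliberately tiny stop-word list; NLTK's stopwords corpus is far larger.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "this", "of", "to"}


def clean_text(raw_text, stop_words=STOP_WORDS):
    # 1. Convert to lower case.
    text = raw_text.lower()
    # 2. Remove punctuation and special characters (ASCII punctuation only).
    text = text.translate(str.maketrans("", "", string.punctuation))
    # 3. Tokenization: split on any whitespace, including newlines.
    # 4. str.split() with no arguments never yields empty strings,
    #    so empty lines disappear here as well.
    tokens = text.split()
    # 5. Stop-word removal.
    return [token for token in tokens if token not in stop_words]


print(clean_text("This is a GREAT movie!\n\nIt is fun, and the cast is superb."))
# ['great', 'movie', 'fun', 'cast', 'superb']
```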

Some definitions:

Tokenization - Split a sentence into individual words.
Stopwords Removal - Remove words that appear in the sentence but make no difference to the analysis.
Stemming - Reduce a word to its base form. Ex.: reading -> read.
Lemmatization - Process of grouping together the different inflected forms of a word so that they can be analysed as a single item.
- Lemmatization runs twice with different parameters, so that words left unchanged by the first pass can be reduced by the second.
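
The two lemmatization passes might look like the sketch below. The text does not say which parameters change between passes; this example assumes they are part-of-speech tags (`"v"` for verbs, then `"n"` for nouns). `lemmatize_twice` and `toy_lemmatize` are hypothetical names, and the toy lemmatizer exists only so the example runs without the WordNet data.

```python
def lemmatize_twice(words, lemmatize, first_pos="v", second_pos="n"):
    # Run the lemmatizer once per part-of-speech tag; the second pass
    # catches words that the first pass left unchanged.
    result = []
    for word in words:
        lemma = lemmatize(word, first_pos)
        lemma = lemmatize(lemma, second_pos)
        result.append(lemma)
    return result


def toy_lemmatize(word, pos):
    # Hypothetical stand-in lemmatizer backed by a two-entry lookup table.
    table = {("running", "v"): "run", ("movies", "n"): "movie"}
    return table.get((word, pos), word)


print(lemmatize_twice(["running", "movies", "good"], toy_lemmatize))
# ['run', 'movie', 'good']
```

With NLTK and the WordNet data installed, `nltk.stem.WordNetLemmatizer().lemmatize` accepts the same `(word, pos)` arguments and could be passed in place of `toy_lemmatize`.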

Vectorization

Convert words into numbers.
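
One common way to do this is a bag-of-words count, where each document becomes a list of word counts. scikit-learn provides `CountVectorizer` for this purpose; the sketch below uses only the standard library to show the idea, and the names `build_vocabulary` and `vectorize` are hypothetical.

```python
from collections import Counter


def build_vocabulary(documents):
    # Map every distinct word to a column index, in sorted order.
    words = sorted({word for doc in documents for word in doc})
    return {word: index for index, word in enumerate(words)}


def vectorize(doc, vocabulary):
    # Count each word and place the count in that word's column.
    counts = Counter(doc)
    vector = [0] * len(vocabulary)
    for word, count in counts.items():
        if word in vocabulary:
            vector[vocabulary[word]] = count
    return vector


docs = [["great", "movie", "great", "cast"], ["boring", "movie"]]
vocab = build_vocabulary(docs)
print(vocab)  # {'boring': 0, 'cast': 1, 'great': 2, 'movie': 3}
print([vectorize(d, vocab) for d in docs])  # [[0, 1, 2, 1], [1, 0, 0, 1]]
```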

Perform Analysis

Plot the results of the analysis. The result should look like the picture below.
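
As a rough sketch of the plotting step, assuming the analysis yields one sentiment label per document, the labels can be tallied and drawn as a bar chart with Matplotlib. The label values, the helper names `count_labels` and `plot_counts`, and the output file name are all hypothetical; the project's own chart may differ.

```python
from collections import Counter


def count_labels(labels):
    # Tally how many documents received each sentiment label.
    return Counter(labels)


def plot_counts(counts, path="sentiment.png"):
    # Imported here so the counting helper works even without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(counts.keys()), list(counts.values()))
    ax.set_xlabel("Sentiment")
    ax.set_ylabel("Number of documents")
    fig.savefig(path)
    plt.close(fig)


counts = count_labels(["positive", "negative", "positive", "neutral"])
print(counts["positive"])  # 2
```

Calling `plot_counts(counts)` then writes the chart to `sentiment.png` in the current directory.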

Libraries

Matplotlib

Open a terminal and run the command:

% pip install matplotlib

Source: https://matplotlib.org/stable/users/installing.html

scikit-learn

Open a terminal and run the command:

% pip install scikit-learn

Source: https://scikit-learn.org/stable/index.html

NLTK

Natural Language Toolkit (NLTK) is a leading platform for building Python programs to work with human language data.

Open a terminal and run the command:

% pip install nltk

Source: https://www.nltk.org/install.html

NLTK Data

To install the data, first install NLTK, then follow the instructions below.

Run the Python interpreter and type the commands:

>>> import nltk
>>> nltk.download()

A new window should open, showing the NLTK Downloader. Click on the File menu and select Change Download Directory. For central installation, set this to:

C:\nltk_data (Windows)
/usr/local/share/nltk_data (Mac)
/usr/share/nltk_data (Unix)

Next, go to the All Packages tab, select the punkt package, and press the Download button. The window should look like the picture below.

Source: https://www.nltk.org/data.html

NLTK Stopwords Corpus

The steps to download the stopwords data are similar to those for NLTK Data. Follow the instructions below.

Go to the All Packages tab, select the stopwords package, and press the Download button. The window should look like the picture below.

NLTK WordNet

The steps to download the wordnet data are similar to those for NLTK Data. Follow the instructions below.

Go to the All Packages tab, select the wordnet package, and press the Download button. The window should look like the picture below.
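
The three downloads described in the NLTK Data, NLTK Stopwords Corpus, and NLTK WordNet sections can also be scripted instead of clicked through, since `nltk.download` accepts a package name. The sketch below assumes the tokenizer package is the one named `punkt`; `REQUIRED_PACKAGES` and `download_required` are hypothetical names.

```python
# Package names taken from the three download sections of this README.
REQUIRED_PACKAGES = ("punkt", "stopwords", "wordnet")


def download_required(packages=REQUIRED_PACKAGES):
    # Imported lazily so this module loads even before NLTK is installed.
    import nltk

    # nltk.download(name) fetches one package and returns True on success.
    return {name: nltk.download(name) for name in packages}
```

Calling `download_required()` needs network access and stores the data in NLTK's default data directory unless a different one has been configured.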

ricardorqr/NLTK
