In this notebook we use NLP techniques to clean and preprocess the data, then build a CNN to predict the polarity of each document.
Increasing the amount of training data should give a better result.
There are 2000 reviews belonging to 2 classes. Each document is a review labeled either negative or positive.
The maximum number of words is 1693 for positive reviews and 1400 for negative reviews. We need this information for text padding.
Here we load the documents and assign each one a target. We assign 0 to negative reviews and 1 to positive ones.
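As a rough illustration of the labeling step and of how the maximum review length can be measured, here is a minimal sketch. The helper names `label_documents` and `max_word_count` and the toy review strings are assumptions made for this example, and words are counted by splitting on whitespace, which may differ from the tokenizer used in the notebook.

```python
def label_documents(negative_docs, positive_docs):
    """Combine both classes into one list with targets: 0 = negative, 1 = positive."""
    documents = list(negative_docs) + list(positive_docs)
    targets = [0] * len(negative_docs) + [1] * len(positive_docs)
    return documents, targets


def max_word_count(docs):
    """Length, in whitespace-separated words, of the longest document."""
    return max(len(doc.split()) for doc in docs)


# Toy stand-ins for the real review texts (hypothetical examples).
negative_docs = ["boring plot and weak acting", "a waste of time"]
positive_docs = ["a moving and beautifully shot film"]

documents, targets = label_documents(negative_docs, positive_docs)

# The longest review across both classes sets the padding length.
longest = max(max_word_count(negative_docs), max_word_count(positive_docs))
```

On the real data, the same measurement per class would correspond to the 1400 and 1693 figures quoted above, assuming the same notion of a word.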
Other actions:
- Punctuation removal
- Stopwords removal
- Word tokenization
- Train-test split: 1600 documents for training and 400 documents for testing
- Tokenizing
- Encoding
- Text padding
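The steps listed above can be sketched in plain Python as follows. This is a simplified stand-in rather than the notebook's actual code: the tiny `STOPWORDS` set, the helper names, the fixed shuffle seed, and the choice to pad on the right are all assumptions made for illustration. A real run would more likely rely on a full stopword list and a library tokenizer.

```python
import random
import string

# Tiny illustrative stopword list; a real run would use a fuller one (assumption).
STOPWORDS = {"a", "an", "the", "and", "of", "is", "it", "this", "was"}


def clean(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [word for word in text.split() if word not in STOPWORDS]


def train_test_split(items, labels, test_fraction=0.2, seed=0):
    """Shuffle and split; 0.2 gives 1600/400 for 2000 documents."""
    order = list(range(len(items)))
    random.Random(seed).shuffle(order)
    n_test = round(len(items) * test_fraction)
    test_idx, train_idx = order[:n_test], order[n_test:]
    return ([items[i] for i in train_idx], [items[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def build_vocab(token_lists):
    """Map each word to an integer id starting at 1; 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for word in tokens:
            vocab.setdefault(word, len(vocab) + 1)
    return vocab


def encode(tokens, vocab):
    """Replace words with their ids, skipping words missing from the vocabulary."""
    return [vocab[word] for word in tokens if word in vocab]


def pad(sequence, max_len):
    """Truncate to max_len, then right-pad with zeros up to max_len."""
    sequence = sequence[:max_len]
    return sequence + [0] * (max_len - len(sequence))


# Usage on two toy reviews (hypothetical examples).
docs = ["This movie was great!", "Terrible, boring plot."]
tokens = [clean(d) for d in docs]
vocab = build_vocab(tokens)
padded = [pad(encode(t, vocab), 4) for t in tokens]
```

Building the vocabulary from the training documents only, and then encoding the test documents with it, keeps test-set words from leaking into the model's input space.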
We use the following model: