This repository is a hands-on implementation of sentiment analysis on IMDB movie reviews. Part of it follows the tutorial from the Kaggle competition "Bag of Words Meets Bags of Popcorn", which improves the model step by step by applying different NLP techniques to the same dataset. I've extended the tutorial based on my own interests, for example by adding deep learning models and multiple input features.
- Get familiar with the tools
  - nltk
  - gensim
  - keras
- Learn the processes
  - Data cleansing (removal, filtering, tokenization, lemmatization, padding, etc.)
  - Feature processing (BOW, TF-IDF, feature engineering, word embedding)
  - Input structure required by different models (tabular, sequences, multi-input)
  - Model structure (tree-based models, deep learning, recurrent NNs, etc.)
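
As a rough illustration of the cleansing and feature steps listed above, here is a minimal sketch that uses only the Python standard library, so it runs without nltk, gensim or keras. The three reviews, the helper names (`clean`, `bow`, `tfidf`, `to_padded_sequence`) and the padding length are all assumptions made for this example, not code from this repository, and lemmatization is skipped for brevity.

```python
import math
import re
from collections import Counter

# Tiny hypothetical corpus standing in for the IMDB reviews.
reviews = [
    "A great movie!<br />I loved it.",
    "Terrible plot, terrible acting.",
    "Great acting, but the plot was weak.",
]

def clean(text):
    """Strip HTML tags, keep letters only, lowercase, and split into tokens."""
    text = re.sub(r"<[^>]+>", " ", text)    # remove HTML remnants such as <br />
    text = re.sub(r"[^a-zA-Z]", " ", text)  # drop punctuation and digits
    return text.lower().split()

tokens = [clean(r) for r in reviews]

# Vocabulary: one index per distinct token, in sorted order (0 is reserved for padding).
vocab = {word: i + 1 for i, word in enumerate(sorted({w for doc in tokens for w in doc}))}

def bow(doc):
    """Tabular input: a fixed-length vector of raw term counts."""
    counts = Counter(doc)
    return [counts.get(word, 0) for word in vocab]

def tfidf(doc):
    """Tabular input: term frequency weighted by a plain, unsmoothed IDF."""
    counts = Counter(doc)
    n_docs = len(tokens)
    vec = []
    for word in vocab:
        df = sum(1 for d in tokens if word in d)  # documents containing the word
        idf = math.log(n_docs / df)
        vec.append(counts.get(word, 0) / len(doc) * idf)
    return vec

def to_padded_sequence(doc, maxlen):
    """Sequence input: word indices, cut to the first maxlen tokens, left-padded with 0."""
    ids = [vocab[w] for w in doc][:maxlen]
    return [0] * (maxlen - len(ids)) + ids

bow_matrix = [bow(d) for d in tokens]
tfidf_matrix = [tfidf(d) for d in tokens]
sequences = [to_padded_sequence(d, maxlen=8) for d in tokens]
```

The two matrices are the kind of tabular input a tree-based model expects, while `sequences` is the fixed-length integer input an embedding layer followed by a recurrent network expects. In practice a library vectorizer and padding utility would likely replace these hand-written helpers.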
- Apply to other tasks
  - Binary classification (sentiment analysis)
  - Multi-class classification (types of review)
  - Regression (polarization score)
  - Apply text features to other kinds of problems (propensity, churn, etc.)
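
The three task types above can share the same text features and differ mainly in how the target is encoded. The sketch below shows one possible encoding. The 1-10 ratings, the thresholds, the class names and the scaling to [-1, 1] are all hypothetical choices for illustration, not definitions taken from this repository.

```python
# Hypothetical ratings on a 1-10 scale, one per review.
ratings = [9, 2, 5, 7]

CLASSES = ["negative", "neutral", "positive"]  # hypothetical review types

def binary_target(rating, threshold=7):
    """Binary classification: 1 for a positive review, 0 otherwise."""
    return 1 if rating >= threshold else 0

def multiclass_target(rating):
    """Multi-class classification: one-hot vector over CLASSES."""
    if rating <= 4:
        label = "negative"
    elif rating >= 7:
        label = "positive"
    else:
        label = "neutral"
    return [1 if c == label else 0 for c in CLASSES]

def regression_target(rating):
    """Regression: map the 1-10 rating linearly onto a score in [-1, 1]."""
    return (rating - 5.5) / 4.5

y_binary = [binary_target(r) for r in ratings]
y_multi = [multiclass_target(r) for r in ratings]
y_score = [regression_target(r) for r in ratings]
```

With a neural model, this usually means keeping the feature layers and swapping only the output layer and loss: a sigmoid unit with binary cross-entropy, a softmax layer with categorical cross-entropy, or a linear unit with mean squared error.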
- Try more feature engineering techniques on text data
  - Part of speech
  - Verbal and grammatical
  - Unigrams, bigrams, trigrams
  - LDA clustering
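
For the n-gram item above, a minimal sketch of how unigram, bigram and trigram features could be extracted from a tokenized review. The helper name `ngrams` and the sample sentence are assumptions for this example; nltk and scikit-learn ship their own n-gram utilities that would normally be used instead.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical, already tokenized review.
tokens = "the plot was not good".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Count every n-gram so the counts can be used as bag-of-n-grams features.
features = Counter(unigrams + bigrams + trigrams)
```

Bigrams such as `("not", "good")` keep the negation next to the word it modifies, which a plain bag of words loses.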
- Apply to other problems