For this project, we want to classify the sentiment of movie reviews. This is a multiclass classification problem because the sentiment takes one of 5 different values. Before classification, we preprocess the data by:
- Cleaning data
- Upsampling to balance the data
- One Hot Encoding
- Tokenizing
- Sequencing
- Padding
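
The preprocessing steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the project's actual code: the function names, the cleaning regex, the toy reviews, the 0 to 4 label range, and the choice of right-padding with zeros are all hypothetical.

```python
import random
import re
from collections import Counter


def clean(text):
    """Cleaning: lowercase and keep only letters, digits and single spaces."""
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return " ".join(text.split())


def upsample(texts, labels, seed=0):
    """Upsampling: duplicate random minority-class samples until
    every class is as large as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for text, label in zip(texts, labels):
        by_class.setdefault(label, []).append(text)
    target = max(len(items) for items in by_class.values())
    out_texts, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)
    return out_texts, out_labels


def one_hot(label, num_classes=5):
    """One-hot encoding: label 3 becomes [0, 0, 0, 1, 0] (labels assumed 0..4)."""
    vec = [0] * num_classes
    vec[label] = 1
    return vec


def build_vocab(texts):
    """Tokenizing: give each word an integer index, most frequent first.
    Index 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}


def to_sequence(text, vocab):
    """Sequencing: replace each known word by its vocabulary index."""
    return [vocab[word] for word in text.split() if word in vocab]


def pad(seq, max_len):
    """Padding: truncate to max_len or right-pad with zeros (an assumed choice)."""
    return seq[:max_len] + [0] * max(0, max_len - len(seq))


# Toy data, invented for illustration only.
reviews = ["A great, GREAT film!", "Terrible plot.", "Great cast"]
sentiments = [4, 0, 4]

cleaned = [clean(r) for r in reviews]
texts, labels = upsample(cleaned, sentiments)
vocab = build_vocab(texts)
X = [pad(to_sequence(t, vocab), max_len=6) for t in texts]
y = [one_hot(label) for label in labels]
```

After these steps, every row of `X` has the same length and every row of `y` is a 5-element one-hot vector, which is the shape a network with a 5-way softmax output expects.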
Finally, after these preparation steps, the data is modeled with two combined deep learning architectures:
- CNN - BiLSTM
- CNN - LSTM
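
As a rough sketch of how the two architectures above differ, the following Keras builder stacks a convolutional layer in front of either an LSTM or a bidirectional LSTM. Everything specific here is an assumption for illustration: the function name `build_model`, the vocabulary size, the layer widths, the kernel size, and the optimizer are not taken from the project.

```python
# Hypothetical hyperparameters, chosen only for illustration.
VOCAB_SIZE = 10_000
EMBED_DIM = 128
NUM_CLASSES = 5  # the five sentiment values


def build_model(recurrent="bilstm"):
    """Return a compiled CNN - LSTM or CNN - BiLSTM classifier."""
    if recurrent not in ("lstm", "bilstm"):
        raise ValueError("recurrent must be 'lstm' or 'bilstm'")

    # Imported lazily so this module can be loaded without TensorFlow installed.
    from tensorflow.keras import layers, models

    lstm = layers.LSTM(64)
    # The only difference between the two models: BiLSTM reads the
    # sequence in both directions and concatenates the two outputs.
    rnn = layers.Bidirectional(lstm) if recurrent == "bilstm" else lstm

    model = models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),    # word indices to dense vectors
        layers.Conv1D(64, 5, activation="relu"),    # CNN: local n-gram features
        layers.MaxPooling1D(2),                     # shorten the sequence
        rnn,                                        # LSTM or BiLSTM over CNN features
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # matches one-hot encoded labels
        metrics=["accuracy"],
    )
    return model
```

Calling `build_model("bilstm")` and `build_model("lstm")` would give two models that differ only in the recurrent layer, which keeps the comparison between them focused on that one change.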
In conclusion, CNN - BiLSTM performs slightly better, with a validation accuracy of 86.416% compared with 86.17% for CNN - LSTM.