Download the IMDB data from Kaggle and unzip it. It will have the following structure. The scripts assume that aclImdb is in the root directory of this project.
aclImdb
|-- train
|   |-- pos
|   |-- neg
|   |-- unsup
|-- test
    |-- pos
    |-- neg
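As a rough illustration of how this layout maps to labeled examples, the sketch below reads the reviews of one split. It assumes each review is a separate .txt file inside pos/ and neg/ (as in the original aclImdb release), labels pos as 1 and neg as 0, and ignores the unlabeled unsup folder. load_split is a hypothetical helper, not the code in build_imdb_dataset.py.

```python
import os
from typing import List, Tuple

# Assumed label convention for this sketch: pos -> 1, neg -> 0.
# The unsup folder (unlabeled reviews) is deliberately not read.
LABELS = {"pos": 1, "neg": 0}


def load_split(root: str, split: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (review_text, label) pairs for root/split/{pos,neg}."""
    examples = []
    for folder, label in LABELS.items():
        folder_path = os.path.join(root, split, folder)
        # sorted() keeps the file order deterministic across platforms
        for name in sorted(os.listdir(folder_path)):
            if not name.endswith(".txt"):
                continue  # skip anything that is not a review file
            with open(os.path.join(folder_path, name), encoding="utf-8") as f:
                examples.append((f.read(), label))
    return examples
```

With the data in place, load_split("aclImdb", "train") and load_split("aclImdb", "test") would return the labeled training and test reviews.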
- First, build the dataset for training:
    python build_imdb_dataset.py
- To train the logistic regression model, run:
    python trainer.py
  The best validation epoch and the test accuracy will be printed.
- To train the one-hidden-layer NN, run:
    python trainer_nn.py
  The best validation epoch and the test accuracy will be printed.
- Then draw the learning curves for both models by running:
    python draw.py
  This requires the two binary files generated by the previous steps.
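Both trainers report the epoch with the best validation accuracy and the test accuracy of the model from that epoch. The sketch below shows that selection logic for logistic regression in plain Python. It is not the code in trainer.py: the sparse dict features, the SGD settings, and the hypothetical train and save_history helpers (including pickling the per-epoch history so a plotting script could read it back) are all assumptions.

```python
import math
import pickle
import random
from typing import Dict, List, Tuple

# (sparse feature vector, label in {0, 1}); the feature format is an assumption
Example = Tuple[Dict[str, float], int]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Dict[str, float], b: float, x: Dict[str, float]) -> int:
    z = b + sum(w.get(k, 0.0) * v for k, v in x.items())
    return 1 if z >= 0 else 0  # sigmoid(z) >= 0.5 exactly when z >= 0


def accuracy(w: Dict[str, float], b: float, data: List[Example]) -> float:
    return sum(predict(w, b, x) == y for x, y in data) / len(data)


def train(train_set, val_set, test_set, epochs=20, lr=0.1, seed=0):
    """Hypothetical trainer: SGD on log-loss, model selection on the validation set."""
    rng = random.Random(seed)
    w: Dict[str, float] = {}
    b = 0.0
    history = []  # validation accuracy after each epoch (learning curve)
    best_epoch, best_val, best_params = 0, -1.0, ({}, 0.0)
    data = list(train_set)
    for epoch in range(1, epochs + 1):
        rng.shuffle(data)
        for x, y in data:
            z = b + sum(w.get(k, 0.0) * v for k, v in x.items())
            g = sigmoid(z) - y  # gradient of the log-loss with respect to z
            for k, v in x.items():
                w[k] = w.get(k, 0.0) - lr * g * v
            b -= lr * g
        val_acc = accuracy(w, b, val_set)
        history.append(val_acc)
        if val_acc > best_val:  # strict '>' keeps the earliest epoch on ties
            best_epoch, best_val, best_params = epoch, val_acc, (dict(w), b)
    # Report test accuracy only for the parameters chosen on the validation set
    test_acc = accuracy(*best_params, test_set)
    return best_epoch, test_acc, history


def save_history(history: List[float], path: str) -> None:
    # Hypothetical: store the learning curve as a binary (pickle) file
    with open(path, "wb") as f:
        pickle.dump(history, f)
```

A caller could then print the result in the same format the scripts use, e.g. print("Best model at: %d th epoch, given val set" % best_epoch) followed by print("Test accuracy is: %f" % test_acc). Choosing the epoch on the validation set and touching the test set only once keeps the reported test accuracy an unbiased estimate.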
Logistic Regression
Best model at: 12 th epoch, given val set
Test accuracy is: 0.851000
One Hidden Layer NN
Best model at: 13 th epoch, given val set
Test accuracy is: 0.843000