CMPUT-651-UofA

Data preparation

Download the IMDB data from Kaggle and unzip it; the extracted folder has the following structure. The instructions below assume aclImdb is under the root directory of this project.

aclImdb
    |-- train
        |-- pos
        |-- neg
        |-- unsup
    |-- test
        |-- pos
        |-- neg
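
For reference, here is a minimal sketch of reading this layout. It assumes each review is a separate .txt file inside pos/ and neg/, as in the standard aclImdb release. The function name load_split is hypothetical and is not taken from build_imdb_dataset.py. The unsup/ folder is skipped because its reviews carry no labels.

```python
import os
from typing import List, Tuple


def load_split(root: str, split: str) -> Tuple[List[str], List[int]]:
    """Read every review under <root>/<split>/{pos,neg}; return (texts, labels).

    Hypothetical helper: label 1 = pos, 0 = neg. unsup/ is ignored (no labels).
    """
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        folder = os.path.join(root, split, label_name)
        for fname in sorted(os.listdir(folder)):  # sorted for a reproducible order
            if not fname.endswith(".txt"):
                continue
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                texts.append(f.read())
            labels.append(label)
    return texts, labels
```

For example, load_split("aclImdb", "train") would return the labelled training reviews, assuming the folder sits where the instructions above place it.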

First, build the training dataset by running python build_imdb_dataset.py.
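
The contents of build_imdb_dataset.py are not described here, so the following is only an assumed illustration of one common way to turn reviews into model inputs: a binary bag-of-words over the most frequent words. tokenize, build_vocab, vectorize, and the max_size=5000 default are hypothetical choices, not the script's actual settings.

```python
import re
from collections import Counter
from typing import Dict, List

import numpy as np

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> List[str]:
    # Lowercase and drop the "<br />" tags that appear inside IMDB reviews.
    text = text.lower().replace("<br />", " ")
    return TOKEN_RE.findall(text)


def build_vocab(texts: List[str], max_size: int = 5000) -> Dict[str, int]:
    # Keep the max_size most frequent tokens; ties keep first-seen order.
    counts = Counter(tok for t in texts for tok in tokenize(t))
    return {w: i for i, (w, _) in enumerate(counts.most_common(max_size))}


def vectorize(texts: List[str], vocab: Dict[str, int]) -> np.ndarray:
    # Binary presence features: X[i, j] = 1 if word j occurs in review i.
    X = np.zeros((len(texts), len(vocab)), dtype=np.float32)
    for row, t in enumerate(texts):
        for tok in tokenize(t):
            idx = vocab.get(tok)
            if idx is not None:  # out-of-vocabulary words are ignored
                X[row, idx] = 1.0
    return X
```

Building the vocabulary from the training split only (and reusing it for validation and test) keeps test-set words from leaking into the features.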
For logistic regression, train with python trainer.py; the script prints the best validation epoch and the test accuracy.
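
As a rough sketch of what "best validation epoch" means, assuming mini-batch gradient descent on the mean cross-entropy loss: the model is scored on the validation set after every epoch, the epoch with the highest validation accuracy is kept, and test accuracy is reported for that snapshot only. train_logreg, predict, and all hyperparameters below are hypothetical and may differ from trainer.py.

```python
import numpy as np


def sigmoid(z):
    # Clip logits to avoid overflow warnings in np.exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def train_logreg(X_tr, y_tr, X_val, y_val, epochs=20, lr=0.1, batch_size=32, seed=0):
    """Mini-batch logistic regression; keeps the weights from the best validation epoch."""
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    w, b = np.zeros(d), 0.0
    best = {"epoch": -1, "val_acc": -1.0, "w": w.copy(), "b": b}
    history = []  # validation accuracy after each epoch (one learning-curve series)
    for epoch in range(1, epochs + 1):
        order = rng.permutation(n)  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Per-example gradient of cross-entropy w.r.t. the logit is (p - y).
            err = sigmoid(X_tr[idx] @ w + b) - y_tr[idx]
            w -= lr * (X_tr[idx].T @ err) / len(idx)
            b -= lr * err.mean()
        val_acc = float(np.mean((sigmoid(X_val @ w + b) >= 0.5) == y_val))
        history.append(val_acc)
        if val_acc > best["val_acc"]:  # strict: ties keep the earlier epoch
            best = {"epoch": epoch, "val_acc": val_acc, "w": w.copy(), "b": b}
    return best, history


def predict(best, X):
    # Hard 0/1 labels from the best-epoch snapshot (used once, on the test set).
    return (sigmoid(X @ best["w"] + best["b"]) >= 0.5).astype(int)
```

Selecting the epoch on the validation set, rather than on test accuracy, keeps the reported test number an unbiased estimate.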
For the one-hidden-layer NN, train with python trainer_nn.py; it likewise prints the best validation epoch and the test accuracy.
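
The one-hidden-layer network can be sketched the same way, with manual backpropagation and the same best-validation-epoch selection. The ReLU hidden activation, hidden=64, the initialization scale, and the names mlp_forward and train_mlp are assumptions made for illustration; trainer_nn.py may use different choices.

```python
import numpy as np


def mlp_forward(params, X):
    """One hidden ReLU layer followed by a single sigmoid output unit."""
    W1, b1, w2, b2 = params
    h = np.maximum(0.0, X @ W1 + b1)
    z = np.clip(h @ w2 + b2, -30.0, 30.0)  # avoid exp overflow warnings
    return h, 1.0 / (1.0 + np.exp(-z))


def train_mlp(X_tr, y_tr, X_val, y_val, hidden=64, epochs=20, lr=0.1, batch_size=32, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X_tr.shape
    # Random (not zero) init so the hidden units do not all learn the same function.
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=hidden)
    b2 = 0.0
    best = {"epoch": -1, "val_acc": -1.0, "params": None}
    history = []
    for epoch in range(1, epochs + 1):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            Xb, yb = X_tr[idx], y_tr[idx]
            h, p = mlp_forward((W1, b1, w2, b2), Xb)
            dz = (p - yb) / len(idx)          # grad of mean cross-entropy w.r.t. output logits
            dw2, db2 = h.T @ dz, dz.sum()
            dh = np.outer(dz, w2) * (h > 0)   # backprop through ReLU, using w2 before its update
            dW1, db1 = Xb.T @ dh, dh.sum(axis=0)
            W1 -= lr * dW1
            b1 -= lr * db1
            w2 -= lr * dw2
            b2 -= lr * db2
        _, p_val = mlp_forward((W1, b1, w2, b2), X_val)
        val_acc = float(np.mean((p_val >= 0.5) == y_val))
        history.append(val_acc)
        if val_acc > best["val_acc"]:  # keep a copy of the best-epoch weights
            best = {"epoch": epoch, "val_acc": val_acc,
                    "params": (W1.copy(), b1.copy(), w2.copy(), b2)}
    return best, history
```

All gradients are computed before any parameter is updated, so each step is a true gradient step on the current mini-batch loss.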
Then draw the learning curves for both models by running python draw.py. It needs the two binary files generated by the training steps above.
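
The format of the two binary files is not documented here. The sketch below assumes, hypothetically, that each is a pickled dict holding per-epoch "train_acc" and "val_acc" lists; save_curve, load_curve, best_epoch, and plot_curves are illustrative names, not draw.py's actual API. matplotlib is imported inside plot_curves so the loaders still work where it is not installed.

```python
import pickle
from typing import Dict, List


def save_curve(path: str, curve: Dict[str, List[float]]) -> None:
    # Hypothetical format: {"train_acc": [...], "val_acc": [...]}, one entry per epoch.
    with open(path, "wb") as f:
        pickle.dump(curve, f)


def load_curve(path: str) -> Dict[str, List[float]]:
    with open(path, "rb") as f:
        return pickle.load(f)


def best_epoch(val_acc: List[float]) -> int:
    # 1-based epoch with the highest validation accuracy; ties go to the earliest.
    return max(range(len(val_acc)), key=val_acc.__getitem__) + 1


def plot_curves(curves: Dict[str, Dict[str, List[float]]], out_path: str) -> None:
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file; no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, curve in curves.items():
        epochs = range(1, len(curve["val_acc"]) + 1)
        ax.plot(epochs, curve["train_acc"], label=f"{name} train")
        ax.plot(epochs, curve["val_acc"], linestyle="--", label=f"{name} val")
    ax.set_xlabel("epoch")
    ax.set_ylabel("accuracy")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
```

For example, plot_curves({"logistic regression": load_curve("lr_curve.pkl"), "one-hidden-layer NN": load_curve("nn_curve.pkl")}, "learning_curves.png"), where all three file names are placeholders.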

Best model at: 12th epoch, given val set

Test accuracy is: 0.851000

Best model at: 13th epoch, given val set

Test accuracy is: 0.843000