The code in this repo is all you need to make a first submission to the Humpback Whale Identification Competition. It uses the fastai library, release 1.0.36.post1. This is important: you are likely to encounter errors if you use any other version of the library.
For additional information, please refer to the discussion thread on the Kaggle forums.
Some people reported issues running the first_submission notebook. If you run into them, you should be fine skipping to the subsequent notebooks. The one that scores 0.760 on the leaderboard (LB) is only_known_train.ipynb.
- Install the fastai library, specifically version 1.0.36.post1. The easiest way is to follow the developer install outlined in the README of the fastai repository. Once the installation is done, navigate to the fastai directory and execute
git checkout 1.0.36.post1
You can verify that this worked by executing the following inside a Jupyter notebook or a Python REPL:
import fastai
fastai.__version__
- Clone this repository. cd into data. Download competition data by running
kaggle competitions download -c humpback-whale-identification
You might need to accept the competition rules on the competition website if you get a 403 error.
- Create the train directory and extract the files by running
mkdir train && unzip train.zip -d train
- Do the same for test:
mkdir test && unzip test.zip -d test
- Open first_submission.ipynb in a Jupyter notebook and run all cells.
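The setup steps above can also be sanity-checked from Python before opening the notebook. The following is only a minimal sketch, not part of this repo: the helper names `extract_archive`, `fastai_version_ok`, and `installed_fastai_version` are made up for illustration, and it assumes the archives sit in data/train.zip and data/test.zip as described in the steps above.

```python
import zipfile
from pathlib import Path

REQUIRED_FASTAI = "1.0.36.post1"  # version named in this README


def extract_archive(zip_path, dest_dir):
    """Rough equivalent of `mkdir DEST && unzip ZIP -d DEST`; returns the file count."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # unlike mkdir, does not fail if DEST exists
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return sum(1 for info in archive.infolist() if not info.is_dir())


def fastai_version_ok(installed, required=REQUIRED_FASTAI):
    """True only when the installed version string matches the required one exactly."""
    return installed == required


def installed_fastai_version():
    """Return fastai.__version__, or None if fastai cannot be imported."""
    try:
        import fastai
    except ImportError:
        return None
    return fastai.__version__


if __name__ == "__main__":
    version = installed_fastai_version()
    status = "ok" if fastai_version_ok(version) else "expected " + REQUIRED_FASTAI
    print("fastai:", version, "(" + status + ")")
    for name in ("train", "test"):
        zip_path = Path("data") / (name + ".zip")  # assumed location, see steps above
        if zip_path.exists():
            count = extract_archive(zip_path, Path("data") / name)
            print(name, "->", count, "files")
        else:
            print("missing", zip_path)
```

Run it from the repository root. If the reported fastai version is not 1.0.36.post1, repeat the git checkout step before running the notebooks.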
Here is the order in which I worked on the notebooks:
- first_submission - getting all the basics in place
- new_whale_detector - binary classifier known_whale / new_whale
- oversample - addressing class imbalance
- only_known_research - how to modify the architecture and what hyperparams to use
- only_known_train - training on full dataset
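For readers unfamiliar with oversampling as a way to address class imbalance, here is a rough illustration of the general idea. It is not taken from the oversample notebook and may differ from what that notebook does. The function name `oversample`, the `min_per_class` parameter, and the file names and whale labels are all hypothetical.

```python
import random
from collections import defaultdict


def oversample(samples, min_per_class, seed=0):
    """Repeat (image, label) pairs so each label appears at least min_per_class times."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for image, label in samples:
        by_label[label].append((image, label))

    result = []
    for label, items in by_label.items():
        result.extend(items)
        shortfall = min_per_class - len(items)
        if shortfall > 0:
            # Draw the extra copies with replacement from the existing ones.
            result.extend(rng.choice(items) for _ in range(shortfall))
    return result


# Hypothetical toy data: one whale with three images, another with a single image.
samples = [("a.jpg", "w_1"), ("b.jpg", "w_1"), ("c.jpg", "w_1"), ("d.jpg", "w_2")]
balanced = oversample(samples, min_per_class=3)
```

In this toy example the single image of the rare whale is repeated until both labels have three entries. In practice the repeated images would normally be paired with data augmentation so the model does not see identical copies.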