Chainer kernel for Cdiscount’s Image Classification Challenge Kaggle competition.
Requirements
- Chainer 3.0.0
Preparation
Downloading data
Download data using kaggle-cli.
$ kg download -u <username> -p <password> -c cdiscount-image-classification-challenge
Extract the file.
$ 7z x category_names.7z
Data conversion
Convert the BSON files to JPEG files.
$ util/convert_BSON_to_files.py -d train -r <data directory>
$ util/convert_BSON_to_files.py -d test -r <data directory>
category_names.csv, train.bson, and test.bson must be present in <data directory>.
- Output file patterns
  - train files: <data directory>/train/<category>/<_id>-<index>.jpg
  - test files: <data directory>/test/<_id>-<index>.jpg
This script is based on this notebook.
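The file patterns above can be sketched as a small path-building helper. This is only an illustration: `image_path` and its parameters are hypothetical names, not taken from util/convert_BSON_to_files.py, and the example values are placeholders.

```python
import os


def image_path(root, split, product_id, index, category=None):
    """Build the output path for one image (hypothetical helper).

    Train images go under a per-category directory; test images have
    no category and sit directly under the test directory.
    """
    filename = "{}-{}.jpg".format(product_id, index)
    if split == "train":
        if category is None:
            raise ValueError("train images need a category")
        return os.path.join(root, "train", str(category), filename)
    if split == "test":
        return os.path.join(root, "test", filename)
    raise ValueError("split must be 'train' or 'test'")


# Example values are placeholders, not real IDs from the dataset.
print(image_path("data", "train", 42, 0, category=1000010653))
print(image_path("data", "test", 10, 1))
```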
Make image label list
$ python util/make_image_label_list.py
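The format of the list written by util/make_image_label_list.py is not stated here. As a rough sketch, assuming one sub-directory per category under the train directory and the `path label` text format that Chainer's `LabeledImageDataset` reads, such a list could be built as follows. `build_label_list` and `write_label_list` are hypothetical names.

```python
import os


def build_label_list(train_dir):
    """Return (lines, category_to_label) for a train directory.

    Assumes one sub-directory per category. Category names are sorted
    and mapped to contiguous integer labels starting at 0.
    """
    categories = sorted(
        d for d in os.listdir(train_dir)
        if os.path.isdir(os.path.join(train_dir, d))
    )
    category_to_label = {c: i for i, c in enumerate(categories)}
    lines = []
    for category in categories:
        category_dir = os.path.join(train_dir, category)
        for name in sorted(os.listdir(category_dir)):
            if name.endswith(".jpg"):
                # Path is relative to train_dir: "<category>/<file> <label>"
                path = os.path.join(category, name)
                lines.append("{} {}".format(path, category_to_label[category]))
    return lines, category_to_label


def write_label_list(train_dir, out_path):
    """Write the list to out_path, one "path label" pair per line."""
    lines, _ = build_label_list(train_dir)
    with open(out_path, "w") as f:
        for line in lines:
            f.write(line + "\n")
```

Mapping category IDs to contiguous labels is a design choice of this sketch: the raw category IDs are large integers, while a classifier with 5,270 outputs needs labels in the range 0 to 5,269.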
Training
$ python train.py <train data list>
Inference
$ python infer.py <test data directory>
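Test files are named <_id>-<index>.jpg, so one product can have several images, and per-image predictions have to be combined into one category per product. How infer.py handles this is not shown here. The following is a minimal sketch that assumes per-image score vectors are already available and sums them per product. `product_id_from_filename` and `aggregate_by_product` are hypothetical names.

```python
import os


def product_id_from_filename(path):
    """Extract <_id> from a "<_id>-<index>.jpg" file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    product_id, _index = stem.rsplit("-", 1)
    return product_id


def aggregate_by_product(image_scores):
    """Pick one class index per product.

    image_scores: iterable of (path, scores), where scores is a list of
    per-class scores for one image. Scores of images sharing a product
    ID are summed (same arg-max as averaging) and the best class index
    is returned for each product.
    """
    totals = {}
    for path, scores in image_scores:
        pid = product_id_from_filename(path)
        if pid not in totals:
            totals[pid] = list(scores)
        else:
            totals[pid] = [a + b for a, b in zip(totals[pid], scores)]
    return {
        pid: max(range(len(s)), key=s.__getitem__)
        for pid, s in totals.items()
    }


# Made-up scores for two products and two classes.
example = [
    ("test/10-0.jpg", [0.6, 0.4]),
    ("test/10-1.jpg", [0.1, 0.9]),
    ("test/11-0.jpg", [0.7, 0.3]),
]
print(aggregate_by_product(example))
```

The returned values are class indices, so they would still need to be mapped back to the original category IDs.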
Appendix
- 5,270 different categories
- image size: 180 x 180
Train Data
- 7,069,896 products
- train.bson: 59GB
- 12,371,293 files
- image files: 81GB
Test Data
- 1,768,182 products
- test.bson: 15GB
- 3,095,080 files
- image files: 21GB
files: 0.86055 iters/sec.