Clustered ImageNet Labels
This work was done as a term project for CS570: Artificial Intelligence and Machine Learning (Spring 2021, Prof. Tae-Kyun Kim) at KAIST.
Project Abstract
Although using a pre-trained model is the easiest option for non-researchers implementing a deep learning algorithm, most pre-trained models are trained on benchmark datasets that are far removed from production applications. In this study, we present Clustered ImageNet Labels (CIL), a dataset for production-friendly image classifiers. To reduce management and computational cost, we merge the labels of ImageNet while keeping the data itself unchanged, instead of composing a completely new dataset. CIL contains 488 reduced classes, in which classes that are too finely divided in ImageNet are grouped into general categories. We demonstrate that a model trained with CIL obtains higher accuracy than one trained with the original ImageNet labels, and we show that a ready-to-use image classifier can be built easily by fine-tuning an ImageNet pre-trained model with CIL.
Sample mapping rules in Clustered ImageNet Labels, where classes that are too finely divided in the original ImageNet are clustered into a general class.
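Because CIL only merges labels and leaves the images untouched, it can be applied as a simple label remapping on top of an existing ImageNet loader. The sketch below illustrates the idea with torchvision's ImageFolder; the file name cil_mapping.json and its format (original class index to clustered class index) are assumptions made for illustration, and the actual mapping shipped in this repo may differ.

```python
# Minimal sketch (not the official CIL loader): remap original ImageNet labels
# to clustered labels on the fly. "cil_mapping.json" is a hypothetical file
# containing {original_class_index: clustered_class_index}.
import json
from torchvision import datasets, transforms

with open("cil_mapping.json") as f:
    cil_map = {int(k): v for k, v in json.load(f).items()}

dataset = datasets.ImageFolder(
    "path/to/imagenet/train",               # your local ImageNet directory
    transform=transforms.ToTensor(),
    target_transform=lambda y: cil_map[y],  # 1000 classes -> 488 clusters
)
```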
How to train with CIL
You can use CIL to train an image classifier. The training code using CIL is included in this repo.
# 1. Clone this repo
$ git clone https://github.com/Prev/clustered-imagenet-labels
# 2. Change directory
$ cd clustered-imagenet-labels/trainer
# 3. Download the dataset
# 4. Install dependencies
$ pip install -r requirements.txt
# 5. Run the training script
$ python main.py ~/dataset/imagenet_32_32/
Detailed instructions are available at trainer, where you can also find the training code written in PyTorch.
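As a rough illustration of the fine-tuning workflow mentioned in the abstract, the sketch below replaces the 1000-way head of an ImageNet pre-trained ResNet with a 488-way head for CIL. The backbone choice, hyperparameters, and the finetune helper are assumptions for illustration, not the project's actual training code (see trainer for that).

```python
# Hedged sketch of fine-tuning an ImageNet pre-trained model on the 488 CIL
# classes; see trainer/ for the implementation used in this project.
import torch
import torch.nn as nn
from torchvision import models

NUM_CIL_CLASSES = 488                      # number of clustered classes in CIL

model = models.resnet18(pretrained=True)   # any ImageNet pre-trained backbone
model.fc = nn.Linear(model.fc.in_features, NUM_CIL_CLASSES)  # new 488-way head

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# train_loader is assumed to yield (image, clustered_label) batches,
# e.g. built with the label-remapping idea sketched earlier.
def finetune(model, train_loader, epochs=5):
    model.train()
    for _ in range(epochs):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
```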
How to use the survey application
Visit survey_app for instructions on running the survey application.