This repository contains the code for our CVPR 2015 paper, Learning from Massive Noisy Labeled Data for Image Classification (paper link).
Clone this repository
# Make sure to clone with --recursive to get the modified Caffe
git clone --recursive
Build Caffe
cd external/caffe
# Now follow the Caffe installation instructions here:
#
# If you're experienced with Caffe and have all of the requirements installed
# and your Makefile.config in place, then simply do:
make -j8 && make py
cd -
Set up an experiment directory. You can either create a new one under external/, or link to an existing directory.
mkdir -p external/exp
ln -s /path/to/your/exp/directory external/exp
Download the CIFAR-10 data (python version).
Synthesize label noise and prepare LMDBs. This corrupts the labels of 40k randomly selected training images and leaves the labels of the other 10k images unchanged.
scripts/cifar10/ 0.3
The parameter 0.3 controls the level of label noise. It can be any number in [0, 1].
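To make the role of the noise level concrete, here is a minimal Python sketch of one way to corrupt a subset of labels. It is an illustration only. The function name `corrupt_labels`, its parameters, and the noise model (each selected label is replaced, with probability equal to the noise level, by a different class chosen uniformly at random) are assumptions and may differ from what the repository's script actually does.

```python
import random


def corrupt_labels(labels, noise_level, num_noisy, num_classes=10, seed=0):
    """Return (new_labels, noisy_indices).

    Assumption: each selected label is replaced, with probability
    noise_level, by a different class drawn uniformly at random.
    The repository's script may use a different noise model.
    """
    if not 0.0 <= noise_level <= 1.0:
        raise ValueError("noise_level must be in [0, 1]")
    rng = random.Random(seed)
    # Pick which images get (possibly) corrupted labels.
    noisy_indices = sorted(rng.sample(range(len(labels)), num_noisy))
    new_labels = list(labels)
    for i in noisy_indices:
        if rng.random() < noise_level:
            others = [c for c in range(num_classes) if c != labels[i]]
            new_labels[i] = rng.choice(others)
    return new_labels, noisy_indices


# Toy stand-in for the 50k CIFAR-10 training labels.
labels = [i % 10 for i in range(50000)]
noisy_labels, noisy_indices = corrupt_labels(labels, 0.3, 40000)
```

With a noise level of 0 the selected labels are left untouched, and with 1 every selected label is changed. Under this assumed model, values in between change roughly that fraction of the selected labels.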
Run a series of experiments
# Train a CIFAR10-quick model using only the 10k clean labeled images
scripts/cifar10/
# Baseline:
# Treat 40k noisy labels as ground truth and finetune from the previous model
scripts/cifar10/
# Our method
scripts/cifar10/
scripts/cifar10/
scripts/cifar10/
We provide the training logs in logs/cifar10/ for reference.
Clothing1M is the dataset we proposed in our paper.
Download the dataset. Please contact[at]gmail[dot]com to get the download link. Untar the images and unzip the annotations. The directory structure should be:
external/exp/datasets/clothing1M/
├── category_names_chn.txt
├── category_names_eng.txt
├── clean_label_kv.txt
├── clean_test_key_list.txt
├── clean_train_key_list.txt
├── clean_val_key_list.txt
├── images
│   ├── 0
│   ├── ⋮
│   └── 9
├── noisy_label_kv.txt
├── noisy_train_key_list.txt
├──
└── venn.png
Make the LMDBs and compute the matrix C to be used.
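As an illustration of the second step, a confusion matrix between clean and noisy labels can be estimated from images that carry both annotations. The sketch below assumes that C is this row-normalized confusion matrix, and that each line of the `*_kv.txt` files holds a key and an integer label separated by whitespace. Neither assumption is confirmed here, and `parse_kv` and `estimate_confusion` are hypothetical helpers, not the repository's script.

```python
def parse_kv(lines):
    """Parse 'key label' lines into a dict (the file format is an assumption)."""
    labels = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        labels[key] = int(value)
    return labels


def estimate_confusion(clean, noisy, num_classes):
    """Entry [i][j]: fraction of images with clean label i whose noisy label is j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for key, clean_label in clean.items():
        if key in noisy:  # only images that have both labels contribute
            counts[clean_label][noisy[key]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Toy example with made-up keys and three classes.
clean = parse_kv(["a.jpg 0", "b.jpg 0", "c.jpg 1", "d.jpg 2"])
noisy = parse_kv(["a.jpg 0", "b.jpg 1", "c.jpg 1", "e.jpg 2"])
C = estimate_confusion(clean, noisy, num_classes=3)
```

In this toy example, class 0 has two doubly-labeled images, one of which keeps its label, so the first row of `C` splits evenly between classes 0 and 1. Rows with no doubly-labeled images are left as zeros rather than guessed.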
Run experiments for our method
# Download the ImageNet pretrained CaffeNet
wget -P external/exp/snapshots/
# Train the clothing prediction CNN using only the clean labeled images
scripts/clothing1M/
# Train the noise type prediction CNN
scripts/clothing1M/
# Train the whole net using noisy labeled data
scripts/clothing1M/
scripts/clothing1M/
We provide the training logs in logs/clothing1M/ for reference. A final trained model is also provided here. To test its performance, download the model, place it under external/exp/snapshots/clothing1M/, and then run:
# Run the test
external/caffe/build/tools/caffe test \
-model models/clothing1M/noisy_label_loss_test.prototxt \
-weights external/exp/snapshots/clothing1M/noisy_label_loss_inference.caffemodel \
-iterations 106 \
-gpu 0
The self-brewed external/caffe supports data parallelism across multiple GPUs using MPI. You can accelerate training and testing by:
- Compiling Caffe with MPI enabled
- Tweaking the training shell scripts to use multiple GPUs, for example,
mpirun -n 2 ... -gpu 0,1
Detailed instructions are listed here.
title={Learning from Massive Noisy Labeled Data for Image Classification},
author={Xiao, Tong and Xia, Tian and Yang, Yi and Huang, Chang and Wang, Xiaogang},