iFood 2018 Challenge @ FGVC5, CVPR 2018

Automatically identifying the food items in an image can support food intake monitoring for maintaining a healthy diet. Food classification is challenging because of the large number of food categories, the high visual similarity between different categories, and the lack of datasets large enough for training deep models. In this competition, we introduce a new dataset of 211 fine-grained (prepared) food categories with 101733 training images collected from the web. We provide human-verified labels for both the validation set of 10323 images and the test set of 24088 images. The goal is to build a model that predicts the fine-grained food-category label for a given image.

The main challenges are:

  • Fine-grained Classes: The classes are fine-grained and visually similar. For example, the dataset has 15 different types of cakes, and 10 different types of pastas.

  • Noisy Data: Since the training images are crawled from the web, they often include images of raw ingredients or processed and packaged food items. This is referred to as cross-domain noise. Further, due to the fine-grained nature of the food categories, a training image may either be incorrectly labeled as a visually similar class or be annotated with a single label despite containing multiple food items.

This competition is part of the fine-grained visual-categorization workshop (FGVC5 workshop) at CVPR 2018. Individuals/teams with top submissions will present their work as a poster at the FGVC5 workshop. We will have some cash prizes as well and are currently working on them with our sponsors. Stay tuned!

Updates

4/25/18: The GitHub page for the challenge is online

4/25/18: Training, validation and test data is available

Dates

Data Released April 25, 2018
Submission Deadline June 15th, 2018
Winners Announced June 16th, 2018

Evaluation Server

The challenge is hosted on Kaggle

Data

There is a total of 211 food categories in the dataset. A complete list of classes is available here.

Training Data

The training data consists of 101733 images from 211 classes. The training data is collected from web images and its labels are noisy.

Validation Data

The validation data consists of 10323 images from 211 classes. The validation data is collected from web images and the labels are human verified. It does not contain noisy labels.

Test Data

The test data consists of 24088 images from 211 classes. The test data is collected from web images and the labels are human verified. It does not contain noisy labels.

Data Download and Format

Annotations (2.6 MB)

  • Running md5sum annot.tar on the tar file should produce 1580c3d24167c7b7a2f297903757805d
  • The tar contains 4 files
    • class_list.txt: Contains the names of 211 class labels. This can be used to map class_ids with class names.
    • train_info.csv: Each line of this csv contains the "image_name,label" pair for one training image. For example, "train_00000.jpg,94" means that image train_00000.jpg has class_id 94. The class_id can be mapped to a class name using class_list.txt.
    • val_info.csv: Each line of this csv contains the "image_name,label" pair for one validation image.
    • test_info.csv: This csv only provides the list of test images, without labels.
  • We provide separate tars for train, val and test images as mentioned below.
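
As an illustration of how the annotation files fit together, the following Python sketch reads "image_name,label" rows and maps class ids to class names. The function names are hypothetical, and the layout of class_list.txt (one "class_id class_name" pair per line, separated by whitespace) is an assumption that should be checked against the actual file.

```python
import csv
import io


def read_labels(csv_file):
    """Return {image_name: class_id} from rows of the form 'image_name,label'."""
    labels = {}
    for row in csv.reader(csv_file):
        # Skip blank lines and any row whose label is not an integer
        # (for example a header row, in case the file has one).
        if len(row) < 2 or not row[1].strip().isdigit():
            continue
        labels[row[0].strip()] = int(row[1])
    return labels


def read_class_names(txt_file):
    """Return {class_id: class_name}.

    ASSUMPTION: each line looks like '<class_id> <class_name>'.
    Adjust the parsing if class_list.txt is laid out differently.
    """
    names = {}
    for line in txt_file:
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[0].isdigit():
            names[int(parts[0])] = parts[1].strip()
    return names


# In-memory example using the row quoted in the text above.
example = read_labels(io.StringIO("train_00000.jpg,94\n"))
```

With real files, the same functions can be called on open file handles, for example `read_labels(open("train_info.csv", newline=""))`.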

Train Images (2 GB)

  • Running md5sum train.tar on the tar file should produce 8a8b099e158800f2bb4883992ef35230
  • Contains training images.
  • For label information see annotation file train_info.csv.

Validation Images (200 MB)

  • Running md5sum val.tar on the tar file should produce 51d666f9ab34833c117dfe6c06e3bec3
  • Contains validation images.
  • For label information see annotation file val_info.csv.

Test Images (467 MB)

  • Running md5sum test.tar on the tar file should produce d7b89119c434b4b01868b7307cc22a94
  • Contains testing images.
  • Labels are not provided; predictions will be evaluated on the evaluation server.
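
Where md5sum is not available, the checksums listed above can be checked with a short Python sketch like the one below. The helper names are hypothetical, the archives are assumed to sit in the current directory, and the file name test.tar for the test archive is an assumption.

```python
import hashlib

# Checksums copied from the download sections above.
EXPECTED_MD5 = {
    "annot.tar": "1580c3d24167c7b7a2f297903757805d",
    "train.tar": "8a8b099e158800f2bb4883992ef35230",
    "val.tar": "51d666f9ab34833c117dfe6c06e3bec3",
    # ASSUMPTION: the test archive is named test.tar.
    "test.tar": "d7b89119c434b4b01868b7307cc22a94",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected_hex.lower()
```

For example, `verify("train.tar", EXPECTED_MD5["train.tar"])` should return True for an intact download.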

Evaluation

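We follow a similar metric to the classification tasks of the ILSVRC. For each image $i$, an algorithm will produce 3 labels $l_{ij}$, $j = 1, 2, 3$. For this competition each image has one ground truth label $g_i$, and the error for that image is:

$$e_i = \min_{j} d(l_{ij}, g_i)$$

Where

$$d(x, y) = \begin{cases} 0 & \text{if } x = y \\ 1 & \text{otherwise} \end{cases}$$

The overall error score for an algorithm is the average error over all $N$ test images:

$$\text{score} = \frac{1}{N} \sum_{i=1}^{N} e_i$$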
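
To check a model locally on the validation set, the metric can be sketched in Python as follows. This is a minimal illustration of a top-3 error in the ILSVRC style, where an image counts as correct if any of its 3 predicted labels equals the ground truth label; it is not the evaluation server's code. The function name is hypothetical, and counting an image with no prediction as an error is an assumption.

```python
def top3_error(predictions, ground_truth):
    """Average top-3 error.

    predictions:  {image_name: [label1, label2, label3]}
    ground_truth: {image_name: true_label}
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    total_error = 0
    for image_name, true_label in ground_truth.items():
        # ASSUMPTION: an image with no prediction counts as an error.
        guesses = predictions.get(image_name, [])[:3]
        # Per-image error is 0 if any guess matches, 1 otherwise.
        total_error += 0 if true_label in guesses else 1
    return total_error / len(ground_truth)


# Two images: the first is matched by its second guess, the second is missed.
example_error = top3_error(
    {"a.jpg": [0, 1, 10], "b.jpg": [1, 3, 5]},
    {"a.jpg": 1, "b.jpg": 2},
)
```

In this example one of the two images is classified correctly within its 3 guesses, so `example_error` is 0.5.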

Submission File Format

image_name,label1 label2 label3
test_0001.jpg,0 1 10
test_0002.jpg,1 3 5
test_0003.jpg,0 5 1

Please include the header as shown above for correct parsing. Each line corresponds to one test image and is identified by its image name (e.g., test_0001.jpg), which is used to match the predictions to that image when computing the score.
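
A file in this format could be produced with a sketch like the following. The function name and the in-memory predictions are hypothetical; the header and the space-separated labels follow the example above.

```python
import csv
import io


def write_submission(predictions, out_file):
    """Write {image_name: [label1, label2, label3]} in the submission format."""
    writer = csv.writer(out_file, lineterminator="\n")
    # Header row, as required for correct parsing.
    writer.writerow(["image_name", "label1 label2 label3"])
    for image_name in sorted(predictions):
        labels = predictions[image_name]
        if len(labels) != 3:
            raise ValueError(f"{image_name}: expected 3 labels, got {len(labels)}")
        # The 3 labels share one column, separated by single spaces.
        writer.writerow([image_name, " ".join(str(label) for label in labels)])


# In-memory example; pass an open file handle to write to disk instead.
buffer = io.StringIO()
write_submission({"test_0001.jpg": [0, 1, 10], "test_0002.jpg": [1, 3, 5]}, buffer)
submission_text = buffer.getvalue()
```

Rows are sorted by image name only to make the output deterministic; nothing in the format description requires a particular order.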

Rules

  • Participants should use only the provided training and validation images for training models. Validation data should only be used for validation.
  • We do not allow augmentation with any prior datasets or additional data during training. Pretraining with additional data (such as ImageNet) is allowed as long as participants do not actively collect additional data for the target categories in the iFood 2018 challenge. Use of any external data should be properly acknowledged and cited. The general rule is that we want participants to use only the provided training and validation images to train a model to classify the test images.
  • Collecting additional annotations for the train images is not allowed.
  • Hand labeling of test data is not allowed and will lead to disqualification.

Terms of Use

By downloading this dataset you agree to the following terms:

  • You will use the data only for non-commercial research and educational purposes.
  • You will NOT distribute the above images.
  • The organizers make no representations or warranties regarding the data, including but not limited to warranties of non-infringement or fitness for a particular purpose.
  • You accept full responsibility for your use of the data and shall defend and indemnify the organizers, including its employees, officers and agents, against any and all claims arising from your use of the data, including but not limited to your use of any copies of copyrighted images that you may create from the data.

Acknowledgement

We would like to thank CVDF Foundation and Tsung-Yi Lin for helping us with hosting the data.

Organizers

Karan Sikka, SRI International
Parneet Kaur*, Johnson & Johnson
Weijun Wang, Google
Ajay Divakaran, SRI International
Serge Belongie, Cornell University and Cornell Tech

*work done while Parneet was an intern at SRI International

For any further inquiries please contact us at ifoodcvpr18@gmail.com