Code release for Large Scale Visual Food Recognition
Our Progressive Region Enhancement Network (PRENet) consists of two main components: progressive local feature learning and region feature enhancement. The former adopts a progressive training strategy to learn complementary, multi-scale, finer-grained local features, such as information relevant to different ingredients. The latter uses self-attention to incorporate richer multi-scale context into the local features, which enhances the local feature representation. The enhanced local features are then fused with the global features from the global feature learning branch into a unified representation through a concat layer.
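The region feature enhancement and fusion steps can be illustrated with a toy, framework-free sketch. It assumes the input is a list of region feature vectors (for example, the spatial positions of one stage's feature map), applies scaled dot-product self-attention with identity query/key/value projections (no learned weights) plus a residual connection, and then concatenates the enhanced vectors with a global feature vector. The function names `self_attention` and `enhance_and_fuse` are hypothetical; this is not the PRENet implementation, which operates on PyTorch feature maps with learned projections.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with identity Q/K/V projections (an assumption).

    Each region's context is a weighted average of all region vectors, so every
    local feature absorbs information from the others; a residual connection
    keeps the original feature.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    enhanced = []
    for query in features:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in features]
        weights = softmax(scores)
        context = [sum(w * value[d] for w, value in zip(weights, features)) for d in range(dim)]
        enhanced.append([x + c for x, c in zip(query, context)])
    return enhanced


def enhance_and_fuse(local_features: List[Vector], global_feature: Vector) -> Vector:
    """Enhance local features with self-attention, then concatenate them with the global feature."""
    fused: Vector = []
    for vec in self_attention(local_features):
        fused.extend(vec)
    fused.extend(global_feature)
    return fused


if __name__ == "__main__":
    # Hypothetical toy inputs: three 2-D region vectors and one 2-D global feature.
    local = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    global_feat = [0.2, 0.8]
    fused = enhance_and_fuse(local, global_feat)
    print(len(fused))  # 3 regions * 2 dims + 2 global dims = 8
```

In the real network the concatenated vector would feed a classifier; the identity projections here only keep the example small.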
During training, we first progressively train the network stages, then train the whole network together with the concat part, and further introduce a KL-divergence term that increases the difference between stages so that they capture more detailed features. At inference time, because the outputs of the individual stages and of the concatenated features are complementary, we combine their predictions for the final food classification.
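The KL-divergence term and the combined inference can be sketched in plain Python. This sketch assumes each stage outputs class logits and measures the difference between two stages as KL(p || q) over their softmax distributions. The hypothetical `training_loss` subtracts a weighted KL term (weight `lam`, also hypothetical) so that minimizing the loss pushes the stages apart; the exact form and weighting used in the released code may differ. For inference, the sketch sums the softmax probabilities of all branches and takes the arg max, which is one common way to combine complementary predictions, not necessarily the one in `main.py`.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def training_loss(stage_a: Sequence[float], stage_b: Sequence[float],
                  label: int, lam: float = 0.1) -> float:
    """Hypothetical loss: both stages must classify correctly, and divergence between them is rewarded."""
    ce = cross_entropy(stage_a, label) + cross_entropy(stage_b, label)
    diversity = kl_divergence(softmax(stage_a), softmax(stage_b))
    # Subtracting the KL term means that minimizing the loss increases the stage difference.
    return ce - lam * diversity


def combined_prediction(branch_logits: List[Sequence[float]]) -> int:
    """Sum softmax probabilities from every stage and the concat branch; return the top class."""
    num_classes = len(branch_logits[0])
    summed = [0.0] * num_classes
    for logits in branch_logits:
        for c, p in enumerate(softmax(logits)):
            summed[c] += p
    return max(range(num_classes), key=lambda c: summed[c])


if __name__ == "__main__":
    # Hypothetical logits for three classes from two stages and the concat branch.
    stage1 = [2.0, 0.5, 0.1]
    stage2 = [0.3, 1.8, 0.2]
    concat = [1.5, 1.0, 0.0]
    print(combined_prediction([stage1, stage2, concat]))  # → 0
```

Summing probabilities lets a confident branch outvote a less confident one, which is why stage 2 preferring class 1 does not override the other two branches here.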
- Python 3.6
- PyTorch >= 1.3.1
- torchvision >= 0.4.2
- PIL
- NumPy
- dropblock
- Download the food datasets. The file structure should look like:
dataset
├── class_001
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
├── class_002
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
└── ...
- Download the training and testing list files (e.g., train_full.txt and test_full.txt).
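If you need to build or check a list file for your own data, the following sketch may help. The format of the released train_full.txt and test_full.txt is not described here, so this sketch assumes one `<relative image path> <integer label>` pair per line, with labels assigned by sorted class-folder order (class_001 → 0, class_002 → 1, ...). The helpers `build_list`, `write_list`, `read_list`, and `missing_files` are hypothetical and not part of this repo; compare against the downloaded list files before relying on this format.

```python
import os
import tempfile
from typing import List, Tuple

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}
Entry = Tuple[str, int]


def build_list(dataset_root: str) -> List[Entry]:
    """Pair each image path (relative to dataset_root) with the index of its class folder."""
    classes = sorted(d for d in os.listdir(dataset_root)
                     if os.path.isdir(os.path.join(dataset_root, d)))
    entries: List[Entry] = []
    for label, cls in enumerate(classes):
        for name in sorted(os.listdir(os.path.join(dataset_root, cls))):
            if os.path.splitext(name)[1].lower() in IMAGE_EXTS:
                entries.append((f"{cls}/{name}", label))
    return entries


def write_list(entries: List[Entry], out_path: str) -> None:
    """Write one '<path> <label>' line per image (assumed format)."""
    with open(out_path, "w", encoding="utf-8") as f:
        for path, label in entries:
            f.write(f"{path} {label}\n")


def read_list(list_path: str) -> List[Entry]:
    """Parse '<path> <label>' lines; rsplit keeps paths that contain spaces intact."""
    entries: List[Entry] = []
    with open(list_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                path, label = line.rsplit(" ", 1)
                entries.append((path, int(label)))
    return entries


def missing_files(entries: List[Entry], image_root: str) -> List[str]:
    """Return listed paths that do not exist under image_root."""
    return [p for p, _ in entries if not os.path.isfile(os.path.join(image_root, p))]


if __name__ == "__main__":
    # Tiny throwaway dataset mirroring the layout above.
    with tempfile.TemporaryDirectory() as root:
        for cls in ("class_001", "class_002"):
            os.makedirs(os.path.join(root, cls))
            for name in ("1.jpg", "2.jpg"):
                open(os.path.join(root, cls, name), "w").close()
        list_path = os.path.join(root, "train_full.txt")
        write_list(build_list(root), list_path)
        print(read_list(list_path))
        print(missing_files(read_list(list_path), root))  # → []
```

Running `missing_files` against `--image_path` before training is a cheap way to catch path mismatches between the list files and the image folder.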
- To train a PRENet on food datasets from scratch, run:
python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model>
- Download the model pretrained on Food2k from Google/Baidu (code: o0nj).
- To evaluate a pretrained PRENet on food datasets, run:
python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model> --test --use_checkpoint --checkpoint <checkpoint_path>
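For scripting, it can help to see how the documented flags fit together. The sketch below mirrors only the flags that appear in the training and evaluation commands above, using Python's standard `argparse`. It is a hypothetical mirror, not the parser in `main.py`: the string types, the `None` default for `--checkpoint`, and the rule that `--checkpoint` must be given whenever `--use_checkpoint` is set are assumptions, and the example values (`food2k`, `model.pth`, `ckpt.pth`) are placeholders.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical mirror of the flags shown in this README."""
    parser = argparse.ArgumentParser(description="Hypothetical mirror of the PRENet main.py flags")
    parser.add_argument("--dataset", required=True, help="food dataset name")
    parser.add_argument("--image_path", required=True, help="root folder of the images")
    parser.add_argument("--train_path", required=True, help="training list file, e.g. train_full.txt")
    parser.add_argument("--test_path", required=True, help="testing list file, e.g. test_full.txt")
    parser.add_argument("--weight_path", required=True, help="pretrained model weights")
    parser.add_argument("--test", action="store_true", help="evaluate instead of training")
    parser.add_argument("--use_checkpoint", action="store_true", help="load a PRENet checkpoint")
    parser.add_argument("--checkpoint", default=None, help="path to the checkpoint file")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (or sys.argv when None) and enforce the assumed checkpoint rule."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.use_checkpoint and args.checkpoint is None:
        parser.error("--use_checkpoint requires --checkpoint")  # assumed constraint; exits with code 2
    return args


if __name__ == "__main__":
    args = parse([
        "--dataset", "food2k", "--image_path", "data/",
        "--train_path", "train_full.txt", "--test_path", "test_full.txt",
        "--weight_path", "model.pth",
        "--test", "--use_checkpoint", "--checkpoint", "ckpt.pth",
    ])
    print("evaluate" if args.test else "train", args.checkpoint)
```

The only difference between the two documented commands is the trailing `--test --use_checkpoint --checkpoint <checkpoint_path>`, which is why those three flags are modeled as optional here.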
| CNN backbone | Download link |
|---|---|
| vgg16 | Google/Baidu (code: puuy) |
| resnet50 | Google/Baidu (code: 5eay) |
| resnet101 | Google/Baidu (code: yv1o) |
| resnet152 | Google/Baidu (code: 22zw) |
| densenet161 | Google/Baidu (code: bew5) |
| inception_resnet_v2 | Google/Baidu (code: xa8r) |
| senet154 | Google/Baidu (code: kwzf) |
If you find this repo useful in your project, please consider citing it with the following BibTeX:
@article{min2021large,
title={Large scale visual food recognition},
author={Min, Weiqing and Wang, Zhiling and Liu, Yuxin and Luo, Mengjiang and Kang, Liping and Wei, Xiaoming and Wei, Xiaolin and Jiang, Shuqiang},
journal={arXiv preprint arXiv:2103.16107},
year={2021}
}