Large-scale food image recognition

Primary language: Python. License: MIT.

Progressive Region Enhancement Network (PRENet)

Code release for Large Scale Visual Food Recognition

Introduction

[Figure: overview of the PRENet method]

Our Progressive Region Enhancement Network (PRENet) consists mainly of two parts: progressive local feature learning and region feature enhancement. Progressive local feature learning uses a progressive training strategy to learn complementary, multi-scale, fine-grained local features, such as ingredient-relevant information. Region feature enhancement uses self-attention to incorporate richer multi-scale context into the local features, which strengthens their representation. The enhanced local features are then fused with the global features from global feature learning into a single unified representation via a concat layer.
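The region enhancement and fusion described above can be illustrated with a minimal pure-Python sketch. It is not the code in this repository: the function names, the use of raw features as queries, keys and values (no learned projections), the residual addition, and the mean pooling are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(regions):
    """Scaled dot-product self-attention over a list of region feature vectors.

    Assumption: queries, keys and values are the raw features themselves.
    """
    dim = len(regions[0])
    enhanced = []
    for q in regions:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in regions]
        weights = softmax(scores)
        context = [sum(w * v[i] for w, v in zip(weights, regions)) for i in range(dim)]
        # Assumed residual connection: original feature plus attended context.
        enhanced.append([x + c for x, c in zip(q, context)])
    return enhanced


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def fuse(local_feature_sets, global_feature):
    """Concatenate pooled, attention-enhanced local features with the global feature."""
    fused = []
    for regions in local_feature_sets:
        fused.extend(mean_pool(self_attention(regions)))
    fused.extend(global_feature)
    return fused


# Toy input: two scales of local region features plus one global feature.
scale_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scale_b = [[0.5, 0.5], [0.2, 0.8]]
global_feature = [0.3, 0.7, 0.1]

fused = fuse([scale_a, scale_b], global_feature)
print(len(fused))  # 2 + 2 + 3 = 7
```

In a real model these operations would run on PyTorch tensors with learned layers; the sketch only shows how attention-enhanced local features and global features end up in one concatenated vector.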

During training, we first train the network progressively, stage by stage, and then train the whole network together with the concat part. We also introduce a KL-divergence term that increases the difference between stages so that they capture more detailed features. At inference, because the outputs of the individual stages and of the concatenated features are complementary, we combine their predictions for the final food classification.
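As a rough illustration of the two ideas in this paragraph, the sketch below computes the KL divergence between two stages' predicted distributions and combines per-stage and concat-branch logits into one prediction. How the KL term enters the training loss (sign, weighting) is not specified here, and combining by summing logits is an assumption; the function names and toy numbers are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def combine_predictions(stage_logits, concat_logits):
    """Sum logits over all stages and the concat branch (an assumed rule)."""
    all_logits = stage_logits + [concat_logits]
    num_classes = len(concat_logits)
    combined = [sum(l[c] for l in all_logits) for c in range(num_classes)]
    predicted = max(range(num_classes), key=lambda c: combined[c])
    return predicted, combined


# Toy logits for three stages and the concat branch over three classes.
stage_logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.2], [1.0, 1.2, 0.3]]
concat_logits = [1.5, 1.0, 0.4]

# A larger divergence means the two stages disagree more.
divergence = kl_divergence(softmax(stage_logits[0]), softmax(stage_logits[1]))

predicted, combined = combine_predictions(stage_logits, concat_logits)
print(predicted)  # class index with the largest summed logit
```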

Requirements

  • Python 3.6

  • PyTorch >= 1.3.1

  • torchvision >= 0.4.2

  • PIL

  • NumPy

  • dropblock

Data preparation

  1. Download the food datasets. The file structure should look like:
dataset
├── class_001
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
├── class_002
│   ├── 1.jpg
│   ├── 2.jpg
│   └── ...
└── ...
  2. Download the training and testing list files, e.g. train_full.txt and test_full.txt.
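Before training, it can help to confirm that the dataset directory matches the layout above. The following is a small sketch that counts image files in each class folder; the function name and the accepted file suffixes are assumptions, and the demo builds a throwaway directory rather than reading a real dataset.

```python
import tempfile
from pathlib import Path

# Assumed set of image suffixes; adjust to match the actual dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_class(dataset_root):
    """Return {class_folder_name: number_of_image_files} for one dataset root."""
    root = Path(dataset_root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(
            1
            for f in class_dir.iterdir()
            if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts


# Demo on a temporary directory that mimics the expected structure.
with tempfile.TemporaryDirectory() as tmp:
    for class_name, n_images in (("class_001", 2), ("class_002", 3)):
        class_dir = Path(tmp) / class_name
        class_dir.mkdir()
        for i in range(1, n_images + 1):
            (class_dir / "{}.jpg".format(i)).touch()
    counts = count_images_per_class(tmp)

print(counts)
```

A class folder with a count of zero usually points to a failed download or a wrong path.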

Training

  1. To train a PRENet on food datasets from scratch, run:
python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model>

Inference

  1. Download the model pretrained on Food2K from google/baidu (Code: o0nj).

  2. To evaluate a pretrained PRENet on a food dataset, run:

python main.py --dataset <food_dataset> --image_path <data_path> --train_path <train_path> --test_path <test_path> --weight_path <pretrained_model> --test --use_checkpoint --checkpoint <checkpoint_path>
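The training and evaluation commands differ only in the trailing `--test --use_checkpoint --checkpoint` flags. A minimal sketch of a helper that assembles either command from Python is shown below; the helper name and all example values (dataset name and paths) are hypothetical placeholders, and only the flags listed above are used.

```python
import shlex


def build_command(dataset, image_path, train_path, test_path, weight_path, checkpoint=None):
    """Build the main.py argument list; adds the evaluation flags if a checkpoint is given."""
    args = [
        "python", "main.py",
        "--dataset", dataset,
        "--image_path", image_path,
        "--train_path", train_path,
        "--test_path", test_path,
        "--weight_path", weight_path,
    ]
    if checkpoint is not None:
        args += ["--test", "--use_checkpoint", "--checkpoint", checkpoint]
    return args


# Hypothetical placeholder values; replace them with your own dataset name and paths.
common = dict(
    dataset="my_food_dataset",
    image_path="data/images",
    train_path="data/train_full.txt",
    test_path="data/test_full.txt",
    weight_path="weights/pretrained.pth",
)

train_cmd = build_command(**common)
eval_cmd = build_command(checkpoint="checkpoints/prenet.pth", **common)

# Quote each argument so the printed command can be pasted into a shell.
print(" ".join(shlex.quote(a) for a in train_cmd))
print(" ".join(shlex.quote(a) for a in eval_cmd))
```

The returned lists can also be passed directly to `subprocess.run` without going through a shell.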

Other pretrained models on Food2K

CNN                   Link
vgg16                 google/baidu (Code: puuy)
resnet50              google/baidu (Code: 5eay)
resnet101             google/baidu (Code: yv1o)
resnet152             google/baidu (Code: 22zw)
densenet161           google/baidu (Code: bew5)
inception_resnet_v2   google/baidu (Code: xa8r)
senet154              google/baidu (Code: kwzf)

Citation

If you find this repo useful for your project, please consider citing it with the following BibTeX entry:

@article{min2021large,
  title={Large scale visual food recognition},
  author={Min, Weiqing and Wang, Zhiling and Liu, Yuxin and Luo, Mengjiang and Kang, Liping and Wei, Xiaoming and Wei, Xiaolin and Jiang, Shuqiang},
  journal={arXiv preprint arXiv:2103.16107},
  year={2021}
}