FoodBERT: Exploiting Food Embeddings for Ingredient Substitution

Official repository of the paper "Exploiting Food Embeddings for Ingredient Substitution" (published at the International Conference on Health Informatics 2021).

Identifying suitable substitutes for cooking ingredients can serve various goals, such as nutrient optimization, avoiding allergens, or adapting a recipe to personal preferences. In this repository, we present two models for ingredient embeddings, Food2Vec and FoodBERT. Additionally, we combine both approaches with images, resulting in two multimodal representation models. FoodBERT is also used for relation extraction. According to both a ground-truth-based evaluation and a human evaluation, FoodBERT, and especially its multimodal version, is best suited for substitute recommendations in dietary use cases.
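
To illustrate how ingredient embeddings can yield substitute candidates, here is a minimal sketch of nearest-neighbour ranking by cosine similarity. The ingredient names and vectors are made-up toy data, not Food2Vec or FoodBERT output. `concat_modalities` shows just one simple way of fusing text and image vectors (normalise each, then concatenate); it is an assumption, not necessarily how the multimodal models in this repository combine them.

```python
import numpy as np

# Toy data (hypothetical): 3-d "text" embeddings for five ingredients.
VOCAB = ["butter", "margarine", "olive_oil", "sugar", "honey"]
TEXT_EMB = np.array([
    [0.90, 0.10, 0.00],  # butter
    [0.85, 0.15, 0.05],  # margarine
    [0.70, 0.00, 0.30],  # olive_oil
    [0.00, 0.90, 0.10],  # sugar
    [0.05, 0.80, 0.30],  # honey
])
# Toy 2-d "image" embeddings (hypothetical), one row per ingredient.
IMAGE_EMB = np.array([
    [1.0, 0.2],
    [0.9, 0.3],
    [0.4, 0.8],
    [0.1, 1.0],
    [0.3, 0.9],
])


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit length (guarding against all-zero rows)."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def concat_modalities(text_emb: np.ndarray, image_emb: np.ndarray) -> np.ndarray:
    """One simple fusion option (assumption): concatenate unit-length vectors."""
    return np.concatenate([l2_normalize(text_emb), l2_normalize(image_emb)], axis=1)


def top_k_substitutes(query: str, vocab: list, emb: np.ndarray, k: int = 3) -> list:
    """Rank all other ingredients by cosine similarity to `query`."""
    idx = vocab.index(query)
    unit = l2_normalize(emb)
    sims = unit @ unit[idx]  # cosine similarity of every ingredient to the query
    sims[idx] = -np.inf      # an ingredient is not its own substitute
    order = np.argsort(-sims)[:k]
    return [(vocab[i], float(sims[i])) for i in order]


if __name__ == "__main__":
    print(top_k_substitutes("butter", VOCAB, TEXT_EMB, k=2))
    print(top_k_substitutes("butter", VOCAB, concat_modalities(TEXT_EMB, IMAGE_EMB), k=2))
```

With these toy text vectors, margarine ranks first and olive_oil second as substitutes for butter, while sugar and honey score far lower.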

Installation:

Clone this repository:

```
git clone https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution
```

Install the requirements (Python 3.7):

```
pip install -r requirements.txt
python -m spacy download en_core_web_lg
```
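
A quick sanity check of the environment can save time before running anything. This sketch only checks what the installation step covers: whether the interpreter is Python 3.7 and whether the `en_core_web_lg` spaCy model is importable (spaCy models are installed as regular Python packages, which `importlib.util.find_spec` can look up). The helper name `check_environment` is hypothetical and not part of this repository.

```python
import importlib.util
import sys


def check_environment(required=(3, 7), spacy_model="en_core_web_lg"):
    """Hypothetical helper: compare the interpreter and spaCy model with the README."""
    return {
        # The README targets Python 3.7; compare major.minor only.
        "python_ok": tuple(sys.version_info[:2]) == tuple(required),
        # find_spec returns None for a missing top-level package instead of raising.
        "spacy_model_installed": importlib.util.find_spec(spacy_model) is not None,
    }


if __name__ == "__main__":
    for check, ok in check_environment().items():
        print(f"{check}: {'OK' if ok else 'MISSING/MISMATCH'}")
```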

Download data and models
- Download https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution/releases/download/0.1/food2vec_models.zip and place the content in ./food2vec/models
- Download https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution/releases/download/0.1/foodbert_data.zip and place the content in ./foodbert/data
- Download https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution/releases/download/0.1/foodbert_embeddings_data.zip and place the content in ./foodbert_embeddings/data
- Download https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution/releases/download/0.1/multimodal_data.zip and place the content in ./multimodal/data
- Download https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution/releases/download/0.1/relation_extraction_models.zip and place the content in ./relation_extraction/models
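
The five downloads can also be scripted. The sketch below uses only the Python standard library; the URLs and target directories are taken from the list above, while the `downloads/` cache directory and the helper names are hypothetical. It assumes each archive's files sit at its top level, so extracting straight into the target directory matches "place the content in"; if an archive contains a wrapping folder, move its contents up afterwards.

```python
import urllib.request
import zipfile
from pathlib import Path
from typing import List

RELEASE_URL = (
    "https://github.com/ChantalMP/Exploiting-Food-Embeddings-for-Ingredient-Substitution"
    "/releases/download/0.1"
)

# Archive name -> directory (relative to the repository root), as listed above.
ARCHIVES = {
    "food2vec_models.zip": "food2vec/models",
    "foodbert_data.zip": "foodbert/data",
    "foodbert_embeddings_data.zip": "foodbert_embeddings/data",
    "multimodal_data.zip": "multimodal/data",
    "relation_extraction_models.zip": "relation_extraction/models",
}


def extract_archive(zip_path: Path, target_dir: Path) -> List[str]:
    """Extract every member of zip_path into target_dir and return the member names.

    Assumes the archive has no wrapping top-level folder.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


def download_all(repo_root: Path = Path("."), cache_dir: Path = Path("downloads")) -> None:
    """Download each release archive once into cache_dir (hypothetical) and extract it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    for name, target in ARCHIVES.items():
        zip_path = cache_dir / name
        if not zip_path.exists():  # reuse archives that were already downloaded
            urllib.request.urlretrieve(f"{RELEASE_URL}/{name}", str(zip_path))
        extract_archive(zip_path, repo_root / target)
```

Run `download_all()` from the repository root; archives already present in `downloads/` are reused rather than fetched again.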

Optional: Generate data for FoodBERT and relation extraction (RE) training
- First, get the Recipe1M+ dataset by Marin et al. from http://im2recipe.csail.mit.edu/dataset/login/ (login required; as of November 2020, the file to download is "Layers" (381 MiB)).
- Unzip it, rename layer1.json to recipe1m.json, and place it in ./data/
- Afterwards, run:
```
python -m normalisation.normalize_recipe_instructions
python -m foodbert.preprocess_instructions
```
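
The Recipe1M+ preparation and the two preprocessing commands can be combined into one script. This is a minimal sketch under two assumptions: the intended target directory is `./data/` relative to the repository root, and the script runs with the same interpreter that has the requirements installed (`sys.executable`). The helper names are hypothetical.

```python
import shutil
import subprocess
import sys
from pathlib import Path
from typing import List

# The two preprocessing modules from the commands above, in order.
PREPROCESSING_MODULES = [
    "normalisation.normalize_recipe_instructions",
    "foodbert.preprocess_instructions",
]


def place_recipe1m(layer1_path: Path, repo_root: Path = Path(".")) -> Path:
    """Move (rename) layer1.json to <repo_root>/data/recipe1m.json (assumed location)."""
    target = repo_root / "data" / "recipe1m.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(layer1_path), str(target))
    return target


def preprocessing_commands() -> List[List[str]]:
    """Equivalent of `python -m <module>` for each step, using the current interpreter."""
    return [[sys.executable, "-m", module] for module in PREPROCESSING_MODULES]


def run_preprocessing(repo_root: Path = Path(".")) -> None:
    """Run the steps in order from the repository root; stop on the first failure."""
    for cmd in preprocessing_commands():
        subprocess.run(cmd, cwd=str(repo_root), check=True)
```

Running the steps from the repository root matters because `python -m` resolves the `normalisation` and `foodbert` packages relative to the working directory.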
Only for RE training:
- Sadly, we cannot publish the comment data needed for the relation extraction model.
- If you want to train or use the relation extraction model to generate substitutes, you need to scrape comments yourself. The scripts for this are provided as is, but they are not maintained.
- All scripts can be found in the comment_scraping directory.

Evaluation:
- We cannot make our ground truth public, but it is available upon request if you want to reproduce our results or compare your own method.
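
If you plan to compare your own method against the ground truth, a scoring function can be prepared in advance. The sketch below computes micro-averaged precision and recall over the top-k suggestions per ingredient. The data format (a dict from ingredient to its set of true substitutes) is a hypothetical stand-in, since the actual ground-truth format is only available on request, and the metric is a generic one rather than necessarily the exact protocol described in evaluation/README.md.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at_k(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 5,
) -> Tuple[float, float]:
    """Micro-averaged precision and recall of the top-k predicted substitutes."""
    hits = n_predicted = n_true = 0
    for ingredient, true_subs in ground_truth.items():
        # Deduplicate while keeping rank order, then keep the top k.
        top_k = list(dict.fromkeys(predictions.get(ingredient, [])))[:k]
        hits += len(set(top_k) & true_subs)
        n_predicted += len(top_k)
        n_true += len(true_subs)
    precision = hits / n_predicted if n_predicted else 0.0
    recall = hits / n_true if n_true else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy example (hypothetical data).
    gt = {"butter": {"margarine", "olive_oil"}, "sugar": {"honey"}}
    preds = {"butter": ["margarine", "lard"], "sugar": ["honey", "maple_syrup"]}
    print(precision_recall_at_k(preds, gt, k=2))  # → (0.5, 0.6666666666666666)
```

Only ingredients that appear in the ground truth are scored, so predictions for ingredients without known substitutes neither help nor hurt.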

Usage

Human and ground-truth-based evaluation: see evaluation/README.md
Food2Vec training and substitute generation: see food2vec/README.md
FoodBERT training: see foodbert/README.md
FoodBERT substitute generation: see foodbert_embeddings/README.md
Generating image embeddings for multimodal approaches: see multimodal/README.md
Data normalisation: see normalisation/README.md
Relation Extraction training and substitute generation: see relation_extraction/README.md

Colab Examples

Using FoodBERT:
Using Food2Vec:
Using Image Embeddings:
Generate Substitutes - FoodBERT:
Generate Substitutes - Food2Vec:

If you encounter any problems with the code, feel free to contact us at {chantal.pellegrini, ege.oezsoy, monika.wintergerst}[at]tum[dot]de.