Code for building a machine learning powered app to decide whether a photo is of food or not.
See it working live at: https://foodnotfood.app
Yes, that's all it does.
It's not perfect.
But think about it.
How do you decide what's food or not?
Remember hotdog not hotdog?
That's what this repo builds, except for food or not.
It's arguably harder to do food or not, because there are so many options for what "food" is versus what "not food" is.
Whereas with hotdog not hotdog, you've only got one option: is it a hotdog or not?
I built this app during a 10-hour livestream to celebrate 100,000 YouTube Subscribers (thank you thank you thank you).
The full stream replay is available to watch on YouTube.
The code has changed since the stream.
I made it cleaner and more reproducible.
My notes are on Notion.
Note: If this doesn't work, please leave an issue.
To reproduce, the following steps are best run in order.
You will require an installation of Conda; I'd recommend Miniconda.
git clone https://github.com/mrdbourke/food-not-food
cd food-not-food
I use Conda for my environments. You could do similar with venv and pip but I prefer Conda.
This code works with Python 3.8.
conda create --prefix ./env python=3.8 -y
conda activate ./env
conda install pip
Getting TensorFlow + GPU to work
Follow the install instructions for running TensorFlow on the GPU.
This will be required for model_building/train_model.py.
Note: Another option here, to skip the installation of TensorFlow, is to use your global installation of TensorFlow and just install the requirements.txt file below.
Other requirements
If you're using your global installation of TensorFlow, you might be able to just run pip install -r requirements.txt in your environment.
Or if you're running in another dedicated environment, you should also be able to just run pip install -r requirements.txt.
pip install -r requirements.txt
- Download Food101 data (101,000 images of food).
python data_download/download_food101.py
- Download a subset of Open Images data. Use the -n flag to indicate how many images from each set (train/valid/test) to randomly download.
For example, running python data_download/download_open_images.py -n=100 downloads 100 images from each of the training, validation and test sets of Open Images (300 images in total).
The downloading for Open Images data is powered by FiftyOne.
python data_download/download_open_images.py -n=100
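To make the -n arithmetic concrete, here is a minimal sketch of how a flag like this could be parsed with argparse. It is not the repo's actual script: the function names, the split names and the default value are assumptions for illustration, and the real download is handled by FiftyOne.

```python
import argparse

# Assumed split names; the text above calls them train/valid/test.
SPLITS = ("train", "validation", "test")


def parse_args(argv=None):
    """Parse a -n flag: the number of images to download per split."""
    parser = argparse.ArgumentParser(
        description="Download a random subset of Open Images (sketch)."
    )
    parser.add_argument(
        "-n",
        dest="num_images",
        type=int,
        default=100,  # assumed default, not taken from the repo
        help="number of images to randomly download from each split",
    )
    return parser.parse_args(argv)


def total_images(num_per_split, splits=SPLITS):
    """Total images downloaded when each split gets the same number."""
    return num_per_split * len(splits)


# Same form as the command above: -n=100
args = parse_args(["-n=100"])
total = total_images(args.num_images)
print(f"{args.num_images} images per split, {total} images in total")
```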
- Extract the Food101 data into a "food" directory. Use the -n flag to set how many images of food to extract, for example -n=10000 extracts 10,000 random food images from Food101.
python data_processing/extract_food101.py -n=10000
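The random extraction step could look something like the sketch below. It assumes the Food101 images sit in one folder per class as .jpg files; the function name, the seed and the class-name prefix on copied files are illustrative choices, not the behaviour of data_processing/extract_food101.py.

```python
import random
import shutil
import tempfile
from pathlib import Path


def extract_random_images(source_dir, target_dir, n, seed=42):
    """Copy n randomly chosen .jpg images from source_dir into target_dir."""
    source_dir, target_dir = Path(source_dir), Path(target_dir)
    image_paths = sorted(source_dir.rglob("*.jpg"))
    n = min(n, len(image_paths))  # never ask for more images than exist
    chosen = random.Random(seed).sample(image_paths, n)
    target_dir.mkdir(parents=True, exist_ok=True)
    for path in chosen:
        # Prefix with the class folder name so equal filenames don't clash.
        shutil.copy2(path, target_dir / f"{path.parent.name}_{path.name}")
    return chosen


# Small demo on empty placeholder files rather than the real dataset.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "images"
    for class_name in ("apple_pie", "pizza"):
        (source / class_name).mkdir(parents=True)
        for i in range(5):
            (source / class_name / f"{i}.jpg").touch()
    chosen = extract_random_images(source, Path(tmp) / "food", n=4)
    extracted = sorted(p.name for p in (Path(tmp) / "food").iterdir())

print(len(extracted), "images copied into the food directory")
```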
- Extract the Open Images images into the open_images_extracted directory.
The data_processing/extract_open_images.py script uses the Open Images labels plus a list of foods and not foods (see data/food_list.txt and data/non_food_list.txt) to separate the downloaded Open Images.
This is necessary because some of the images from Open Images contain foods (we don't want these in our not_food class).
python data_processing/extract_open_images.py
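The label-based separation can be sketched roughly as follows. This is an assumption about the logic, not the contents of data_processing/extract_open_images.py: the function name and the sample labels are made up, and the rule used here is that any food label sends an image to food, otherwise any not food label sends it to not_food, and anything else is skipped.

```python
def split_by_labels(image_labels, food_labels, non_food_labels):
    """Separate image ids into food and not_food using their labels.

    image_labels: dict mapping image id -> list of label strings.
    food_labels / non_food_labels: sets of lowercase label names.
    """
    food_images, not_food_images = [], []
    for image_id, labels in image_labels.items():
        labels = {label.lower() for label in labels}
        if labels & food_labels:
            # Any food label keeps the image out of the not_food class.
            food_images.append(image_id)
        elif labels & non_food_labels:
            not_food_images.append(image_id)
        # Images matching neither list are left out entirely.
    return food_images, not_food_images


# Hypothetical labels standing in for data/food_list.txt and
# data/non_food_list.txt.
food_labels = {"pizza", "salad"}
non_food_labels = {"car", "tree", "building", "person", "table"}
image_labels = {
    "img_001": ["Pizza", "Table"],
    "img_002": ["Car", "Tree"],
    "img_003": ["Person", "Salad"],
    "img_004": ["Building"],
    "img_005": ["Cloud"],
}

food_images, not_food_images = split_by_labels(
    image_labels, food_labels, non_food_labels
)
print("food:", food_images)
print("not_food:", not_food_images)
```

Note that img_001 contains a table but still lands in food, which is the case the step above is guarding against.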
- Move the extracted images into "food" and "not_food" directories.
This is necessary because our model training file will be searching for class names by the title of our directories (food and not_food).
python data_processing/move_images.py
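As a rough illustration of why the directory names matter, the sketch below derives class names from subdirectory titles. The function name is hypothetical and the sorted ordering is an assumption about how a training script might index its classes.

```python
import tempfile
from pathlib import Path


def find_class_names(data_dir):
    """Return subdirectory names of data_dir, sorted, as class names."""
    return sorted(p.name for p in Path(data_dir).iterdir() if p.is_dir())


# Demo with empty placeholder directories.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("not_food", "food"):
        (Path(tmp) / name).mkdir()
    class_names = find_class_names(tmp)

# Map each class name to an integer label.
class_to_index = {name: index for index, name in enumerate(class_names)}
print(class_to_index)
```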
- Split the data into training and test sets.
This creates a training and test split of food and not_food images.
This is so we can verify the performance of our model before deploying it.
It'll create the structure:
train/
    food/
        image1.jpeg
        image2.jpeg
        ...
    not_food/
        image100.jpeg
        image101.jpeg
        ...
test/
    food/
        image201.jpeg
        image202.jpeg
        ...
    not_food/
        image301.jpeg
        image302.jpeg
        ...
To do this, run:
python data_processing/data_splitting.py
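A minimal sketch of a split that produces the structure above is shown below. The 20% test fraction, the seed and the function name are assumptions for illustration; data_processing/data_splitting.py may split differently.

```python
import random
import shutil
import tempfile
from pathlib import Path


def split_into_train_test(data_dir, output_dir, test_fraction=0.2, seed=42):
    """Copy each class folder's images into train/ and test/ subfolders."""
    data_dir, output_dir = Path(data_dir), Path(output_dir)
    rng = random.Random(seed)
    counts = {}
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        images = sorted(p for p in class_dir.iterdir() if p.is_file())
        rng.shuffle(images)  # shuffle so the test set is a random sample
        num_test = int(len(images) * test_fraction)
        splits = {"test": images[:num_test], "train": images[num_test:]}
        for split_name, paths in splits.items():
            destination = output_dir / split_name / class_dir.name
            destination.mkdir(parents=True, exist_ok=True)
            for path in paths:
                shutil.copy2(path, destination / path.name)
            counts[(split_name, class_dir.name)] = len(paths)
    return counts


# Demo: 10 placeholder images per class.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    for class_name in ("food", "not_food"):
        (data / class_name).mkdir(parents=True)
        for i in range(10):
            (data / class_name / f"image{i}.jpeg").touch()
    counts = split_into_train_test(data, Path(tmp) / "split")
    top_level = sorted(p.name for p in (Path(tmp) / "split").iterdir())

print(counts)
```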
Note: This will require a working install of TensorFlow.
Running the model training file will produce a TensorFlow Lite model (this is small enough to be deployed in a browser) saved to the models directory.
The script will look for the train and test directories and will create training and testing datasets from each respectively.
It'll print out the progress at each epoch and then evaluate and save the model.
python model_building/train_model.py
The current deployed model uses about 40,000 images of food and 25,000 images of not food.
- Food images come from the Food101 dataset.
- Not food and some food images come from Open Images.