Have you ever wondered if your cupcake recipe is really a muffin recipe in disguise? This project aims to show the relationship between a user-input recipe and other recipes for baked goods.
There is a hosted Streamlit app here: https://classifying-baking-recipes.streamlit.app/
Over 1000 recipes and 60 common ingredients for baked goods are entered into a MongoDB database to build the dataset.
collect_ingredients.py uses Spoonacular's API to find unit conversions from cups and other units to grams, as well as the nutritional breakdown of common ingredients.
collect_recipes.py scrapes recipes from three websites. Each recipe's category is set by the search parameters used to find it. For example, if a recipe is found in the bread category of a website or by searching for bread recipes, it is classified as bread.
streamlit_app.py contains the main app. A user inputs a recipe, the recipe is parsed and standardized, and the function in build_dataset.py constructs an array giving the recipe's overall fraction by weight of protein, sugar, other carbohydrates, unsaturated fat, saturated fat, water, and 61 common baking ingredients. A matrix of the same information is built from the data in MongoDB so that the user input can be compared against over 1000 other recipes. The 3 nearest neighbors are found and displayed in the app, along with charts comparing the nutrient and ingredient breakdowns of the user input and its nearest neighbor. If 2 or more of the 3 neighbors belong to the same recipe category, the recipe is classified as that category. Otherwise, classification is inconclusive.
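The nearest-neighbor vote described above can be sketched as follows. This is only an illustration, not the code in streamlit_app.py or build_dataset.py: the function names, the three-element feature vectors, the sample recipes, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def to_fractions(grams, total_grams):
    """Convert per-feature grams into fractions of the total recipe weight."""
    if total_grams <= 0:
        raise ValueError("total recipe weight must be positive")
    return [g / total_grams for g in grams]


def classify(query, dataset, k=3, min_votes=2):
    """Return (category or None, the k nearest rows).

    Euclidean distance is an assumption; the project may use another metric.
    """
    ranked = sorted(dataset, key=lambda row: math.dist(query, row["features"]))
    neighbors = ranked[:k]
    votes = Counter(row["category"] for row in neighbors)
    category, count = votes.most_common(1)[0]
    # Fewer than min_votes matching neighbors means "inconclusive".
    return (category if count >= min_votes else None), neighbors


# Hypothetical rows: [flour, sugar, butter] fractions by weight.
dataset = [
    {"name": "muffin A", "category": "muffin", "features": [0.30, 0.20, 0.10]},
    {"name": "muffin B", "category": "muffin", "features": [0.32, 0.18, 0.12]},
    {"name": "cupcake A", "category": "cupcake", "features": [0.25, 0.30, 0.15]},
    {"name": "bread A", "category": "bread", "features": [0.55, 0.02, 0.03]},
]

# A made-up 1000 g recipe with 310 g flour, 190 g sugar, 110 g butter.
query = to_fractions([310.0, 190.0, 110.0], total_grams=1000.0)
category, neighbors = classify(query, dataset)
print(category, [row["name"] for row in neighbors])
```

With only three rows from three different categories, no category reaches two votes and the function returns None, which corresponds to the inconclusive case.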
Install requirements using pipenv install. Save a file called secrets.toml in the folder called .streamlit. The secrets file requires the following information:
[mongo]
host = "<connection_string>"
[spoonacular]
key = "<api-key>"
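A quick way to catch a missing entry before the app starts is to check the loaded secrets against the two sections listed above. The sketch below is an assumption about how such a check could look and is not part of the project; `missing_secrets` and `REQUIRED` are hypothetical names. It accepts any mapping, so in a Streamlit app it could be passed `st.secrets`, which Streamlit populates from .streamlit/secrets.toml.

```python
from collections.abc import Mapping

# Sections and keys taken from the secrets file layout described above.
REQUIRED = {"mongo": ["host"], "spoonacular": ["key"]}


def missing_secrets(secrets: Mapping) -> list:
    """Return "section.key" names that are absent or empty in the mapping."""
    missing = []
    for section, keys in REQUIRED.items():
        values = secrets.get(section) or {}
        for key in keys:
            if not values.get(key):
                missing.append(f"{section}.{key}")
    return missing


# Plain dicts stand in for the parsed secrets file here.
print(missing_secrets({"mongo": {"host": "<connection_string>"}}))
```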
The tests folder contains tests that are intended for pytest. The types of tests are as follows:
- Test convert to grams - Using given ingredient information, assess that the conversion from a unit to grams is correct.
- Test nutrients breakdown - Using given ingredient information, assess that the grams of each macro-nutrient are correctly calculated from the overall ingredient weight and nutrient fractions.
- Test mongo connection - Pings the MongoDB database
- Test spoonacular connection - Asserts that a status code of 200 results when calling the Spoonacular API
- Test ingredient - Assesses that a string containing the amount, unit, and name of an ingredient is successfully broken down into its three parts
- Test standardize ingredient - Assesses that the ingredient amount, unit, and name are properly cleaned up, handling different formats of fractions and units
- Test main ingredient - Assesses that Spoonacular gives the correct information about an ingredient
- Test ingredient features - Checks MongoDB data to ensure no fractions are negative
- Test url scraping - Assesses that web scraping succeeds in retrieving recipe URLs from a page
- Test recipe scraping - Assesses that given a url, the correct ingredients are scraped
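To give a feel for the first few test types, here is a minimal pytest-style sketch. It does not reproduce the project's tests: `parse_ingredient`, `convert_to_grams`, and `nutrient_grams` are hypothetical stand-ins for the functions under test, and the 120 g per cup figure and the nutrient fractions are made-up illustrative values.

```python
import math
from fractions import Fraction


def parse_ingredient(line):
    """Split e.g. '1 1/2 cups flour' into (amount, unit, name).

    Assumes the line starts with one or more numeric tokens followed by a unit.
    """
    tokens = line.split()
    amount = Fraction(0)
    i = 0
    while i < len(tokens):
        try:
            amount += Fraction(tokens[i])  # handles "1", "1/2", "1.5"
        except ValueError:
            break  # first non-numeric token is the unit
        i += 1
    return float(amount), tokens[i], " ".join(tokens[i + 1:])


def convert_to_grams(amount, grams_per_unit):
    """Convert an amount in some unit to grams using a known conversion factor."""
    return amount * grams_per_unit


def nutrient_grams(total_grams, fractions):
    """Grams of each nutrient from the ingredient weight and nutrient fractions."""
    return {name: total_grams * frac for name, frac in fractions.items()}


def test_parse_ingredient():
    assert parse_ingredient("1 1/2 cups all-purpose flour") == (
        1.5, "cups", "all-purpose flour")


def test_convert_to_grams():
    # 120 g per cup is an illustrative value, not Spoonacular data.
    assert math.isclose(convert_to_grams(1.5, 120.0), 180.0)


def test_nutrient_breakdown():
    grams = nutrient_grams(200.0, {"protein": 0.1, "sugar": 0.25})
    assert math.isclose(grams["protein"], 20.0)
    assert math.isclose(grams["sugar"], 50.0)
```

Saved in a file whose name starts with test_, these functions would be collected by running pytest from the repository root.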