Can we predict if a food is "Keto Friendly"?
- The Ketogenic Diet (keto) is a low carb, high fat diet. Many studies show that this type of diet can help you lose weight and improve your health.
- For this project I will be exploring a "Nutritional Facts" database and using machine learning clustering methodologies to predict if a food is "Keto Friendly".
- Predict whether a food is "Keto Friendly" based solely on its macro ratios (in grams) from the given nutritional facts database.
- Use a machine learning clustering model to answer the problem above and find out if certain groupings exist in the data.
- A well-documented GitHub repo with a Final Notebook and README.
- Slide Deck Presentation suitable for an audience of data science peers that summarizes findings.
- Writeup for resume
Term | Definition |
---|---|
Ketogenic Diet (Keto) | A low carb, high fat diet. |
Nutritional Facts/Food Labels | These labels are found on packaged foods and beverages. They are meant as a tool for making informed food choices that contribute to healthy lifelong eating habits. |
Macronutrients (Macros) | Carbohydrates, fats, and proteins. They are the nutrients you use in the largest amounts, as opposed to micronutrients, such as a variety of vitamins in food. |
Carbohydrates | The sugars, starches and fibers found in fruits, grains, vegetables and milk products. |
Fats | Fatty acids most commonly found in meat and dairy products. |
Protein | An essential nutrient. Proteins are the "building blocks" of body tissue. |
- Data is acquired from Kaggle ("Nutritional Values for Common Foods and Products") and https://www.nutritionvalue.org.
- Functions are stored in the acquire.py file.
- File is a reproducible component for gathering the data.
- Create a prepare.py file.
- Clean dataset.
- Missing values are investigated and handled.
- Split the data into train, validate, and test sets.
- File is a reproducible component that is ready for exploration.
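
The split step can be sketched as follows. This is a minimal illustration using only the standard library, not the code in prepare.py; the function name `split_train_validate_test`, the 60/20/20 proportions, and the seed are assumptions chosen for the example.

```python
import random


def split_train_validate_test(rows, validate_frac=0.2, test_frac=0.2, seed=123):
    """Shuffle rows reproducibly and split them into train, validate, and test lists.

    The fractions and seed are illustrative assumptions, not values from the project.
    """
    shuffled = list(rows)
    # A seeded generator gives the same shuffle on every run.
    random.Random(seed).shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_validate = int(len(shuffled) * validate_frac)

    test = shuffled[:n_test]
    validate = shuffled[n_test:n_test + n_validate]
    train = shuffled[n_test + n_validate:]
    return train, validate, test


# Placeholder rows standing in for cleaned food records.
foods = [{"name": f"food_{i}"} for i in range(100)]
train, validate, test = split_train_validate_test(foods)
print(len(train), len(validate), len(test))  # 60 20 20
```

Fixing the seed keeps the split reproducible, so exploration and modeling always see the same training rows.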
- Explore nutritional facts data across more than 8,700 different foods.
- Find patterns in macronutrient (carbohydrate, fat, and protein) ratios by creating histograms.
- Summarize takeaways and conclusions.
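
To make "macro ratios" concrete, here is a small sketch of one way they could be computed and binned for a histogram. It assumes a ratio means each macro's share of the total macro grams; the helper names `macro_ratios` and `histogram_counts` and the sample values are hypothetical and are not taken from the dataset.

```python
def macro_ratios(carbs_g, fat_g, protein_g):
    """Return each macro's share of the total macro grams (assumed definition)."""
    total = carbs_g + fat_g + protein_g
    if total == 0:
        # Foods with no macros (e.g. water) get all-zero ratios.
        return {"carbs": 0.0, "fat": 0.0, "protein": 0.0}
    return {
        "carbs": carbs_g / total,
        "fat": fat_g / total,
        "protein": protein_g / total,
    }


def histogram_counts(values, bins=5):
    """Count values in equal-width bins over [0, 1]; 1.0 falls in the last bin."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)
        counts[idx] += 1
    return counts


# Made-up (carbs, fat, protein) gram values, for illustration only.
sample = [(0, 10, 10), (20, 0, 0), (5, 10, 5)]
fat_shares = [macro_ratios(*grams)["fat"] for grams in sample]
print(fat_shares)                    # [0.5, 0.0, 0.5]
print(histogram_counts(fat_shares))  # [1, 0, 2, 0, 0]
```

In a notebook the same counts would normally come from a plotting library's histogram function; the manual binning here only shows what the histogram summarizes.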
- Fit different K-means clustering models.
- Choose k using inertia (the sum of squared distances from each point to its assigned centroid).
- Use elbow method to determine a good value for k.
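
The inertia and elbow steps can be illustrated with a small, self-contained sketch. This is a plain-Python version of Lloyd's algorithm written for clarity, not the project's implementation (a library such as scikit-learn would normally be used); the function names, the synthetic three-blob data, and the seeds are all assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50, seed=42):
    """Run Lloyd's algorithm once and return (centroids, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(n_iter):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    # Inertia: sum of squared distances from each point to its nearest centroid.
    inertia = sum(min(squared_distance(p, c) for c in centroids) for p in points)
    return centroids, inertia


# Synthetic (carb, fat, protein) ratio-like points around three made-up centers.
rng = random.Random(0)
centers = [(0.8, 0.1, 0.1), (0.1, 0.7, 0.2), (0.1, 0.2, 0.7)]
points = [
    tuple(c + rng.gauss(0, 0.05) for c in center)
    for center in centers
    for _ in range(30)
]

# Elbow method: compute inertia for a range of k and look for the bend.
inertias = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

Inertia always shrinks as k grows, so the elbow method looks for the k after which the drop becomes small rather than for the minimum.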
- The K-means model with k=4 provided the best clustering.
- Cluster 1 matched up very closely with a ketogenic diet that could be more balanced and sustainable in the long term.
- This information might be great for someone who needs help making food choices that are not all just meat, cheese, and eggs.
- I believe I met my goal of creating a clustering model that could provide a user with a more balanced and sustainable "Keto Friendly" diet.
- Acquire: Obtain a larger dataset.
- Prepare: Exclude outliers for further investigation.
- Explore: Find more patterns between macros and compare different foods to see which is the "better/healthier" choice.
- Model: Additional modeling with different macro combinations.
- Can other food items be classified under certain diets (vegan, vegetarian, low-carb, etc.) using similar methods?
- Could this system be used in a food-tracking app to suggest which foods users should buy based on their diet? This could remove much of the guessing and research that comes with being on a diet and simplify things for the user.
All files are reproducible and available for download and use.
- Read this README.md
- Download the acquire.py, prepare.py, and Final_Report.ipynb files
Dani Bojado