/budget-macros

A script for picking a cheapest set of foods given a set of macronutrient requirements. This problem is akin to a 'multidimensional, unbounded knapsack' problem.

Primary LanguagePython

budget-macros

Script for picking the cheapest combination of foods given a set of macronutrient requirements and a database of foods with nutritional and pricing info.

This is sort of a 'multidimensional unbounded knapsack problem' variant, in that items can be repeated indefinitely and there are several requirements to be met - not just one. This type of problem is NP-hard.

budget-calc.py contains code for a lot of different implementations for the solution, but there are three implementations of note: top-down brute force, top-down dynamic programming, and bottom-up dynamic programming. For a relatively normal diet (<2500 calories) and a small set of foods (<8 foods), a top down brute force solution performs best. However, once you increase the requirements or the set of foods, the dynamic programming solutions start to outperform the brute force solution. I go into more detail in the performance section.

Sample commands and output

Requires python 3.5, argparse

Below are outputs using the foods from food-db to satisfy the macro goals in files bulking-a and cutting-b.

$ python budget-calc.py --foods food-db --goals bulking-a
GOALS:
 -> calories: 2700
 -> carbs: 285
 -> fat: 80
 -> protein: 210
==========================================
BRUTE FORCE, ALL MACROS
Performance: 8.6511 s
---------
Cost:     $3.99
Calories: 2732 cal
Protein:  213 g
Carbs:    288 g
Fat:      76 g
Using:
 -> oatmeal: 1 x 45 g
 -> peanut butter: 5 x 15 g
 -> olive oil: 4 x 5 mL
 -> myprotein impact whey (w/ deal): 5 x 25 g
 -> lentils: 4 x 100 g
==========================================
DYNAMIC PROGRAMMING (TOP-DOWN), ALL MACROS
Performance: 7.2987 s
---------
Cost:     $3.93
Calories: 2672 cal
Protein:  210 g
Carbs:    285 g
Fat:      73 g
Using:
 -> lentils: 4 x 100 g
 -> oatmeal: 1 x 45 g
 -> peanut butter: 4 x 15 g
 -> olive oil: 5 x 5 mL
 -> myprotein impact whey (w/ deal): 5 x 25 g
 $ python budget-calc.py --foods food-db --goals cutting-b
 GOALS:
  -> calories: 1400
  -> carbs: 35
  -> fat: 60
  -> protein: 180
 ==========================================
 BRUTE FORCE, ALL MACROS
 Performance: 0.0360 s
 ---------
 Cost:     $4.10
 Calories: 1410 cal
 Protein:  184 g
 Carbs:    36 g
 Fat:      56 g
 Using:
  -> whole milk: 1 x 1 cup
  -> peanut butter: 3 x 15 g
  -> olive oil: 1 x 5 mL
  -> myprotein impact whey (w/ deal): 9 x 25 g
 ==========================================
 DYNAMIC PROGRAMMING (TOP-DOWN), ALL MACROS
 Performance: 0.1142 s
 ---------
 Cost:     $4.13
 Calories: 1370 cal
 Protein:  178 g
 Carbs:    30 g
 Fat:      58 g
 Using:
  -> whole milk: 1 x 1 cup
  -> peanut butter: 1 x 15 g
  -> olive oil: 5 x 5 mL
  -> myprotein impact whey (w/ deal): 9 x 25 g

Performance

This section is to detail what I found with regards to the performance of various algorithms used. I'll be using this section to solidify the ideas I had in my mind, as a guide from which to build further optimizations, and as a reference for the future.

Estimating run-times

Here are all the estimated run times in one area; they'll be used for reference purposes in the next sections too.

Top-down brute force

O(2N), where N is linearly correlated to:

  • the ratio between the goal nutrient sizes and the food nutrient sizes, i.e. cutting food servings in half would impact performance similarly to doubling your goals (e.g. 2000 cal -> 4000 cal)
  • the number of foods in the food database
Top-down dynamic programming

O(2M), where M is similar to N.

Bottom-up dynamic programming

O(Ccfpn), where:

  • C is the caloric goal
  • c is the carbs goal
  • f is the fat goal
  • p is the protein goal
  • n is the number of foods in the database

Top-down brute force

For a typical diet (<2500 calories and a more-or-less balanced set of macronutrient requirements) with <8 different foods/meals in the database, this is the best option. However, increasing the requirements by a factor of two or even doubling the number of foods in the database makes the runtime for this solution explode.

Top-down dynamic programming

Technically, the runtime is the 'same' as brute force here. There is added run time for maintaining a data structure, but lost run time due to overlaps. The overlaps are hightly dependent on the foods in the database.

As an experiment, I tried two alterations to the database: tripled the foods by literally copy and pasting the database two more times, and doubling the amount of foods by adding a new set of foods. In the former case, the dynamic programming approach was much faster than brute force, since there were a lot of overlaps (the foods combinations would be calculated three times). However, in the latter, the run times both grew together, since there were no or few overlaps.

Bottom-up dynamic programming

Although this is the 'best' in terms of scalability, the run time starts at an incredibly high number. With a typical diet and <8 foods in the databse, top-down takes <500,000 steps; whereas this algorithm takes around 100 billion. Although adding more foods and increasing the requirements would quickly make this outperform top-down, that simply means top-down is not usable after some point while this is never usable with all requirements.

Conclusions and future improvements/testing

All algorithms are slow. Top-down is usable in some cases with all requirements. A bottom up approach is usable (and is better) when you ignore at least 2 requirements (say you only care about calories and protein).

I can use this analysis to develop a system which estimates the most reasonable algorithm to use given a set of inputs, at the cost of some time performing calculations on the input. Furthermore, it'll probably be wise to implement a non-deterministic solution or some greedy algorithm which approximates a solution.

I think the way food nutrition numbers are naturally distributed allows for some shortcuts to be taken that would otherwise not work with a completely random set of numbers.

Tentative improvements

x implement using Dynamic Programming bottom-up
x implement using Dynamic Programming top-down
o implement a greedy algorithm
o write mapping function, given requirements, perform predicted fastest solution method
o catch dumb inputs
o food inputs divided into 1 unit items (15mL w/ 150cal -> 1mL w/ 10cal) to eliminate whole '5 x 15mL' type output multiply an output like '5 x 15mL' so it says '75ml' in brackets beside it
o optional requirements (e.g. don't care about carbs)
o flexible requirements (e.g. protein AT LEAST X g, carbs AT MOST X g)
o meals (options are combos of food, rather than indv food items)
o not necessarily starting at 0 foods eaten (e.g. you want to eat a cake, so given you ate cake, now what)

Version features/changelog

Current version: 0.1
WIP: 0.2

Version 0.1

Goals: just get the correct answer

  • implement a brute force solution for a goal containing all macros
  • works for all correct/possible inputs (doesn't handle if goals are literally impossible)

Version 0.2

Goals: efficiency

  • top-down and bottom-up dynamic programming implementations