simon-andrews/umass-toolkit

Food menu parsing stuff assumes data is for dining commons

simon-andrews opened this issue · 1 comments

Dining commons have really nice menus that give you all sorts of nutritional information and ingredients and stuff. We take advantage of this to give users a whole bunch of useful information:

def category_html_to_dict(html_string, meal, category):
    soup = BeautifulSoup(html_string, 'html.parser')
    items = soup.find_all('a', href='#inline')
    ret = []
    for item in items:
        dish = {}
        dish['category-name'] = category
        dish['meal-name'] = meal
        for attribute in item.attrs.keys():
            if attribute.startswith('data-') and not attribute.endswith('dv'):
                attribute_name = attribute[5:]
                data = item.attrs[attribute]
                if attribute_name == 'calories' or attribute_name == 'calories-from-fat':
                    data = int(data) if data else None
                elif attribute_name == 'clean-diet-str':
                    data = data.split(', ')
                    attribute_name = 'diets'
                elif attribute_name in ['allergens', 'ingredient-list']:
                    data = parse_list(data)
                elif attribute_name in ['cholesterol', 'sodium', 'dietary-fiber', 'protein', 'sat-fat', 'sugars',
                                        'total-carb', 'total-fat', 'trans-fat']:
                    data = ureg.Quantity(data) if data else None
                dish[attribute_name] = data
        ret.append(dish)
    return ret

Unfortunately, non-DC locations do not do this, and basically just have plain text menus. We should handle this.

I'm not sure what the best way to handle this is. My current idea is maybe a get_dc_menu function, separate from get_menu. This could be confusing, though, because getting DC menus is probably going to be the more useful and more commonly used feature, so I don't like making it the "special case."

I dunno, thoughts?

I think separate functions are good. I was toying with the idea of get_menu having a different return type depending on the ID it receives (list for DC locations, string for non-DC locations), but I think a predictable return type for each function is important.

We should probably have a location_is_dc function to help determine if an ID corresponds to a DC or non-DC location and use this to raise errors accordingly.

So users would write something like:

def my_dc_function(location_id):
  try:
    menu = get_dc_menu(location_id)
    do_something(menu)
  except ValueError:
    print('ID does not correspond to a DC location')
As for the names, I agree that get_menu could be too vague since it only applies to non-DC locations.
Some potential ideas:

  • get_dc_menu and get_non_dc_menu
  • get_parsed_menu and get_raw_menu
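
For what it's worth, here is a rough sketch of how the library side could look with the first naming option. Everything in it is an assumption for illustration: the DC_LOCATION_IDS set, the example IDs, and the placeholder menu data are all made up, and the real functions would fetch and parse actual menus instead of returning stubs. The point is only that each function has one return type and raises ValueError when given the wrong kind of ID.

```python
# Hypothetical sketch: the IDs and menu contents below are placeholders,
# not real UMass Dining data.
DC_LOCATION_IDS = {1, 2, 3, 4}  # assumed set of dining common IDs


def location_is_dc(location_id):
    """Return True if the ID corresponds to a dining common (DC)."""
    return location_id in DC_LOCATION_IDS


def get_dc_menu(location_id):
    """Always return a list of dish dicts; raise ValueError for non-DC IDs."""
    if not location_is_dc(location_id):
        raise ValueError(f'ID {location_id} does not correspond to a DC location')
    # Placeholder for fetching and parsing the structured DC menu.
    return [{'dish-name': 'Example dish', 'calories': 100}]


def get_non_dc_menu(location_id):
    """Always return a plain-text menu string; raise ValueError for DC IDs."""
    if location_is_dc(location_id):
        raise ValueError(f'ID {location_id} corresponds to a DC location')
    # Placeholder for fetching the plain-text menu.
    return 'Example plain-text menu'
```

With this shape, a caller that doesn't know which kind of location it has can check location_is_dc first and pick the function, rather than inspecting the return type afterwards.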