A Julia wrapper for flavor data compiled by Ahn et al.

Flavor Data

A Julia wrapper module for data compiled by Ahn et al., "Flavor network and the principles of food pairing," Nature Scientific Reports 1:196 (2011), DOI: 10.1038/srep00196, and Ahn et al., "The Flavor Network," Leonardo, Volume 46, Issue 3, June 2013, pp. 272-273. The papers and data are open access. The first paper and its data are licensed as CC BY-NC-SA 3.0 and the second as CC BY-NC-SA 4.0. The Julia code is GPLv3, Copyright (c) 2017-07-26: W. R. Bauer.

The module exports four functions, compound_ids(), ingredient_ids(), ingredient_compounds(), and cuisines(). The first two list chemical compound and ingredient information, respectively.

julia> using FlavorData

julia> cids = compound_ids()
1107×2 Array{String,2}:
 "jasmone"                                    "488-10-8"   
 "5-methylhexanoic_acid"                      "628-46-6
  ⋮
 "(+/?)-methyl_5-acetoxyhexanoate"            "35234-22-1" 
 "ethyl_sorbate"                              "2396-84-1"

julia> iids = ingredient_ids()
1530×2 Array{String,2}:
 "magnolia_tripetala"          "flower"        
 "calyptranthes_parriculata"   "plant"         
  ⋮
 "artemisia_porrecta_oil"      "plant"         
 "munster_cheese"              "dairy"

The third, ingredient_compounds(), returns a sparse incidence (0 or 1) matrix indicating which compounds (columns) are found in which ingredients (rows).

julia> ingredient_compounds()
1530×1107 SparseMatrixCSC{Int64,Int64} with 36781 stored entries:
  [18  ,    1]  =  1
  [26  ,    1]  =  1
  ⋮
  [1429, 1107]  =  1
  [1475, 1107]  =  1

The above indicates that ingredient number 1475 contains compound number 1107:

julia> iids[1475,1]
"champagne_wine"

julia> cids[1107,1]
"ethyl_sorbate"

The fourth function, cuisines(), returns a dictionary indexed by regions.

julia> cs = cuisines()
Dict{String,SparseMatrixCSC{Int64,Int64}} with 11 entries:
  "EasternEuropean"  => …
  "SouthAsian"       => …
  "NorthernEuropean" => …
  "African"          => …
  "EastAsian"        => …
  "LatinAmerican"    => …
  "SouthernEuropean" => …
  "MiddleEastern"    => …
  "NorthAmerican"    => …
  "SoutheastAsian"   => …
  "WesternEuropean"  => …

Each entry is a sparse incidence matrix in which rows represent individual recipes and columns represent ingredients.
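
Since every matrix has one row per recipe, per-region summaries follow from the matrix dimensions. The sketch below uses a made-up two-entry dictionary in place of cuisines(); the region names, sizes, and the variable names `toy`, `recipe_counts`, and `mean_size` are hypothetical, and Julia 1.x is assumed.

```julia
using SparseArrays

# Toy stand-in for cuisines(): region name => recipe × ingredient incidence matrix.
# Region names and entries are invented; only the Dict-of-sparse-matrices shape follows the text.
toy = Dict(
    "RegionA" => sparse([1, 1, 2, 3], [1, 2, 2, 3], ones(Int, 4), 3, 4),
    "RegionB" => sparse([1, 2], [3, 4], ones(Int, 2), 2, 4),
)

# One row per recipe, so the row count is the number of recipes in each region.
recipe_counts = Dict(region => size(m, 1) for (region, m) in toy)

# Average number of ingredients per recipe: stored entries divided by rows.
# (nnz counts stored entries, which equals the number of ones as long as no zeros are stored.)
mean_size = Dict(region => nnz(m) / size(m, 1) for (region, m) in toy)
```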

julia> sa = cs["SoutheastAsian"]
457×1530 SparseMatrixCSC{Int64,Int64} with 5172 stored entries:
  [2   ,    8]  =  1
  [7   ,    8]  =  1
  ⋮
  [325 , 1524]  =  1
  [332 , 1524]  =  1

Thus there are 457 Southeast Asian "recipes." To list the ingredients of recipe 7:

julia> iids[find(sa[7,:].==1),:]
14×2 Array{String,2}:
 "black_pepper"   "spice"       
 "garlic"         "vegetable"   
 "mint"           "herb"        
 "pineapple"      "fruit"       
 "rice"           "cereal/crop" 
 "shiitake"       "vegetable"   
 "basil"          "herb"        
 "cayenne"        "spice"       
 "fish"           "fish/seafood"
 "vegetable_oil"  "plant"       
 "lemon"          "fruit"       
 "tomato"         "vegetable"   
 "chicken"        "meat"        
 "oregano"        "herb" 

I was introduced to the general topic in a blog post, Data Analysis in the Kitchen, by the open access scientific publisher Frontiers. A perspective article explains the appeal:

[T]he perception of food is an incredibly complex process involving the chemical properties of aroma compounds, the biochemistry of receptor proteins, the physiology of the mouth, nose, and other organs, the neuroscience of the olfactory bulb and brain, and the psychology of multi-sensory associations and memories. Similar to many other research disciplines, the highly interdisciplinary study of food perception, consumption, and culture is being transformed by the ubiquity of data sets—on flavor chemistry and biochemistry, but also on food consumption, eating habits, and culinary diversity—and new data analysis methods, including network analysis and machine learning.--Mouritsen, Edwards-Stuart, Ahn and Ahnert, Food Perception, Preparation, Consumption, and Culture.