SS-150: Chemical Composition of a Mango
Closed this issue · 5 comments
It would be worth adding food directly as SMILES for its chemical composition.
Arginine
Cysteine
Glycine
Isoleucine
Leucine
Lysine
Methionine
Phenylalanine
Proline
Serine
Threonine
Tryptophan
Tyrosine
Valine
Glutamic acid
Aspartic acid
Alanine
Arginine
Histidine
Palmitic acid
Stearic acid
Arachidic acid
Lignoceric acid
Oleic Acid
Linoleic acid
α-Linoleic acid
Myristic acid
Palmitic acid
Stearic acid
Arachidic acid
Behenic acid
Lignoceric acid
Palmitoleic acid
11-Hexadecenoic acid
10-Heptadecenoic acid
Oleic acid
11-Octadecenoic acid
11-Eicosenoic acid
9,12-Hexadecadienoic acid
Linoleic acid
9,15-Octadecadienoic acid
Hepta-2,4(E,E)-dienoic acid
Linolenic acid
Ascorbic acid
Thiamine
Riboflavin
Niacin
Pantothenic acid
Pyridoxine
Folic acid
Vitamin A
Vitamin E
Vitamin K
Gallic Acid
Vanillic acid
Syringic acid
Protocatechuic acid
p-hydroxybenzoic acid
p-coumaric acid
Chlorogenic acid
ferulic acid
caffeic acid
theogallin
quercetin-3-O-galactoside
quercetin-3-O-glucoside
quercetin-3-O-xyloside
Mangiferin
cyanidin
delphinidin
pelargonidin
(+)-catechin
apigenin
luteolin
kaempferol
myricetin
9-cis-violaxanthin
limonene
Alpha-terpinolene
d-carvone
α-phellandrene
α-humulene
γ-terpinene
α-pinene
(−)-trans-caryophyllene
sabinene
(+)-3-carene
cis-caryophyllene
α-humulene
germacrene D
aromadendrene
β-cubebene
α-cubebene
α-bourbonene
β-elemene
Cool, let's start here, where we are creating an object in Python. This is called a class object, where we define the object as a Mango. The mango would have attributes like color or taste, which will be functions. For the purpose of GlobalChem we are interested in the chemical composition, so we have a function called get_smiles:
class Mango(object):

    @staticmethod
    def get_smiles():
        # Map each compound name to its SMILES string; empty strings are still to be filled in.
        smiles = {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': '',
            'glycine': '',
            'isoleucine': '',
            'leucine': '',
            'lysine': '',
            'methionine': '',
            'phenylalanine': '',
            'proline': '',
            'serine': '',
            'threonine': '',
            'tryptophan': '',
            'tyrosine': '',
            'valine': '',
            'glutamic acid': '',
            'aspartic acid': '',
            'alanine': '',
            'arginine': '',
            'histidine': '',
            'palmitic acid': '',
            'stearic acid': '',
            'arachidic acid': '',
            'lignoceric acid': '',
            'oleic acid': '',
            'linoleic acid': '',
            'alpha-Linoleic acid': '',
            'myristic acid': '',
            'palmitic acid': '',
            'stearic acid': '',
            'arachidic acid': '',
            'behenic acid': '',
            'lignoceric acid': '',
            'palmitoleic acid': '',
            'hexadecenoic acid': '',
            'heptadecenoic acid': '',
            'oleic acid': '',
            'octadecenoic acid': '',
            'eicosenoic acid': '',
            '9,12-Hexadecadienoic acid': '',
            'linoleic acid': '',
            '9,15-Octadecadienoic acid': '',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': '',
            'ascorbic acid': '',
            'thiamine': '',
            'riboflavin': '',
            'niacin': '',
            'pantothenic acid': '',
            'pyridoxine': '',
            'folic acid': '',
            'vitamin A': '',
            'vitamin E': '',
            'vitamin K': '',
            'gallic Acid': '',
            'vanillic acid': '',
            'syringic acid': '',
            'protocatechuic acid': '',
            'para hydroxybenzoic acid': '',
            'paracoumaric acid': '',
            'chlorogenic acid': '',
            'ferulic acid': '',
            'caffeic acid': '',
            'theogallin': '',
            'quercetin-3-O-galactoside': '',
            'quercetin-3-O-glucoside': '',
            'quercetin-3-O-xyloside': '',
            'mangiferin': '',
            'cyanidin': '',
            'delphinidin': '',
            'pelargonidin': '',
            'catechin': '',
            'apigenin': '',
            'luteolin': '',
            'kaempferol': '',
            'myricetin': '',
            '9-cis-violaxanthin': '',
            'limonene': '',
            'alpha-terpinolene': '',
            'd-carvone': '',
            'alpha phellandrene': '',
            'alpha humulene': '',
            'gamma terpinene': '',
            'alpha pinene': '',
            'trans caryophyllene': '',
            'sabinene': '',
            'carene': '',
            'cis-caryophyllene': '',
            'alpha humulene': '',
            'germacrene d': '',
            'aromadendrene': '',
            'beta cubebene': '',
            'alpha cubebene': '',
            'alpha bourbonene': '',
            'beta elemene': '',
        }
        return smiles
The next task for you, @Nickspizza001, is to fill in the SMILES. To make it easier, you can search to check whether the name already exists. If you do find it, then add the SMILES here as an entry.
There might be bugs where multiple entries are mapped to the same name; we should pick one and change the rest.
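A Python dict literal silently keeps only the last value for a repeated key, so duplicated names are easy to miss. As a rough sketch (the helper `find_duplicate_names` is hypothetical and not part of GlobalChem), the names could be held as a list of pairs first and checked before building the dictionary:

```python
from collections import Counter


def find_duplicate_names(pairs):
    """Return the names that occur more than once, ignoring case and stray spaces.

    Hypothetical helper for illustration only; not part of GlobalChem.
    """
    counts = Counter(name.strip().lower() for name, _ in pairs)
    return sorted(name for name, count in counts.items() if count > 1)


# A few (name, smiles) pairs taken from the list above, duplicates included.
entries = [
    ('arginine', 'C(CC(C(=O)O)N)CN=C(N)N'),
    ('palmitic acid', ''),
    ('stearic acid ', ''),
    ('palmitic acid', ''),
    ('stearic acid', ''),
]

duplicates = find_duplicate_names(entries)
print(duplicates)
```

Which of the duplicated entries to keep is still a manual decision, as noted above.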
Note some rules I have applied when curating your list:
- Names should be lowercase.
- Replace Greek letters with their word equivalent.
- Try to find a replacement name for chemicals with numbers in them, or move the numbers into their equivalent words. We don't want to mix words and numbers, which will make it harder for an AI to process later on.
Do this for all the entries. You have multiple entries, so I think we can maybe create subclasses of the Mango.
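The first two naming rules above could be applied mechanically. The following is only a sketch under assumptions: `normalize_name` and `GREEK_TO_WORD` are hypothetical names, the mapping covers just the Greek letters that appear in this list, and the third rule (replacing numbers) is left as a manual step.

```python
# Hypothetical mapping; only the Greek letters used in the mango list are covered.
GREEK_TO_WORD = {'α': 'alpha', 'β': 'beta', 'γ': 'gamma'}


def normalize_name(name):
    """Lowercase a chemical name and spell out Greek letters as words."""
    name = name.strip().lower()
    for letter, word in GREEK_TO_WORD.items():
        # 'α-pinene' becomes 'alpha pinene'; a bare 'α' becomes 'alpha'.
        name = name.replace(letter + '-', word + ' ').replace(letter, word)
    return name


for raw in ['α-Phellandrene', 'γ-terpinene', 'Gallic Acid ']:
    print(repr(normalize_name(raw)))
```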
class Mango(object):

    @staticmethod
    def get_smiles():
        # SMILES that contain backslashes are written as raw strings to avoid invalid escape sequences.
        smiles = {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': 'C([C@@H](C(=O)O)N)S',
            'glycine': 'C(C(=O)O)N',
            'isoleucine': 'CC[C@H](C)[C@@H](C(=O)O)N',
            'leucine': 'CC(C)C[C@@H](C(=O)O)N',
            'lysine': 'C(CCN)C[C@@H](C(=O)O)N',
            'methionine': 'CSCC[C@@H](C(=O)O)N',
            'phenylalanine': 'C1=CC=C(C=C1)C[C@@H](C(=O)O)N',
            'proline': 'C1C[C@H](NC1)C(=O)O',
            'serine': 'C([C@@H](C(=O)O)N)O',
            'threonine': 'C[C@H]([C@@H](C(=O)O)N)O',
            'tryptophan': 'C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)N',
            'tyrosine': 'C1=CC(=CC=C1C[C@@H](C(=O)O)N)O',
            'valine': 'CC(C)[C@@H](C(=O)O)N',
            'glutamic acid': 'C(CC(=O)O)[C@@H](C(=O)O)N',
            'aspartic acid': 'C([C@@H](C(=O)O)N)C(=O)O',
            'alanine': 'C[C@@H](C(=O)O)N',
            'arginine': 'C(C[C@@H](C(=O)O)N)CN=C(N)N',
            'histidine': 'C1=C(NC=N1)C[C@@H](C(=O)O)N',
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            'alpha-Linoleic acid': '',
            'myristic acid': 'CCCCCCCCCCCCCC(=O)O',
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'behenic acid': 'CCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'palmitoleic acid': r'CCCCCC/C=C\CCCCCCCC(=O)O',
            'hexadecenoic acid': 'CCCC/C=C/CCCCCCCCCC(=O)O',
            'heptadecenoic acid': 'CCCCCC/C=C/CCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'octadecenoic acid': 'CCCCCC/C=C/CCCCCCCCCC(=O)O',
            'eicosenoic acid': r'CCCCCCCC/C=C\CCCCCCCCCC(=O)O',
            '9,12-Hexadecadienoic acid': 'CCC/C=C/C/C=C/CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            '9,15-Octadecadienoic acid': 'CC/C=C/CCCC/C=C/CCCCCCCC(=O)O',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': r'CC/C=C\C/C=C\C/C=C\CCCCCCCC(=O)O',
            'ascorbic acid': 'C([C@@H]([C@@H]1C(=C(C(=O)O1)O)O)O)O',
            'thiamine': 'CC1=C(SC=[N+]1CC2=CN=C(N=C2N)C)CCO',
            'riboflavin': 'CC1=CC2=C(C=C1C)N(C3=NC(=O)NC(=O)C3=N2)C[C@@H]([C@@H]([C@@H](CO)O)O)O',
            'niacin': 'C1=CC(=CN=C1)C(=O)O',
            'pantothenic acid': 'CC(C)(CO)[C@H](C(=O)NCCC(=O)O)O',
            'pyridoxine': 'CC1=NC=C(C(=C1O)CO)CO',
            'folic acid': 'C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=O)NC(=N3)N',
            'vitamin A': 'CC1=C(C(CCC1)(C)C)/C=C/C(=C/C=C/C(=C/CO)/C)/C',
            'vitamin E': 'CC1=C(C2=C(CC[C@@](O2)(C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)C(=C1O)C)C',
            'vitamin K': r'CC1=C(C(=O)C2=CC=CC=C2C1=O)C/C=C(\C)/CCCC(C)CCCC(C)CCCC(C)C',
            'gallic Acid': 'C1=C(C=C(C(=C1O)O)O)C(=O)O',
            'vanillic acid': 'COC1=C(C=CC(=C1)C(=O)O)O',
            'syringic acid': 'COC1=CC(=CC(=C1O)OC)C(=O)O',
            'protocatechuic acid': 'C1=CC(=C(C=C1C(=O)O)O)O',
            'para hydroxybenzoic acid': 'C1=CC(=CC=C1C(=O)O)O',
            'paracoumaric acid': 'C1=CC(=CC=C1/C=C/C(=O)O)O',
            'chlorogenic acid': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)/C=C/C2=CC(=C(C=C2)O)O)O)O',
            'ferulic acid': 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O',
            'caffeic acid': 'C1=CC(=C(C=C1/C=C/C(=O)O)O)O',
            'theogallin': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)C2=CC(=C(C(=C2)O)O)O)O)O',
            'quercetin-3-O-galactoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-glucoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-xyloside': 'C1C(C(C(C(O1)OC2=C(OC3=CC(=CC(=C3C2=O)O)O)C4=CC(=C(C=C4)O)O)O)O)O',
            'mangiferin': 'C1=C2C(=CC(=C1O)O)OC3=C(C2=O)C(=C(C(=C3)O)[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O',
            'cyanidin': 'C1=CC(=C(C=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O)O',
            'delphinidin': 'C1=C(C=C(C(=C1O)O)O)C2=[O+]C3=CC(=CC(=C3C=C2O)O)O.[Cl-]',
            'pelargonidin': 'C1=CC(=CC=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O',
            'catechin': 'C1[C@@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O',
            'apigenin': 'C1=CC(=CC=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O',
            'luteolin': 'C1=CC(=C(C=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'kaempferol': 'C1=CC(=CC=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'myricetin': 'C1=C(C=C(C(=C1O)O)O)C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O',
            '9-cis-violaxanthin': r'C/C(=C\C=C\C=C(/C)\C=C\C=C(\C)/C=C/[C@]12[C@](O1)(C[C@H](CC2(C)C)O)C)/C=C/C=C(\C)/C=C/[C@]34[C@](O3)(C[C@H](CC4(C)C)O)C',
            'limonene': 'CC1=CCC(CC1)C(=C)C',
            'alpha-terpinolene': r"C/C(C)=C1CCC(C)C=C\1",
            'd-carvone': 'CC1=CC[C@@H](CC1=O)C(=C)C',
            'alpha phellandrene': '',
            'alpha humulene': '',
            'gamma terpinene': '',
            'alpha pinene': '',
            'trans caryophyllene': '',
            'sabinene': '',
            'carene': '',
            'cis-caryophyllene': '',
            'alpha humulene': '',
            'germacrene d': '',
            'aromadendrene': '',
            'beta cubebene': '',
            'alpha cubebene': '',
            'alpha bourbonene': '',
            'beta elemene': '',
        }
        return smiles
class MangoAminoAcids(object):

    def __init__(self):
        self.name = 'mango_amino_acids'

    @staticmethod
    def get_smiles():
        # Amino acids reported in mango, keyed by lowercase name.
        smiles = {
            'arginine': 'C(CC(C(=O)O)N)CN=C(N)N',
            'cysteine': 'C([C@@H](C(=O)O)N)S',
            'glycine': 'C(C(=O)O)N',
            'isoleucine': 'CC[C@H](C)[C@@H](C(=O)O)N',
            'leucine': 'CC(C)C[C@@H](C(=O)O)N',
            'lysine': 'C(CCN)C[C@@H](C(=O)O)N',
            'methionine': 'CSCC[C@@H](C(=O)O)N',
            'phenylalanine': 'C1=CC=C(C=C1)C[C@@H](C(=O)O)N',
            'proline': 'C1C[C@H](NC1)C(=O)O',
            'serine': 'C([C@@H](C(=O)O)N)O',
            'threonine': 'C[C@H]([C@@H](C(=O)O)N)O',
            'tryptophan': 'C1=CC=C2C(=C1)C(=CN2)C[C@@H](C(=O)O)N',
            'tyrosine': 'C1=CC(=CC=C1C[C@@H](C(=O)O)N)O',
            'valine': 'CC(C)[C@@H](C(=O)O)N',
            'glutamic acid': 'C(CC(=O)O)[C@@H](C(=O)O)N',
            'aspartic acid': 'C([C@@H](C(=O)O)N)C(=O)O',
            'alanine': 'C[C@@H](C(=O)O)N',
            'arginine': 'C(C[C@@H](C(=O)O)N)CN=C(N)N',
            'histidine': 'C1=C(NC=N1)C[C@@H](C(=O)O)N'
        }
        return smiles
class MangoFattyAcids(object):

    def __init__(self):
        self.name = 'mango_fatty_acids'

    @staticmethod
    def get_smiles():
        # SMILES that contain backslashes are written as raw strings to avoid invalid escape sequences.
        smiles = {
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            'alpha-Linoleic acid': '',
            'myristic acid': 'CCCCCCCCCCCCCC(=O)O',
            'palmitic acid': 'CCCCCCCCCCCCCCCC(=O)O',
            'stearic acid': 'CCCCCCCCCCCCCCCCCC(=O)O',
            'arachidic acid': 'CCCCCCCCCCCCCCCCCCCC(=O)O',
            'behenic acid': 'CCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'lignoceric acid': 'CCCCCCCCCCCCCCCCCCCCCCCC(=O)O',
            'palmitoleic acid': r'CCCCCC/C=C\CCCCCCCC(=O)O',
            'hexadecenoic acid': 'CCCC/C=C/CCCCCCCCCC(=O)O',
            'heptadecenoic acid': 'CCCCCC/C=C/CCCCCCCCC(=O)O',
            'oleic acid': r'CCCCCCCC/C=C\CCCCCCCC(=O)O',
            'octadecenoic acid': 'CCCCCC/C=C/CCCCCCCCCC(=O)O',
            'eicosenoic acid': r'CCCCCCCC/C=C\CCCCCCCCCC(=O)O',
            '9,12-Hexadecadienoic acid': 'CCC/C=C/C/C=C/CCCCCCCC(=O)O',
            'linoleic acid': r'CCCCC/C=C\C/C=C\CCCCCCCC(=O)O',
            '9,15-Octadecadienoic acid': 'CC/C=C/CCCC/C=C/CCCCCCCC(=O)O',
            'hepta-2,4(E,E)-dienoic acid': '',
            'linolenic acid': r'CC/C=C\C/C=C\C/C=C\CCCCCCCC(=O)O'
        }
        return smiles
class MangoFlavonoids(object):

    def __init__(self):
        self.name = 'mango_flavonoids'

    @staticmethod
    def get_smiles():
        # Flavonoids and related polyphenols reported in mango.
        smiles = {
            'quercetin-3-O-galactoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-glucoside': 'C1=CC(=C(C=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O)O',
            'quercetin-3-O-xyloside': 'C1C(C(C(C(O1)OC2=C(OC3=CC(=CC(=C3C2=O)O)O)C4=CC(=C(C=C4)O)O)O)O)O',
            'mangiferin': 'C1=C2C(=CC(=C1O)O)OC3=C(C2=O)C(=C(C(=C3)O)[C@H]4[C@@H]([C@H]([C@@H]([C@H](O4)CO)O)O)O)O',
            'cyanidin': 'C1=CC(=C(C=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O)O',
            'delphinidin': 'C1=C(C=C(C(=C1O)O)O)C2=[O+]C3=CC(=CC(=C3C=C2O)O)O.[Cl-]',
            'pelargonidin': 'C1=CC(=CC=C1C2=[O+]C3=CC(=CC(=C3C=C2O)O)O)O',
            'catechin': 'C1[C@@H]([C@H](OC2=CC(=CC(=C21)O)O)C3=CC(=C(C=C3)O)O)O',
            'apigenin': 'C1=CC(=CC=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O',
            'luteolin': 'C1=CC(=C(C=C1C2=CC(=O)C3=C(C=C(C=C3O2)O)O)O)O',
            'kaempferol': 'C1=CC(=CC=C1C2=C(C(=O)C3=C(C=C(C=C3O2)O)O)O)O',
        }
        return smiles
class MangoPhenolicAcids(object):

    def __init__(self):
        self.name = 'mango_phenolic_acids'

    @staticmethod
    def get_smiles():
        # Phenolic acids reported in mango.
        smiles = {
            'gallic Acid': 'C1=C(C=C(C(=C1O)O)O)C(=O)O',
            'vanillic acid': 'COC1=C(C=CC(=C1)C(=O)O)O',
            'syringic acid': 'COC1=CC(=CC(=C1O)OC)C(=O)O',
            'protocatechuic acid': 'C1=CC(=C(C=C1C(=O)O)O)O',
            'para hydroxybenzoic acid': 'C1=CC(=CC=C1C(=O)O)O',
            'paracoumaric acid': 'C1=CC(=CC=C1/C=C/C(=O)O)O',
            'chlorogenic acid': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)/C=C/C2=CC(=C(C=C2)O)O)O)O',
            'ferulic acid': 'COC1=C(C=CC(=C1)/C=C/C(=O)O)O',
            'caffeic acid': 'C1=CC(=C(C=C1/C=C/C(=O)O)O)O',
            'theogallin': 'C1[C@H]([C@H]([C@@H](C[C@@]1(C(=O)O)O)OC(=O)C2=CC(=C(C(=C2)O)O)O)O)O',
        }
        return smiles
class MangoVitamins(object):

    def __init__(self):
        self.name = 'mango_vitamins'

    @staticmethod
    def get_smiles():
        # The vitamin K SMILES contains backslashes, so it is written as a raw string.
        smiles = {
            'ascorbic acid': 'C([C@@H]([C@@H]1C(=C(C(=O)O1)O)O)O)O',
            'thiamine': 'CC1=C(SC=[N+]1CC2=CN=C(N=C2N)C)CCO',
            'riboflavin': 'CC1=CC2=C(C=C1C)N(C3=NC(=O)NC(=O)C3=N2)C[C@@H]([C@@H]([C@@H](CO)O)O)O',
            'niacin': 'C1=CC(=CN=C1)C(=O)O',
            'pantothenic acid': 'CC(C)(CO)[C@H](C(=O)NCCC(=O)O)O',
            'pyridoxine': 'CC1=NC=C(C(=C1O)CO)CO',
            'folic acid': 'C1=CC(=CC=C1C(=O)N[C@@H](CCC(=O)O)C(=O)O)NCC2=CN=C3C(=N2)C(=O)NC(=N3)N',
            'vitamin A': 'CC1=C(C(CCC1)(C)C)/C=C/C(=C/C=C/C(=C/CO)/C)/C',
            'vitamin E': 'CC1=C(C2=C(CC[C@@](O2)(C)CCC[C@H](C)CCC[C@H](C)CCCC(C)C)C(=C1O)C)C',
            'vitamin K': r'CC1=C(C(=O)C2=CC=CC=C2C1=O)C/C=C(\C)/CCCC(C)CCCC(C)CCCC(C)C',
        }
        return smiles
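Once the compounds are split into category classes, it can help to see which names still have no SMILES. The following is a minimal sketch, not GlobalChem code: the two stand-in classes carry only a few entries copied from this thread, and `merge_categories` and `missing_smiles` are hypothetical helper names.

```python
# Stand-in category classes with a handful of entries from the thread.
class MangoAminoAcids:
    @staticmethod
    def get_smiles():
        return {'glycine': 'C(C(=O)O)N', 'alanine': 'C[C@@H](C(=O)O)N'}


class MangoFattyAcids:
    @staticmethod
    def get_smiles():
        return {'myristic acid': 'CCCCCCCCCCCCCC(=O)O', 'alpha-Linoleic acid': ''}


def merge_categories(categories):
    """Combine the name-to-SMILES dictionaries of several category classes."""
    merged = {}
    for category in categories:
        merged.update(category.get_smiles())
    return merged


def missing_smiles(smiles):
    """Return the names whose SMILES string is empty or only whitespace."""
    return sorted(name for name, value in smiles.items() if not value.strip())


all_smiles = merge_categories([MangoAminoAcids, MangoFattyAcids])
print(len(all_smiles))
print(missing_smiles(all_smiles))
```

This only checks for empty strings; it does not verify that a filled-in SMILES is chemically valid.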
@Nickspizza001 Awesome! That is where I was headed in terms of distributing the chemicals and categorizing them. So next we are going to do a distribution.
Now we need to add your nodes to the knowledge graph:
You will see I created this file. Now we need to determine where we are going to add your node into the knowledge graph. Do you think we should have a directory for fruit, and then another directory for mango?
food/mango/mango_amino_acids
Create a Python file for each class object. For the name of the file, use all lowercase and split words with a _ character, and then add your path to the node here in this file:
You can copy some of the lines I wrote there and also make the changes in here:
https://github.com/Global-Chem/global-chem/blob/development/global_chem/global_chem/global_chem.py
https://github.com/Global-Chem/global-chem/blob/development/global_chem/global_chem/__init__.py
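The file naming convention described above (all lowercase, words split with `_`) can be derived from the class names. This is an illustrative sketch only: `class_to_module_name` and `node_path` are hypothetical helpers, and the `food/mango` base directory is taken from the path suggested earlier in this comment.

```python
import re
from pathlib import PurePosixPath


def class_to_module_name(class_name):
    """Convert a CamelCase class name to a lowercase, underscore-separated name."""
    # Insert "_" wherever a capital letter follows a lowercase letter or digit.
    return re.sub(r'(?<=[a-z0-9])(?=[A-Z])', '_', class_name).lower()


def node_path(class_name, base='food/mango'):
    """Build the file path for a class under the assumed base directory."""
    return str(PurePosixPath(base) / (class_to_module_name(class_name) + '.py'))


for name in ['MangoAminoAcids', 'MangoFattyAcids', 'MangoVitamins']:
    print(node_path(name))
```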
What you are doing is adding a Node into the network. There is an algorithm that iterates through the directory structure and then builds the nodes in relation to all the other nodes. We then want to add your objects to the list:
Please read this article after you add and modify the files. We will be doing this together when we release a new version of the software with your Mango component. I did it for cannabis too, with the subdirectories, so it will go in the same release.
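To give a feel for how walking a directory structure can yield nodes and their relations, here is a small sketch. It is not the actual GlobalChem algorithm, which is not shown in this thread; `build_nodes` is a hypothetical function that maps each directory to its child directories and Python modules, using a temporary `food/mango` layout as assumed input.

```python
import os
import tempfile


def build_nodes(root):
    """Map each directory under root to its child directories and .py modules."""
    nodes = {}
    for current, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        relative = os.path.relpath(current, root).replace(os.sep, '/')
        key = 'root' if relative == '.' else relative
        modules = sorted(
            name[:-3] for name in filenames
            if name.endswith('.py') and name != '__init__.py'
        )
        nodes[key] = dirnames + modules
    return nodes


# Build a throwaway directory tree that mirrors the proposed layout.
with tempfile.TemporaryDirectory() as root:
    mango_dir = os.path.join(root, 'food', 'mango')
    os.makedirs(mango_dir)
    for module in ('mango_amino_acids.py', 'mango_vitamins.py'):
        open(os.path.join(mango_dir, module), 'w').close()
    nodes = build_nodes(root)

print(nodes)
```

Each key is a parent node and each value lists its children, so adding a new file under `food/mango` would add one more child to that node in this simplified picture.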
I think this issue is now resolved in the new release!