Mapping between biological process and hit gets lost
Adafede commented
Hi, me again.
While classifying a large batch for downstream use, I came across some potential issues.
For example, running the following command returns the output below:
python3 classify.py -d ./ -m 'InChI=1S/C28H32O15/c1-39-14-7-15-18(12(32)6-13(40-15)10-2-4-11(31)5-3-10)22(35)19(14)26-27(24(37)21(34)16(8-29)41-26)43-28-25(38)23(36)20(33)17(9-30)42-28/h2-7,16-17,20-21,23-31,33-38H,8-9H2,1H3/t16-,17-,20-,21-,23+,24+,25-,26+,27-,28+/m1/s1' -j
{"molecule": "InChI=1S/C28H32O15/c1-39-14-7-15-18(12(32)6-13(40-15)10-2-4-11(31)5-3-10)22(35)19(14)26-27(24(37)21(34)16(8-29)41-26)43-28-25(38)23(36)20(33)17(9-30)42-28/h2-7,16-17,20-21,23-31,33-38H,8-9H2,1H3/t16-,17-,20-,21-,23+,24+,25-,26+,27-,28+/m1/s1", "ikey": "VGGSULWDCMWZPO-ODEMIOGVSA-N", "hits": [{"classification_names": ["flavonoid", "2-phenylchromane flavonoid", "flavone", "6C-substituted flavone", "6C-glycosylated flavone", "spinosin"], "biological_process": ["flavone biosynthetic process", "GO:0051553"]}]}
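The biological_process entry ["flavone biosynthetic process", "GO:0051553"] is attached to the whole hit, but it is actually linked to only one of the six classification names, not to all of them. The link between a classification name and its biological process is only preserved when the mapping is one-to-one, which is almost never the case (as illustrated here).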
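To make the loss concrete, here is a minimal Python sketch that parses the JSON above (with the long "molecule" field dropped for brevity) and expands it the way a downstream consumer would have to. Because the process is stored once per hit, the only possible join is a cross product, so every classification name appears to carry the GO term. The per_name_layout helper and its process_by_name argument are hypothetical, not part of classify.py; they only illustrate one possible output shape that would keep the mapping explicit.

```python
import json

# JSON printed by the `classify.py ... -j` run above; the long "molecule"
# field is omitted here for brevity, everything else is copied verbatim.
RAW = (
    '{"ikey": "VGGSULWDCMWZPO-ODEMIOGVSA-N", "hits": [{"classification_names": '
    '["flavonoid", "2-phenylchromane flavonoid", "flavone", '
    '"6C-substituted flavone", "6C-glycosylated flavone", "spinosin"], '
    '"biological_process": ["flavone biosynthetic process", "GO:0051553"]}]}'
)


def name_process_pairs(result):
    """Expand each hit into (classification_name, biological_process) rows.

    With the current layout the only possible join is a cross product, so
    every classification name in a hit inherits the hit's whole
    biological_process list, even though it belongs to just one of them.
    """
    rows = []
    for hit in result["hits"]:
        process = tuple(hit.get("biological_process", []))
        for name in hit["classification_names"]:
            rows.append((name, process))
    return rows


def per_name_layout(names, process_by_name):
    """Hypothetical output shape that keeps the mapping explicit.

    `process_by_name` is an assumed dict {classification_name: [label, GO id]}
    that the classifier would have to supply; names without an entry get [].
    """
    return [
        {"name": name, "biological_process": process_by_name.get(name, [])}
        for name in names
    ]


result = json.loads(RAW)
rows = name_process_pairs(result)
print(len(rows))  # prints 6: every name claims the same GO term
```

With a per-name layout like this, a name with no associated process would simply carry an empty list instead of silently inheriting its neighbour's GO term.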