WDscholia/scholia

Current "related compounds" is ambiguous


Is your feature request related to a problem? Please describe.
Currently, the compounds listed as having the same connectivity cover a broad range of different things, including isotopomers.

Describe the solution you'd like
Either clarify the heading or split the compounds into subcategories.
If we split them, I am happy to rewrite the respective queries to use the InChI instead of the InChIKey, so that the individual layers can be stripped and compared.
We could also make use of P3364 (stereoisomer of) and P6185 (tautomer of).

Describe alternatives you've considered
Leaving things as they are right now (but removing the "including the compound itself" wording, see a52e3e7).

Additional context
Trying to improve the chemistry aspect.

@egonw

I need to think about this a bit more. I like the idea, but need to think through the implications.

I also thought about it again, and here is what came to my mind (WIP):

The idea is to keep the same table but add a column labelling each compound as "stereoisomer", "isotopomer", etc., based on which InChI layers match:

PREFIX target: <http://www.wikidata.org/entity/Q41576>

# title: related chemical structures
SELECT ?mol ?molLabel ?InChI ?InChIKey ?CAS ?ChemSpider ?PubChem_CID ?layer_b ?layer_t ?layer_m ?layer_s WITH {
  SELECT ?queryKey ?srsearch ?filter WHERE {
    target: wdt:P235 ?queryKey .
    BIND(CONCAT(SUBSTR($queryKey,1,14), " haswbstatement:P235") AS ?srsearch)
    BIND(CONCAT("^", SUBSTR($queryKey,1,14)) AS ?filter)
  }
} AS %MOLS WITH {
  SELECT ?mol ?InChIKey WHERE {
    INCLUDE %MOLS
    SERVICE wikibase:mwapi {
        bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "Search";
        mwapi:srsearch ?srsearch;
        mwapi:srlimit "max".
        ?mol wikibase:apiOutputItem mwapi:title.
      }
    ?mol wdt:P235 ?InChIKey .
    FILTER (REGEX(STR(?InChIKey), ?filter))
    FILTER (?InChIKey != ?queryKey)
  }
} AS %MOLS2 {
  INCLUDE %MOLS2
  ?mol wdt:P234 ?InChI .
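  # WIP: extract each stereo layer (/b, /t, /m, /s) from the InChI.
  # REPLACE returns the input unchanged when nothing matches, so guard with CONTAINS
  # and return an empty string when the layer is absent.
  BIND(IF(CONTAINS(?InChI, "/b"), REPLACE(?InChI, "^.*?/b([^/]*).*$", "$1"), "") AS ?layer_b)
  BIND(IF(CONTAINS(?InChI, "/t"), REPLACE(?InChI, "^.*?/t([^/]*).*$", "$1"), "") AS ?layer_t)
  BIND(IF(CONTAINS(?InChI, "/m"), REPLACE(?InChI, "^.*?/m([^/]*).*$", "$1"), "") AS ?layer_m)
  BIND(IF(CONTAINS(?InChI, "/s"), REPLACE(?InChI, "^.*?/s([^/]*).*$", "$1"), "") AS ?layer_s)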
  OPTIONAL { ?mol wdt:P231 ?CAS }
  OPTIONAL { ?mol wdt:P661 ?ChemSpider }
  OPTIONAL { ?mol wdt:P662 ?PubChem_CID }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
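
To make the "additional column" idea concrete, here is a minimal Python sketch of how the label could be derived from two InChI strings once the query has returned them. It is not part of Scholia: the function names `split_inchi` and `relation`, the label strings, and the choice of which layers map to which label are all assumptions for illustration. It also ignores edge cases such as InChIs without a formula layer or non-standard InChIs with a fixed-H (/f) layer.

```python
STEREO_LAYERS = ("b", "t", "m", "s")  # double bond, tetrahedral, parity inversion, stereo type


def split_inchi(inchi: str) -> tuple[dict[str, str], str]:
    """Split an InChI into its main layers and the raw isotopic part (hypothetical helper).

    Everything after "/i" is kept as one opaque string, because the isotopic
    layer may carry its own sublayers (for example "/i/hD2").
    """
    main, _, isotopic = inchi.partition("/i")
    parts = main.split("/")
    # parts[0] is the version prefix ("InChI=1S"); parts[1] is the formula, which has no prefix letter.
    layers = {"formula": parts[1] if len(parts) > 1 else ""}
    for part in parts[2:]:
        if part:
            layers[part[0]] = part[1:]
    return layers, isotopic


def relation(query: str, other: str) -> list[str]:
    """Label how `other` differs from `query` (illustrative labels only)."""
    if query == other:
        return ["identical"]
    q_layers, q_iso = split_inchi(query)
    o_layers, o_iso = split_inchi(other)
    labels = []
    if any(q_layers.get(k) != o_layers.get(k) for k in STEREO_LAYERS):
        labels.append("stereoisomer")
    if q_iso != o_iso:
        labels.append("isotopic variant")
    if any(q_layers.get(k) != o_layers.get(k) for k in ("q", "p")):
        labels.append("charge/protonation variant")
    return labels or ["other"]


# Example inputs: L- and D-alanine, water and heavy water.
L_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
D_ALA = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
WATER = "InChI=1S/H2O/h1H2"
HEAVY_WATER = "InChI=1S/H2O/h1H2/i/hD2"

print(relation(L_ALA, D_ALA))
print(relation(WATER, HEAVY_WATER))
```

Returning a list rather than a single label lets one row carry several differences at once (for example a deuterated stereoisomer), which a single-valued column would hide. Whether the same logic is better done in SPARQL or in post-processing is an open design question for this issue.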