eMetaboHUB/Forum-DiseasesChem

Extract PMIDs supporting a relation

maximeDelmas opened this issue · 3 comments

Context

Currently, only the number of articles supporting the relation is displayed in the interface, in the "papers" column.

Desired behaviour

As a user, I would like to be able to extract the complete list of PMIDs supporting the relation.

Solutions ?

Even if the complete list of PMIDs for each association is not stored in the SQL database (as it would involve a massive load of data), the information is encoded in the KG and could be extracted using SPARQL requests.

  • We could create URLs containing the SPARQL request that extracts the PMIDs related to a specific PubChem/ChEBI/ChemOnt-MeSH association, sent as a GET/POST HTTP request. This URL could be added in a new column of the result table.

  • The request could also be part of the pre-filled SPARQL requests.
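As a rough illustration of the URL idea above, here is a minimal Python sketch that embeds a SPARQL request in a GET URL. Everything specific in it is an assumption made for the example: the endpoint address, the two predicates in the query template (they are placeholders, not the actual KG schema), and the `format` parameter, which is a Virtuoso-style extension rather than part of the standard SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; replace with the real SPARQL endpoint.
ENDPOINT = "https://example.org/sparql"

# Illustrative template only: both predicates are placeholders and do not
# reflect the actual schema of the knowledge graph.
QUERY_TEMPLATE = """
SELECT DISTINCT ?pmid
WHERE {{
    ?pmid <http://example.org/mentionsCompound> <{compound}> .
    ?pmid <http://example.org/hasMeshDescriptor> <{mesh}> .
}}
"""


def build_pmid_url(compound_uri, mesh_uri, endpoint=ENDPOINT):
    """Return a GET URL that carries the SPARQL request as a parameter."""
    query = QUERY_TEMPLATE.format(compound=compound_uri, mesh=mesh_uri)
    # "query" is the standard SPARQL protocol parameter for GET requests;
    # "format" is assumed here to be accepted by the endpoint.
    params = {"query": query, "format": "text/csv"}
    return endpoint + "?" + urlencode(params)


# Example values only, chosen to show the shape of the resulting link.
url = build_pmid_url(
    "http://rdf.ncbi.nlm.nih.gov/pubchem/compound/CID2244",
    "http://id.nlm.nih.gov/mesh/D006261",
)
print(url)
```

A link built this way could be stored per row of the result table. Very long requests may exceed URL length limits on some servers, in which case a POST request would be the safer choice.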

For the URL part, I will start to work on it. The idea is to provide a link to an HTTP request that will query the SPARQL endpoint to get the PMIDs for an association in the result table.
I think that for heavily supported associations (>1000 articles) we should set a limit on the number of articles returned when using the web interface, as the SPARQL request can take a really long time. I will start to test some HTTP requests.

To get all the PMIDs, the user should use the SPARQL endpoint with the pre-filled SPARQL request. However, we still have to discuss how to implement pre-filled SPARQL requests in the Virtuoso SPARQL endpoint with @ofilangi, in a meeting soon.

Requests with ChEBI and ChemOnt can take a very long time (more than 5 minutes), which is incompatible with a request from the web interface...

For PubChem - MeSH, it is more reasonable as the engine does not have to go through the chemical ontology...

I would also propose pre-computing all the PMID sets, and if we can't store them in the database, we could provide the files on the FTP.

Specifying the graphs explicitly with FROM clauses seems to make the requests faster.
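To make the observation above concrete, here is a small Python sketch that assembles a SELECT request with one FROM clause per named graph, plus an optional LIMIT. The helper name, the graph URIs and the predicate are all hypothetical and only show where the FROM clauses sit in the request; the real graph names would have to come from the KG.

```python
def build_select(select_clause, where_body, graph_uris, limit=None):
    """Assemble a SPARQL SELECT request restricted to the given graphs."""
    lines = [select_clause]
    # One FROM clause per graph: the request is then evaluated against the
    # merge of these graphs instead of the endpoint's default dataset.
    lines += [f"FROM <{graph}>" for graph in graph_uris]
    lines.append("WHERE {")
    lines.append(where_body.strip())
    lines.append("}")
    # The LIMIT is optional, so it can simply be left out.
    if limit is not None:
        lines.append(f"LIMIT {int(limit)}")
    return "\n".join(lines)


# Hypothetical graph URIs and predicate, for illustration only.
GRAPHS = [
    "http://example.org/graph/pmid-compound",
    "http://example.org/graph/pmid-mesh",
]
BODY = "?pmid <http://example.org/mentionsCompound> ?compound ."

limited = build_select("SELECT DISTINCT ?pmid", BODY, GRAPHS, limit=1000)
unlimited = build_select("SELECT DISTINCT ?pmid", BODY, GRAPHS)
print(limited)
```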

In that case, maybe we do not need the limit of 1000 either.