Wimmics/corese

[Help] Optimization of a federated query

MaillPierre opened this issue · 3 comments

Hi,

I would like some pointers to optimize some federated queries sent with Corese.

I am currently trying to count the number of endpoints that contain certain classes and properties. I have a list of endpoints, and I load simple declarations of classes and properties into my Corese server in the following form:

<https://w3id.org/fog#asShapefile-shp>
        rdf:type          rdf:Property ;
        rdfs:isDefinedBy  <https://w3id.org/fog> .

<http://lod.nl.go.kr/ontology/scale>
        rdf:type          rdf:Property ;
        rdfs:isDefinedBy  <http://lod.nl.go.kr/ontology/> .

To count their occurrences, for each endpoint I send the Corese server a query like the following:

prefix voaf: <http://purl.org/vocommons/voaf#>
prefix kgi: <http://ns.inria.fr/kg/index#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix prov: <http://www.w3.org/ns/prov#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix void: <http://rdfs.org/ns/void#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
INSERT {
    ?elem voaf:usageInDataset ?elemDatasetOccurence ;
          voaf:reusedByDatasets ?nbUsedByDatasets .
    ?elemDatasetOccurence a voaf:DatasetOccurences ;
          voaf:inDataset <endpoint URL> .
}
WHERE {
    # Selection of the property
    ?elem a rdf:Property .
    FILTER( isIRI(?elem) )

    # Checking if the property is used in the endpoint
    SERVICE <endpoint URL> {
        FILTER( EXISTS {
            ?s ?elem ?o .
        } )
    }

    BIND( IRI( CONCAT( STR(kgi:), MD5( CONCAT( STR(?elem), STR(<endpoint URL>) ) ) ) ) AS ?elemDatasetOccurence )
    ?elem voaf:reusedByDatasets ?occurences .
    BIND( ?occurences + 1 AS ?nbUsedByDatasets )
}
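
For reference, the IRI built by the BIND above (the kgi: namespace followed by the MD5 hex digest of the property IRI concatenated with the endpoint URL) can be reproduced outside SPARQL. This is only a minimal Python sketch; the function name `occurrence_iri` is hypothetical, and it assumes the strings are hashed as UTF-8, which is how the SPARQL MD5 function is defined for simple literals.

```python
import hashlib

# Namespace taken from the "kgi:" prefix declared in the query above.
KGI = "http://ns.inria.fr/kg/index#"


def occurrence_iri(elem: str, endpoint: str) -> str:
    """Mirror IRI(CONCAT(STR(kgi:), MD5(CONCAT(STR(?elem), STR(<endpoint URL>))))).

    `elem` is the property or class IRI and `endpoint` is the endpoint URL,
    both given as plain strings.
    """
    # SPARQL MD5 returns the lowercase hex digest, as hexdigest() does.
    digest = hashlib.md5((elem + endpoint).encode("utf-8")).hexdigest()
    return KGI + digest
```

The same pair of property and endpoint always gives the same IRI, so repeating the INSERT for one endpoint reuses the existing occurrence node rather than minting a new one.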

My problem is that, in response, I receive a large number of "Read timed out" errors, and the following is the query that gets sent to the remote SPARQL endpoint:

prefix kgi: <http://ns.inria.fr/kg/index#>
prefix sd: <http://www.w3.org/ns/sparql-service-description#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix prov: <http://www.w3.org/ns/prov#>
prefix void: <http://rdfs.org/ns/void#>
prefix voaf: <http://purl.org/vocommons/voaf#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
select * 
where {
filter exists {?s ?elem ?o .} 
}
limit 1000 

Do you have any pointers on how to force the query engine to instantiate a variable with values from the local dataset, i.e. to make sure that "?elem" is bound in the query sent to the remote endpoint?
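
To make the goal concrete, here is the kind of remote query I would expect once "?elem" is bound: the candidate IRIs travel with the query in a VALUES block, so the endpoint only tests those predicates instead of scanning every triple. This is a minimal client-side sketch in Python, not something Corese does. The function name `build_bound_query` and its parameters are hypothetical, and it assumes the IRIs come from a trusted local dataset, since they are not escaped.

```python
def build_bound_query(elems, limit=1000):
    """Build a SPARQL query checking which IRIs in `elems` occur as predicates.

    Hypothetical helper: the IRIs are inlined unescaped, so they are assumed
    to come from a trusted source.
    """
    if not elems:
        raise ValueError("at least one IRI is required")
    # Inline the local bindings so ?elem is no longer free on the remote side.
    values = " ".join(f"<{iri}>" for iri in elems)
    return (
        "SELECT DISTINCT ?elem WHERE {\n"
        f"  VALUES ?elem {{ {values} }}\n"
        "  FILTER EXISTS { ?s ?elem ?o }\n"
        "}\n"
        f"LIMIT {limit}"
    )


if __name__ == "__main__":
    print(build_bound_query([
        "https://w3id.org/fog#asShapefile-shp",
        "http://lod.nl.go.kr/ontology/scale",
    ]))
```

Sending the candidates in small batches would keep each remote query short, at the cost of more round trips.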

ocorby commented

It solved the problem, but it is very dependent on pattern ordering.

The following query works as advertised:

prefix voaf: <http://purl.org/vocommons/voaf#>
prefix kgi: <http://ns.inria.fr/kg/index#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix prov: <http://www.w3.org/ns/prov#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix void: <http://rdfs.org/ns/void#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT {
    ?elem voaf:usageInDataset ?elemDatasetOccurence ;
          voaf:reusedByDatasets ?nbUsedByDatasets .
} WHERE {
    ?elem a owl:Class .
    FILTER( isIRI(?elem) )

    SERVICE <http://localhost:3030/LOV/sparql> {
        VALUES ?elem { undef }
        {
            ?elem a ?type .
            VALUES ?type {
                owl:Class
                rdfs:Class
            }
        } UNION {
            ?whatever a ?elem .
        }
    }

    ?elem voaf:reusedByDatasets ?occurences .
    BIND( ?occurences + 1 AS ?nbUsedByDatasets )
}

This next query, in which I just moved two lines, does not work as hoped:

prefix voaf: <http://purl.org/vocommons/voaf#>
prefix kgi: <http://ns.inria.fr/kg/index#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
prefix prov: <http://www.w3.org/ns/prov#>
prefix dcat: <http://www.w3.org/ns/dcat#>
prefix void: <http://rdfs.org/ns/void#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

INSERT {
    ?elem voaf:usageInDataset ?elemDatasetOccurence ;
          voaf:reusedByDatasets ?nbUsedByDatasets .
} WHERE {

    SERVICE <http://localhost:3030/LOV/sparql> {
        VALUES ?elem { undef }
        {
            ?elem a ?type .
            VALUES ?type {
                owl:Class
                rdfs:Class
            }
        } UNION {
            ?whatever a ?elem .
        }
    }

    ?elem a owl:Class .  # Line moved
    FILTER( isIRI(?elem) )  # Line moved

    ?elem voaf:reusedByDatasets ?occurences .
    BIND( ?occurences + 1 AS ?nbUsedByDatasets )
}
ocorby commented