SemaGrow only seems to use 4 out of 5 configured SPARQL endpoints
natancox opened this issue · 4 comments
I have two simple queries and both only use 4 out of 5 endpoints. Not one (which I expected) or all 5 (explainable) but 4?
One query ignores the http://rdfstoreomv-on-1.vm.cumuli.be:3030/blazegraph/namespace/cbb/sparql endpoint and the other ignores the http://rdfstoreomv-on-2.vm.cumuli.be:3030/rdfstoreomv/archive/query endpoint.
To give a bit more detail, a simple query like
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT *
WHERE {
?x a qb:Observation.
}
produces this execution plan.
Note: http://rdfstoreomv-on-2.vm.cumuli.be:3030/rdfstoreomv/archive/query is present!
Plan@local-semagrow[costs [20002.34168,0] 99 tuples]
Slice ( limit=100 )
Plan@local-semagrow[costs [20002.34168,0] 234168 tuples]
Union
Plan@local-semagrow[costs [15001.75626,0] 175626 tuples]
Union
Plan@local-semagrow[costs [10001.17084,0] 117084 tuples]
Union
Plan@local-semagrow[costs [5000.58542,0] 58542 tuples]
SourceQuery (source = http://data.vlaanderen.be/sparql)
Plan@http://data.vlaanderen.be/sparql[costs [58542,0] 58542 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_4cfead57_uri, value=http://purl.org/linked-data/cube#Observation, anonymous)
Plan@local-semagrow[costs [5000.58542,0] 58542 tuples]
SourceQuery (source = http://rdfstoreomv-on-2.vm.cumuli.be:3030/rdfstoreomv/archive/query)
Plan@http://rdfstoreomv-on-2.vm.cumuli.be:3030/rdfstoreomv/archive/query[costs [58542,0] 58542 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_4cfead57_uri, value=http://purl.org/linked-data/cube#Observation, anonymous)
Plan@local-semagrow[costs [5000.58542,0] 58542 tuples]
SourceQuery (source = http://data.kbodata.be/sparql)
Plan@http://data.kbodata.be/sparql[costs [58542,0] 58542 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_4cfead57_uri, value=http://purl.org/linked-data/cube#Observation, anonymous)
Plan@local-semagrow[costs [5000.58542,0] 58542 tuples]
SourceQuery (source = http://id.fedstats.be/sparql)
Plan@http://id.fedstats.be/sparql[costs [58542,0] 58542 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_4cfead57_uri, value=http://purl.org/linked-data/cube#Observation, anonymous)
which is strange because I have configured 5 endpoints.
If I run this query, however,
PREFIX milieu: <http://id.milieuinfo.be/def#>
SELECT *
WHERE {
?x a milieu:Exploitant.
}
I get a different set of endpoints that are queried.
Note: http://rdfstoreomv-on-2.vm.cumuli.be:3030/rdfstoreomv/archive/query is NOT present!
Plan@local-semagrow[costs [20003.87500,0] 99 tuples]
Slice ( limit=100 )
Plan@local-semagrow[costs [20003.87500,0] 387500 tuples]
Union
Plan@local-semagrow[costs [15002.90625,0] 290625 tuples]
Union
Plan@local-semagrow[costs [10001.93750,0] 193750 tuples]
Union
Plan@local-semagrow[costs [5000.96875,0] 96875 tuples]
SourceQuery (source = http://data.vlaanderen.be/sparql)
Plan@http://data.vlaanderen.be/sparql[costs [96875,0] 96875 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_135bf350_uri, value=http://id.milieuinfo.be/def#Exploitant, anonymous)
Plan@local-semagrow[costs [5000.96875,0] 96875 tuples]
SourceQuery (source = http://rdfstoreomv-on-1.vm.cumuli.be:3030/blazegraph/namespace/cbb/sparql)
Plan@http://rdfstoreomv-on-1.vm.cumuli.be:3030/blazegraph/namespace/cbb/sparql[costs [96875,0] 96875 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_135bf350_uri, value=http://id.milieuinfo.be/def#Exploitant, anonymous)
Plan@local-semagrow[costs [5000.96875,0] 96875 tuples]
SourceQuery (source = http://data.kbodata.be/sparql)
Plan@http://data.kbodata.be/sparql[costs [96875,0] 96875 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_135bf350_uri, value=http://id.milieuinfo.be/def#Exploitant, anonymous)
Plan@local-semagrow[costs [5000.96875,0] 96875 tuples]
SourceQuery (source = http://id.fedstats.be/sparql)
Plan@http://id.fedstats.be/sparql[costs [96875,0] 96875 tuples]
StatementPattern
Var (name=x)
Var (name=_const_f5e5585a_uri, value=http://www.w3.org/1999/02/22-rdf-syntax-ns#type, anonymous)
Var (name=_const_135bf350_uri, value=http://id.milieuinfo.be/def#Exploitant, anonymous)
Hi @natancox,
Semagrow performs an ASK query prior to query planning and prunes the sources that do not seem to satisfy the query, which hopefully results in a more efficient plan. Can you check whether this is the case by issuing the following against every configured SPARQL endpoint separately:
PREFIX qb: <http://purl.org/linked-data/cube#>
ASK { ?x a qb:Observation. }
and
PREFIX milieu: <http://id.milieuinfo.be/def#>
ASK { ?x a milieu:Exploitant. }
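To make the pruning idea concrete, here is a minimal Python sketch. It is not Semagrow's code: the function name `prune_sources`, the `keep_unknown` parameter, and the convention of using `None` for a check that failed are all assumptions made for illustration. This thread does not say how Semagrow treats an endpoint whose ASK check errors out.

```python
from typing import Dict, List, Optional


def prune_sources(
    ask_results: Dict[str, Optional[bool]],
    keep_unknown: bool = True,  # hypothetical knob, not a Semagrow setting
) -> List[str]:
    """Return the sources worth including in the plan.

    ask_results maps an endpoint URL to the outcome of its ASK check:
    True (has matches), False (no matches) or None (the check failed).
    """
    kept = []
    for source, answer in ask_results.items():
        # Keep sources that answered true, and optionally the undecided ones.
        if answer is True or (answer is None and keep_unknown):
            kept.append(source)
    return kept
```

With `keep_unknown=True`, only endpoints that explicitly answer `false` are dropped, so an endpoint that is offline or returns HTML would still end up in the plan.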
Thanks for the quick reply. Checking the endpoints individually is a smart move. Three of the endpoints are public so it should be easy to check.
And I noticed 2 out of 3 seem to be offline. The other is probably always returning HTML. I will try to get them behaving nicely before I bother you again.
Hi @natancox,
I'll close the bug for now, but please feel free to get back to us if you still have problems.
Hello @stasinos, it seems I still have an issue. I will create a separate bug report for it, but for completeness I will list my investigations here.
I ran some more checks. As you suggested, I appended
?query=PREFIX%20qb%3A%20%3Chttp%3A%2F%2Fpurl.org%2Flinked-data%2Fcube%23%3E%20%0AASK%20%7B%20%3Fx%20a%20qb%3AObservation.%20%7D
to each of the endpoints.
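For anyone repeating this check, the percent-encoded parameter can be generated instead of typed by hand. The following is a minimal Python sketch using only the standard library. The function names `build_ask_request` and `fetch` and the explicit `Accept` header are assumptions of this sketch, not something Semagrow prescribes. The header may matter, because an endpoint that is asked without it (for example from a browser) can answer with an HTML page.

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import Request, urlopen

# Same ASK query as above; the space before the newline reproduces
# the encoded string shown in this comment exactly.
ASK_OBSERVATION = (
    "PREFIX qb: <http://purl.org/linked-data/cube#> \n"
    "ASK { ?x a qb:Observation. }"
)


def build_ask_request(endpoint: str, query: str) -> Request:
    """Build a GET request with the query percent-encoded in `?query=`."""
    url = endpoint + "?query=" + quote(query, safe="")
    # Assumption: ask explicitly for the SPARQL XML results format.
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


def fetch(endpoint: str, query: str, timeout: float = 30.0):
    """Send the query and return (status, content type, body text)."""
    try:
        with urlopen(build_ask_request(endpoint, query), timeout=timeout) as resp:
            body = resp.read().decode("utf-8", "replace")
            return resp.status, resp.headers.get("Content-Type", ""), body
    except HTTPError as err:
        # 4xx/5xx answers (such as 404 or 406) still carry a body worth reading.
        body = err.read().decode("utf-8", "replace")
        return err.code, err.headers.get("Content-Type", ""), body
```

Calling `fetch(endpoint, ASK_OBSERVATION)` for each configured endpoint gives the status code, content type and body needed to compare the endpoints side by side.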
1) http://rdfstoreomv-on-1.vm.cumuli.be:3030/blazegraph/namespace/cbb/sparql
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
</head>
<boolean>false</boolean>
</sparql>
And is indeed being ignored.
2) http://id.fedstats.be/sparql
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>406 Not Acceptable</title>
</head><body>
<h1>406 Not Acceptable</h1>
<p>An appropriate representation of the requested resource sparql could not be found on this server.</p>
Available variant(s):
<ul>
<li><a href="sparql">sparql</a> , type text/html, charset UTF-8</li>
</ul>
</body></html>
Seems to be an old Virtuoso instance.
3) http://rdfstoreomv-on-1.vm.cumuli.be:3030/blazegraph/namespace/lne/sparql
Seems to be ok.
<?xml version='1.0' encoding='UTF-8'?>
<sparql xmlns='http://www.w3.org/2005/sparql-results#'>
<head>
</head>
<boolean>true</boolean>
</sparql>
4) http://data.kbodata.be/sparql
404 Resource not found
So, again, not a bug.
5) http://data.vlaanderen.be/sparql
Returns a full webpage and is by default not machine readable!
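The five outcomes above fall into three kinds: a proper ASK result (true or false), an HTTP error (406, 404), and a 200 response that is not a SPARQL result at all. A minimal Python sketch of that classification follows. The function name `interpret_ask_response` and the returned labels are made up for illustration, and it assumes that a well-behaved endpoint answers with the `application/sparql-results+xml` content type.

```python
import xml.etree.ElementTree as ET

# Namespace used by the SPARQL XML results shown above.
SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"


def interpret_ask_response(status: int, content_type: str, body: str) -> str:
    """Label one endpoint response: 'true', 'false', or a reason it is unusable."""
    if status != 200:
        return f"http-error-{status}"
    # Drop parameters such as "; charset=UTF-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type != "application/sparql-results+xml":
        return "not-sparql-results"
    try:
        root = ET.fromstring(body.encode("utf-8"))
    except ET.ParseError:
        return "not-sparql-results"
    node = root.find(SPARQL_NS + "boolean")
    if node is None or node.text is None:
        return "not-sparql-results"
    return node.text.strip().lower()
```

Only the endpoints labelled `true` or `false` have actually answered the ASK query. The other labels mean the endpoint itself needs fixing before its absence from, or presence in, a plan says anything.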