Distributed query process is (or seems to be) missing
Issue Description:
The distributed query feature is (or seems to be) missing from the latest Corese distribution. If Corese has abandoned this feature, I would be grateful if you could suggest a decent engine for SPARQL federation in a transparent manner, that is, without SERVICE subqueries.
Bug Details:
There are no GUI controls presented in http://sparks.i3s.unice.fr/public:kgram_dqp_alban_gaignard and I am unable to find corresponding REST endpoints.
Steps to Reproduce:
- Roll out Corese from the latest release.
- Try to use the distributed query process according to the documentation.
Expected Behavior:
Distributed querying is available and usable as described in the documentation.
Actual Behavior:
Distributed query is (or seems to be) missing.
Note to Developers:
None
Screenshots/Attachments:
None
Hello,
Thank you for reaching out! The distributed query feature isn’t present in its original form, but you can achieve similar results using federated queries. Here are some quick steps:
In Code:
QueryProcess exec = QueryProcess.create(Graph.create());
Mappings map = exec.query("@federate <uri1endpoint1> <uri2endpoint2>\nselect * where {?x ?p ?y}\n");
// Print the list of results
for (Mapping m : map) {
    System.out.println(m);
}
In Corese-GUI:
Run:
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}
You can also define and reuse a Federation:
@federation <federationuri> <uri1endpoint1> <uri2endpoint2>
@federation <federationuri>
select * where {?x ?p ?y}
You can also get the provenance of the results:
Add the @provenance keyword to the query:
@federate <uri1endpoint1> <uri2endpoint2>
@provenance
select * where {?x ?p ?y}
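For programs that assemble queries dynamically, the annotations shown above can be prepended with a small helper. The following is a minimal Python sketch, not part of Corese: the function name `with_federation` is hypothetical, and it only builds the query text in the form used in the examples above.

```python
def with_federation(query, endpoints, provenance=False):
    """Prefix a SPARQL query with a @federate line and, optionally, @provenance.

    Hypothetical helper for illustration; it only concatenates strings and
    does not validate the endpoint URLs or the query.
    """
    if not endpoints:
        raise ValueError("at least one endpoint is required")
    # Each endpoint is written as an IRI in angle brackets, separated by spaces.
    lines = ["@federate " + " ".join(f"<{e}>" for e in endpoints)]
    if provenance:
        lines.append("@provenance")
    lines.append(query.strip())
    return "\n".join(lines)
```

Calling `with_federation("select * where {?x ?p ?y}", [url1, url2], provenance=True)` yields a three-line query string that can be passed to `exec.query(...)` or pasted into Corese-GUI.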
Documentation:
For additional information and examples, please refer to the documentation at Corese documentation and Federated Query.
Let me know if you have more questions.
Best
Thank you @remiceres for your quick reply! Your answer made federation usage clearer to me. Please help me with the following further questions:
- Does Corese provide a SPARQL endpoint for federated queries? If not, should I use the Python wrapper to set one up, and how can I alleviate potential performance and scalability issues?
- Is it possible to define a federation not in a query but elsewhere in the system, to relieve clients of managing the federated URIs?
Using Federated Query with Corese-Server
Yes, it is possible to use the federated query with Corese-Server. Here's how to do it:
If your server runs locally and is not public
- Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -su
-p 8083: starts the server on port 8083 (default port is 8080).
-su: starts the server in superuser mode; this mode disables security checks on the server (it's not recommended for production use).
- Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}
If your server runs on a public server
If your server is hosted publicly, you can't use the -su option. Therefore, you need to create a profile that explicitly lists which endpoints may be used.
- Create a profile file (for example, profile.ttl) with the following content:
prefix st: <http://ns.inria.fr/sparql-template/>
# List external endpoints allowed
st:access st:namespace
<uri1endpoint1>,
<uri2endpoint2>.
- Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -pp profile.ttl
-p 8083: starts the server on port 8083 (default port is 8080).
-pp profile.ttl: uses the personal profile file profile.ttl to grant access to the external endpoints.
- Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}
Python wrapper
You can attempt to use the Python wrapper, but I wouldn’t recommend it at the moment. Currently, the Python wrapper is just a proof of concept, and I haven’t tested its performance and scalability. We plan to work on its improvement and test scalability in the future.
Defining Federation Outside a Query
Currently, I am unaware of how to define federation outside a query. I will get back to you if I find more information.
Useful links
- How to create a profile file: Getting started with Corese-Serveur
- How to use Python to send queries to Corese-Server: Python wrapper
- Python Wrapper: Corese Python
If you are developing a program in Python and want to use federated queries, I recommend using Corese-Server and sending queries to it through Python code.
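As an illustration of that approach, here is a minimal Python sketch that sends a query to the server's /sparql endpoint over HTTP using only the standard library. It assumes the server accepts a standard SPARQL 1.1 Protocol POST with a URL-encoded `query` parameter and can return JSON results; the function names `build_request` and `run_query` are hypothetical, and none of this has been tested against a specific Corese release.

```python
import json
import urllib.parse
import urllib.request


def build_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol POST request with a URL-encoded form body."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            # Assumption: the server can serialize results as SPARQL JSON.
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


def run_query(endpoint, query, timeout=30):
    """Send the query and return the decoded JSON result document."""
    request = build_request(endpoint, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

With a server started locally on port 8083 as described above, `run_query("http://localhost:8083/sparql", query)` would be called with a query string that begins with the `@federate <...> <...>` line; the variable bindings are then expected under `["results"]["bindings"]` in the returned document.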
If you have more questions, please don't hesitate to ask.
Thank you for the valuable information! I will try to implement it next week.
With regard to defining a federation outside a query, there is information in this article that seems to be about Corese and looks promising:
- In section 2: "... our implementation allows the description of a set of endpoints through the use of a dedicated vocabulary."
- In section 6.1: "We defined a succinct vocabulary to declare in RDF a SPARQL federated query service. For instance the following RDF/Turtle configuration describes the X SPARQL federated query service, identified by http://e.g/X/sparql, that federates three SPARQL services.
prefix st: <http://e.g/sparql-template/>
<http://e.g./X/sparql> a st:Federation;
st:definition(
<http://a.b/blazegraph/Y/sparql>
<https://c.d/fuseki/annotation/sparql>
<http://i.j/repositories/sparql>
) .
In the corese source code there is "federation.ttl" file that seems to be the mentioned "dedicated vocabulary": https://github.com/Wimmics/corese/blob/0dc04a14cb19f7a58584153cdf295771837cc4d9/corese-core/src/main/resources/data/corese/federation.ttl
I have gathered more information on defining federation outside of a query. You can specify a federation in a configuration file and load it when starting Corese. Here are the steps:
Create a federation file, for instance federation.ttl, and include the following content:
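prefix st: <http://ns.inria.fr/sparql-template/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Define a federation
<http://example.com/federation> a st:Federation ;
    rdfs:label "example" ;
    st:definition (
        <endpoint1>
        <endpoint2>
    ) .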
Next, create a configuration file named, for example, config.properties, with the content below:
FEDERATION = /path/to/federation.ttl
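If the list of endpoints is maintained elsewhere, the two files described above can be generated rather than written by hand. The sketch below is a hypothetical Python helper, not a Corese tool: the names `federation_turtle` and `write_config` are made up for illustration, and the `st:` namespace is assumed to be the one used in the profile example earlier in this thread.

```python
from pathlib import Path


def federation_turtle(federation_uri, label, endpoints):
    """Render a federation description in the shape of federation.ttl above.

    Assumes the label contains no double quotes and the URIs need no escaping.
    """
    members = "\n".join(f"        <{e}>" for e in endpoints)
    return (
        # Assumption: same st: namespace as in the profile.ttl example.
        "prefix st: <http://ns.inria.fr/sparql-template/>\n"
        "prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "\n"
        f"<{federation_uri}> a st:Federation ;\n"
        f'    rdfs:label "{label}" ;\n'
        "    st:definition (\n"
        f"{members}\n"
        "    ) .\n"
    )


def write_config(directory, federation_uri, label, endpoints):
    """Write federation.ttl and a config.properties that points to it."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    ttl_path = directory / "federation.ttl"
    ttl_path.write_text(
        federation_turtle(federation_uri, label, endpoints), encoding="utf-8"
    )
    props_path = directory / "config.properties"
    # The FEDERATION key is the one shown in the configuration file above.
    props_path.write_text(
        f"FEDERATION = {ttl_path.resolve().as_posix()}\n", encoding="utf-8"
    )
    return ttl_path, props_path
```

The generated config.properties can then be passed with the -init option as in the commands that follow.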
Corese-Server:
Launch the server using the command below:
java -jar corese-server-4.4.1.jar -init config.properties
Then, send federated queries to http://localhost:8080/sparql :
@federation <http://example.com/federation>
select * where {
?x ?p ?y
}
Corese-GUI:
To start the GUI, use the following command:
java -jar corese-gui-4.4.1.jar -init config.properties
Then, execute the following query:
@federation <http://example.com/federation>
select * where {
?x ?p ?y
}
Corese-Command:
Run the command below:
echo "" | java -jar corese-command-4.4.2.jar sparql -if turtle -q "@federation <http://example.com/federation> select * where {?x ?p ?y}" --init config.properties
The echo "" and -if turtle options are included only as a workaround, because the command is not intended to be used without input.
This feature will be incorporated in the upcoming release of Corese-command (4.4.2). If you wish to utilize this feature now, you may compile the current version of Corese-command from the source code in the develop branch.
I am in the process of drafting a documentation page about this feature.
Do not hesitate to reach out if you have further questions.
@remiceres, thank you for providing exhaustive information so quickly. I am glad that this valuable feature is going to be released soon. I would be grateful if you could share the following suggestion with the development team: support SPARQL 1.1 clients that don't (or can't) add a @federation statement to their queries, for example packaged distributions whose SPARQL 1.1 clients can't easily be modified to include one. I suggest considering a configuration option for Corese that enables federation by default for all queries.