Wimmics/corese

Distributed query process is (or seems to be) missing

Closed this issue · 8 comments

Issue Description:

The distributed query feature is (or seems to be) missing from the latest Corese distribution. If Corese has abandoned this feature, I would be grateful if you could suggest a decent engine for transparent SPARQL federation, that is, without SERVICE subqueries.

Bug Details:

There are no GUI controls presented in http://sparks.i3s.unice.fr/public:kgram_dqp_alban_gaignard and I am unable to find corresponding REST endpoints.

Steps to Reproduce:

  1. Install Corese from the latest release.
  2. Try to use the distributed query process according to the documentation.

Expected Behavior:

Distributed querying is available and usable as described in the documentation.

Actual Behavior:

Distributed query is (or seems to be) missing.

Note to Developers:

None

Screenshots/Attachments:

None

Hello,

Thank you for reaching out! The distributed query feature isn’t present in its original form, but you can achieve similar results using federated queries. Here are some quick steps:

In Code:

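import fr.inria.corese.core.Graph;
import fr.inria.corese.core.query.QueryProcess;
import fr.inria.corese.kgram.core.Mapping;
import fr.inria.corese.kgram.core.Mappings;

// Run a federated query over two endpoints (package names as in Corese 4.x; query() may throw EngineException)
QueryProcess exec = QueryProcess.create(Graph.create());
Mappings map = exec.query("@federate <uri1endpoint1> <uri2endpoint2>\nselect * where {?x ?p ?y}\n");
// Print each result row
for (Mapping m : map) {
    System.out.println(m);
}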

In Corese-GUI:

Run:

@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}

You can also define and reuse a Federation:

@federation <federationuri> <uri1endpoint1> <uri2endpoint2>

@federation <federationuri>
select * where {?x ?p ?y}

You can also get the provenance of the results:

Add the @provenance keyword to the query:

@federate <uri1endpoint1> <uri2endpoint2>
@provenance
select * where {?x ?p ?y}

Documentation:

For additional information and examples, please refer to the Corese documentation and its Federated Query page.

Let me know if you have more questions.

Best

Thank you @remiceres for your quick reply! Your answer made federation usage clearer to me. Please help me with the following further questions:

  1. Does Corese provide a SPARQL endpoint for federated queries? If not, should I use the Python wrapper to set one up, and how can I mitigate potential performance and scalability issues?
  2. Is it possible to define a federation somewhere in the system rather than in each query, so that clients do not have to manage the federated URIs?

Using Federated Query with Corese-Server

Yes, it is possible to use federated queries with Corese-Server. Here's how to do it:

If your server runs locally and is not public

  1. Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -su 
  • -p 8083 : starts the server on port 8083 (default port is 8080).
  • -su : starts the server in superuser mode; this mode disables security checks on the server (it's not recommended for production use).
  2. Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}

If your server runs on a public server

If your server is hosted publicly, you should not use the -su option, because it disables the server's security checks. Instead, create a profile that explicitly lists the external endpoints that queries are allowed to use.

  1. Create a profile file (for example, profile.ttl) with the following content:
prefix st: <http://ns.inria.fr/sparql-template/> 

# List external endpoints allowed
st:access st:namespace
    <uri1endpoint1>,
    <uri2endpoint2>.
  2. Start the server with the following command:
java -jar corese-server-4.4.1.jar -p 8083 -pp profile.ttl
  • -p 8083 : starts the server on port 8083 (default port is 8080).
  • -pp profile.ttl : uses the personal profile file profile.ttl to grant access to the external endpoints.
  3. Send federated queries to http://localhost:8083/sparql (same as before):
@federate <uri1endpoint1> <uri2endpoint2>
select * where {?x ?p ?y}

Python wrapper

You can attempt to use the Python wrapper, but I wouldn’t recommend it at the moment. Currently, the Python wrapper is just a proof of concept, and I haven’t tested its performance and scalability. We plan to work on its improvement and test scalability in the future.

Defining Federation Outside a Query

Currently, I am unaware of how to define federation outside a query. I will get back to you if I find more information.


If you are developing a program in Python and want to use federated queries, I recommend using Corese-Server and sending queries to it through Python code.
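
As a rough illustration of that approach, here is a minimal Python sketch that uses only the standard library. It assumes the server was started locally on port 8083 as shown above and that it accepts standard SPARQL-protocol GET requests on /sparql. The helper names (federated_query, build_request, run_query) and the example.org endpoint URIs are hypothetical placeholders, not part of Corese.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Corese-Server runs locally on port 8083, as in the steps above.
SERVER_URL = "http://localhost:8083/sparql"


def federated_query(endpoints, body):
    """Prefix a SPARQL query body with an @federate annotation line."""
    uris = " ".join(f"<{uri}>" for uri in endpoints)
    return f"@federate {uris}\n{body}"


def build_request(server_url, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = server_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(server_url, query, timeout=30):
    """Send the query to the server and return the decoded JSON document."""
    request = build_request(server_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Hypothetical endpoint URIs; replace them with real SPARQL endpoints.
EXAMPLE_QUERY = federated_query(
    ["http://example.org/endpoint1/sparql", "http://example.org/endpoint2/sparql"],
    "select * where {?x ?p ?y} limit 10",
)
```

With the server running, calling run_query(SERVER_URL, EXAMPLE_QUERY) should return the parsed result document; in the SPARQL JSON results format the rows are found under results, then bindings. Building the request separately from sending it makes it easy to inspect the encoded URL before anything goes over the network.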

If you have more questions, please don't hesitate to ask.

Thank you for the valuable information! I will try to implement it next week.
With regard to defining a federation outside a query, this article, which seems to be about Corese, contains information that looks promising:

  • In section 2: "... our implementation allows the description of a set of endpoints through the use of a dedicated vocabulary."
  • In section 6.1: "We defined a succinct vocabulary to declare in RDF a SPARQL federated query service. For instance the following RDF/Turtle configuration describes the X SPARQL federated query service, identified by http://e.g/X/sparql, that federates three SPARQL services."
prefix st: <http://e.g/sparql-template/>
<http://e.g./X/sparql> a st:Federation ;
    st:definition (
        <http://a.b/blazegraph/Y/sparql>
        <https://c.d/fuseki/annotation/sparql>
        <http://i.j/repositories/sparql>
    ) .

In the Corese source code there is a "federation.ttl" file that seems to be the "dedicated vocabulary" mentioned above: https://github.com/Wimmics/corese/blob/0dc04a14cb19f7a58584153cdf295771837cc4d9/corese-core/src/main/resources/data/corese/federation.ttl

I have gathered more information on defining federation outside of a query. You can specify a federation in a configuration file and load it when starting Corese. Here are the steps:

Create a federation file, for instance, federation.ttl, and include the following content:

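prefix st: <http://ns.inria.fr/sparql-template/>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Define a federation
<http://example.com/federation> a st:Federation ;
    rdfs:label "example" ;
    st:definition (
        <endpoint1>
        <endpoint2>
    ) .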

Next, create a configuration file named, for example, config.properties, with the content below:

FEDERATION = /path/to/federation.ttl
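
If the list of endpoints changes often, both files could be generated rather than edited by hand. The following Python sketch shows one way to do that. The function names are hypothetical, the st: namespace is the one from the profile example earlier in this thread, the rdfs: namespace is the standard RDF Schema one, and FEDERATION is the property name shown above.

```python
from pathlib import Path

# Namespaces: st: as used in the profile example, rdfs: from RDF Schema.
ST_NS = "http://ns.inria.fr/sparql-template/"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"


def render_federation(federation_uri, label, endpoints):
    """Return Turtle text declaring one st:Federation over the endpoints."""
    members = "\n".join(f"        <{uri}>" for uri in endpoints)
    return (
        f"prefix st: <{ST_NS}>\n"
        f"prefix rdfs: <{RDFS_NS}>\n\n"
        f"<{federation_uri}> a st:Federation ;\n"
        f'    rdfs:label "{label}" ;\n'
        f"    st:definition (\n{members}\n    ) .\n"
    )


def render_properties(ttl_path):
    """Return the config.properties line that points at the federation file."""
    return f"FEDERATION = {ttl_path}\n"


def write_config(directory, federation_uri, label, endpoints):
    """Write federation.ttl and config.properties; return the properties path."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    ttl_path = directory / "federation.ttl"
    ttl_path.write_text(
        render_federation(federation_uri, label, endpoints), encoding="utf-8"
    )
    props_path = directory / "config.properties"
    props_path.write_text(
        render_properties(ttl_path.resolve().as_posix()), encoding="utf-8"
    )
    return props_path
```

The returned path is what would be passed to the -init option in the commands below. Note that the sketch does not escape quotes in the label or validate the URIs, so it is only suitable for trusted input.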

Corese-Server:

Launch the server using the command below:

java -jar corese-server-4.4.1.jar -init config.properties

Then, send federated queries to http://localhost:8080/sparql :

@federation <http://example.com/federation>
select * where {
  ?x ?p ?y
}

Corese-GUI:

To start the GUI, use the following command:

java -jar corese-gui-4.4.1.jar -init config.properties

Then, execute the following query:

@federation <http://example.com/federation>
select * where {
  ?x ?p ?y
}

Corese-Command:

Run the command below:

echo "" | java -jar corese-command-4.4.2.jar sparql -if turtle -q "@federation <http://example.com/federation> select * where {?x ?p ?y}" --init config.properties

The echo "" pipe and the -if turtle option are a workaround: the sparql command is not intended to be used without input, so an empty input is supplied on standard input and declared as Turtle.

This feature will be incorporated in the upcoming release of Corese-command (4.4.2). If you wish to utilize this feature now, you may compile the current version of Corese-command from the source code in the develop branch.


I am in the process of drafting a documentation page about this feature.

Do not hesitate to reach out if you have further questions.

@remiceres, thank you for providing such thorough information so quickly. I am glad that this valuable feature is going to be released soon. I would be grateful if you could pass the following suggestion on to the development team: support SPARQL 1.1 clients that cannot add an @federation statement to their queries, for example packaged distributions whose bundled SPARQL 1.1 client cannot easily be modified. I suggest considering a configuration option for Corese that enables federation by default for all queries.
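
To make the suggestion concrete: until such an option exists, a thin layer placed in front of the server could add the annotation on behalf of unmodified clients. Below is a minimal Python sketch of the rewriting step only. The function with_default_federation is a hypothetical helper of mine, not a Corese feature, and it assumes annotations are written at the start of a line as in the examples above.

```python
import re

# Detects an existing @federation or @federate annotation at the start of any line.
_FEDERATION_ANNOTATION = re.compile(r"^[ \t]*@federat(?:ion|e)\b", re.MULTILINE)


def with_default_federation(query, federation_uri):
    """Prepend '@federation <uri>' unless the query is already annotated."""
    if _FEDERATION_ANNOTATION.search(query):
        return query  # respect whatever the client asked for
    return f"@federation <{federation_uri}>\n{query}"
```

A proxy would apply this function to each incoming query string before forwarding it, so plain SPARQL 1.1 clients get the default federation while clients that already name one are left untouched.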

Thank you, @ocorby! With your advice, my requirements should be fully satisfied. I will try to set up Corese and provide feedback.