SQL query performance
tpluscode opened this issue · 4 comments
Hi
I've just tried to run some more complex queries and I'm concerned with the SQL produced.
First issue I see is that each property object map generates a unioned SELECT. Wouldn't it be possible to generate a single SQL SELECT and transform the columns into results accordingly?
Then the number of SELECTs seems to explode for each triple pattern in SPARQL.
And lastly when I added a second triple pattern joined by a ref-object map the SQL literally explodes. In my query I select the main triples for a subject and the next level to include a blank node resulting from the second triple map. Normally I would use OPTIONAL though it's not yet implemented.
CONSTRUCT
{
@subject ?p ?o .
?blank ?p2 ?o1 .
}
WHERE
{
@subject ?p ?o .
@subject ?p1 ?blank .
?blank ?p2 ?o1 .
}
Such query executed against r2rml with 2 triple maps each with 9 and 1 predicate object map respectively produces and SQL query already 4700 lines long. Add an rdf:type pattern and it's another 2500 lines more. With a more complex triple pattern I got a whopping 14k lines.
You actually hit the worst case for my solution :)
I will try to explain why it is working this way. Now, it is not possible to generate a single select, because I can't be sure that the result will be correct. I need database information, that the join uses the same table and at least one of the joining columns is unique (will be done as a part of issue #6). Otherwise, you need to make a complete join.
The biggest problem of your query is, that there is no info what to join. I do not see the R2RML mapping, but I expect that the algorithm has no way to decide which subject-predicate-object mapping can be the source. So the optimizing mechanism has not any chance to optimize the query and it generates the full query. Maybe, in the future I can try to detect these scenarios and in that case I should not expand the unions over joins.
Can you please provide the database schema and r2rml file?
In the current version, I do not support custom SQL expression as logical table. I am not able to perform any optimizations upon them. I will support them in some later version. Maybe I will need to extend the mapping file a little bit. However, if you will use just tables directly, you will see that the performance is improved. Now, I will work on the support of other SPARQL queries than just basic triples.