Ordered SPARQL queries should use Virtuoso's scrollable cursors
All DPUs loading data via SPARQL queries using ORDER BY (such as the XSLT DPU) should use Virtuoso's scrollable cursors (see Virtuoso's documentation, section "Example: Prevent Limits of Sorted LIMIT/OFFSET query"). When the number of rows to sort in an ordered SPARQL query (OFFSET plus LIMIT) exceeds the MaxSortedTopRows setting in virtuoso.ini (typically set to 10-20K rows), the query fails with an error message like the following:
Virtuoso 22023 Error SR353: Sorted TOP clause specifies more then 41000 rows to sort.
Only 40000 are allowed.
Either decrease the offset and/or row count or use a scrollable cursor
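The arithmetic behind this error can be sketched as follows. This assumes, as the "offset and/or row count" wording of the message suggests, that Virtuoso has to sort OFFSET plus LIMIT rows and rejects the query when that number exceeds MaxSortedTopRows. The function names and the OFFSET 40000 / LIMIT 1000 values are hypothetical, picked only to reproduce the 41000 figure above.

```python
def rows_to_sort(offset: int, limit: int) -> int:
    """Rows Virtuoso is assumed to sort for ORDER BY ... OFFSET offset LIMIT limit."""
    return offset + limit


def exceeds_sorted_top_limit(offset: int, limit: int, max_sorted_top_rows: int) -> bool:
    """True if a plain ordered query would be expected to fail with SR353 (assumption)."""
    return rows_to_sort(offset, limit) > max_sorted_top_rows


# Hypothetical values: MaxSortedTopRows = 40000, OFFSET 40000, LIMIT 1000.
print(rows_to_sort(40000, 1000))                       # 41000
print(exceeds_sorted_top_limit(40000, 1000, 40000))    # True
print(exceeds_sorted_top_limit(30000, 10000, 40000))   # False
```

Under this assumption, paging deeper into an ordered result always hits the limit eventually, regardless of how small LIMIT is, because OFFSET alone keeps growing.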
A temporary workaround is to increase the MaxSortedTopRows setting, but the proper solution is to use a sub-SELECT with ORDER BY wrapped in an outer SELECT query with OFFSET and LIMIT. For example, the XSLT DPU uses the query:
SELECT ?s ?o
WHERE {
  ?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .
}
ORDER BY ?s ?o
Rewritten to use a scrollable cursor, which allows loading larger data, this query could look like the following:
SELECT ?s ?o
WHERE {
  {
    SELECT ?s ?o
    WHERE {
      ?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .
    }
    ORDER BY ?s ?o
  }
}
# Pagination goes here:
LIMIT 10000
OFFSET 1000000
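If a DPU drives the pagination from code, the wrapped query can be generated per page. The following is a minimal Python sketch; the function name scrollable_cursor_query and its parameters are hypothetical, and it only builds the query string (it does not send it to an endpoint). ORDER BY stays inside the sub-SELECT while LIMIT and OFFSET go on the outer SELECT, mirroring the query above.

```python
from typing import Sequence


def scrollable_cursor_query(
    variables: Sequence[str],
    where_pattern: str,
    order_by: str,
    limit: int,
    offset: int,
) -> str:
    """Build one page of an ordered query using the sub-SELECT pattern (hypothetical helper)."""
    if limit <= 0 or offset < 0:
        raise ValueError("limit must be positive and offset non-negative")
    projection = " ".join(variables)
    return (
        f"SELECT {projection}\n"
        "WHERE {\n"
        "  {\n"
        f"    SELECT {projection}\n"
        "    WHERE {\n"
        f"      {where_pattern}\n"
        "    }\n"
        f"    ORDER BY {order_by}\n"  # sorting happens inside the sub-SELECT
        "  }\n"
        "}\n"
        f"LIMIT {limit}\n"             # pagination is applied by the outer SELECT
        f"OFFSET {offset}\n"
    )


# Reproduces the example query for the page at OFFSET 1000000.
query = scrollable_cursor_query(
    ["?s", "?o"],
    "?s <http://linked.opendata.cz/ontology/odcs/xmlValue> ?o .",
    "?s ?o",
    limit=10000,
    offset=1000000,
)
print(query)
```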
Scrollable cursors are implemented in LinkedPipes ETL components http://etl.linkedpipes.com/components/e-sparqlendpointselectscrollablecursor and http://etl.linkedpipes.com/components/e-sparqlendpointconstructscrollablecursor
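To load a full result set this way, a client can request successive pages of the wrapped query, increasing OFFSET by LIMIT until a page comes back shorter than LIMIT. The sketch below is a hypothetical illustration of that loop, not the code of the LinkedPipes components. fetch_page is an assumed callback standing in for whatever executes the wrapped query against the endpoint with the given LIMIT and OFFSET.

```python
from typing import Callable, Iterator, List, Tuple

Row = Tuple[str, ...]


def iter_rows(
    fetch_page: Callable[[int, int], List[Row]],
    page_size: int = 10000,
) -> Iterator[Row]:
    """Yield every row by requesting pages of page_size rows.

    fetch_page(limit, offset) is a hypothetical callback that runs the
    wrapped (sub-SELECT) query with the given LIMIT and OFFSET.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(page_size, offset)
        yield from page
        if len(page) < page_size:
            # A short (or empty) page means the result set is exhausted.
            break
        offset += page_size


# Usage with an in-memory stand-in for the SPARQL endpoint.
data = [(f"s{i}", f"o{i}") for i in range(25)]
rows = list(iter_rows(lambda limit, offset: data[offset:offset + limit], page_size=10))
print(len(rows))  # 25
```

Because ORDER BY stays in the sub-SELECT, every page is cut from the same ordering, so as long as the data does not change between requests, pages neither overlap nor skip rows.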