rdfjs/query-spec

Custom data indexes

Something to keep in mind when developing the filterable source interface is that SPARQL 1.1 cannot readily express all types of queries, and things get complicated when graph pattern matching has to be reconciled with arbitrary data indexes. Take, for example, the following exploratory work done on graphy, which uses indexes that are baked into the HDT file being queried:

k_store.pattern({
   '?place': {
      a: 'dbo:Place && (dbo:City || dbo:National_Park)',

      // use built-in data index "its:number" to perform numeric range filter
      dbo_population: '{its:number is > 100e3}',

      // use built-in text index "its:text" to perform string matching
      dbo_abstract: '{its:text contains /central/i}',

      // use registered custom data comparison algorithm to match terms whose contents are locale-dependent
      dbo_annual_cost: '{currency:worth is > $20m and <= $40m}',

      // use registered spatial index to solve topological query
      ago_footprint: '{ago:geometry is within ?state and contains ?park}',

      // use registered knn to find neighbors
      ago_centroid: '{ago:geometry closest 10 ?park}'
   },
})

Notice how the spatial queries cannot be adequately solved using a filter; for optimal performance, the query engine must be able to decide the order in which to solve the joins amongst the graph patterns and each of the topological queries.
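
To make the ordering concern concrete, here is a rough sketch of a greedy planner that treats graph patterns and index lookups as steps in one plan. Everything in it is an assumption for illustration: the `PlanStep` shape, the `planOrder` function, the `dbo:State` pattern and the cardinality estimates are made up, and neither graphy nor any RDF/JS interface is claimed to work this way.

```typescript
// Hypothetical plan step: either a graph pattern or an index lookup.
interface PlanStep {
  label: string;
  requires: string[]; // variables that must already be bound before this step can run
  binds: string[];    // variables this step binds
  estimate: number;   // made-up estimate of how many results the step produces
}

// Greedy ordering: repeatedly pick the cheapest step whose inputs are bound.
function planOrder(steps: PlanStep[]): string[] {
  const bound = new Set<string>();
  const remaining = [...steps];
  const order: string[] = [];
  while (remaining.length > 0) {
    const runnable = remaining.filter((s) => s.requires.every((v) => bound.has(v)));
    if (runnable.length === 0) {
      throw new Error("no runnable step: some inputs can never be bound");
    }
    const best = runnable.reduce((a, b) => (b.estimate < a.estimate ? b : a));
    order.push(best.label);
    best.binds.forEach((v) => bound.add(v));
    remaining.splice(remaining.indexOf(best), 1);
  }
  return order;
}

const order = planOrder([
  { label: "?place a dbo:City", requires: [], binds: ["?place"], estimate: 50000 },
  { label: "?state a dbo:State", requires: [], binds: ["?state"], estimate: 50 },
  { label: "?place within ?state", requires: ["?state"], binds: ["?place"], estimate: 2000 },
]);
console.log(order);
```

With these invented estimates the spatial lookup is scheduled before the much larger `dbo:City` pattern, which a filter applied after the pattern match could not achieve.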

If I understand correctly, the need is to be able to push filters down into sources, where those filters may span multiple quad patterns (or even higher-level operations).

Currently, the FilterableSource interface provides a method to do something like source.matchExpression(s, p, o, g, filter).
Instead (or additionally), we need something like the following:

source.matchOperation(
  bgp(
    pattern(s1, p1, o1, g1),
    pattern(s2, p2, o2, g2),
  ),
  filter,
);

which would allow the resolution of the operation (a BGP in this case) together with the filter expression to be handled by the index.

This matchOperation method can no longer live purely in the realm of RDF quads, because its results combine terms from several patterns, so we'll have to return a bindings stream here instead of a quad stream.
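
As a rough illustration of why the result has to be bindings, here is a toy in-memory version. All of it is assumed for the sake of the example: `ToySource`, the string-based `Term` type and the callback-style `Filter` are hypothetical and are not the RDF/JS interfaces, and the method returns a plain array where a real source would return a stream.

```typescript
// Hypothetical, simplified types; these are not the RDF/JS interfaces.
type Term = string; // variables start with "?"
interface Quad { s: Term; p: Term; o: Term; g: Term }
type Pattern = Quad; // same shape, but positions may hold variables
type Bindings = Map<string, Term>;
type Filter = (b: Bindings) => boolean;

const POSITIONS = ["s", "p", "o", "g"] as const;
const isVar = (t: Term): boolean => t.startsWith("?");

// Extend `b` so that `pattern` matches `quad`; return null on a mismatch.
function unify(pattern: Pattern, quad: Quad, b: Bindings): Bindings | null {
  const out = new Map(b);
  for (const pos of POSITIONS) {
    const p = pattern[pos];
    const q = quad[pos];
    if (isVar(p)) {
      const existing = out.get(p);
      if (existing === undefined) out.set(p, q);
      else if (existing !== q) return null;
    } else if (p !== q) {
      return null;
    }
  }
  return out;
}

class ToySource {
  constructor(private readonly quads: Quad[]) {}

  // Resolve a BGP together with a filter, producing bindings rather than quads.
  matchOperation(bgp: Pattern[], filter: Filter): Bindings[] {
    let solutions: Bindings[] = [new Map()];
    for (const pattern of bgp) {
      const next: Bindings[] = [];
      for (const b of solutions) {
        for (const quad of this.quads) {
          const extended = unify(pattern, quad, b);
          if (extended !== null) next.push(extended);
        }
      }
      solutions = next;
    }
    return solutions.filter(filter);
  }
}

const source = new ToySource([
  { s: "ex:Berlin", p: "rdf:type", o: "dbo:City", g: "" },
  { s: "ex:Berlin", p: "dbo:population", o: "3600000", g: "" },
  { s: "ex:Tiny", p: "rdf:type", o: "dbo:City", g: "" },
  { s: "ex:Tiny", p: "dbo:population", o: "500", g: "" },
]);

const results = source.matchOperation(
  [
    { s: "?place", p: "rdf:type", o: "dbo:City", g: "" },
    { s: "?place", p: "dbo:population", o: "?pop", g: "" },
  ],
  (b) => Number(b.get("?pop")) > 100e3,
);
console.log(results.map((b) => Object.fromEntries(b)));
```

Each solution carries `?place` and `?pop` drawn from two different quads, so no single quad could represent it. The toy joins patterns in the given order and filters last; the point of handing the whole operation to the source is that an index-backed source would be free to do better.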

This would make it quite similar to the current QueryableAlgebra interface from #7. So I'm wondering if that already meets these needs?