Support filter pushdown
EnricoMi opened this issue · 0 comments
Projections (select columns) and filters (where clauses) in Spark performed on the DataFrame
provided by the connector can be pushed into the connector so that only relevant data are read from the Dgraph database. Otherwise (in current implementation), the entire Dgraph data is loaded into Spark first and then filtered and projected.
Filter pushdown allows to read only the relevant sub-graph, which improves read performances for tiny sub-graphs. There are various types of pushdowns available:
- select uids
- select predicates
- select values
- value type (including objectUid)
- select columns
Select Uids
Uids (subject
column) can be selected by where
operation on the DataFrame
using =
and isin
. Dgraph does not support <
or >
on uids.
Select Predicates
Predicates can be selected through select
on the wide nodes DataFrame
or where
on the predicate
column in all other DataFrame
s returned by the connector. The where
operation should support =
and isin
.
Select Values
Objects can be filtered with where
using equality =
.
Value Type (including objectUid
)
Object type can be filtered by where
and is not null
on typed DataFrame
s like typed triples and typed nodes. This would then filter for Dgraph predicates that have the respective type only.
Select Columns
The connector can only provide those columns of a returned DataFrame
that are actually used later by Spark. This reduces the amount of data transfered to Spark.