Pre-caching proposal
philippjfr opened this issue · 2 comments
Pre-caching data is simple when the data is of a manageable size and a single query can fetch all of it. However, when the data is potentially huge and various filters and transforms are applied, we need a different solution that caches individual queries. Since we now express all filters and transforms at the level of a Pipeline, this is also the appropriate place to declare the pre-caching configuration. There are two possibilities for expressing the space of queries to cache:
- Express the values for each filter and potentially also options for specific transforms. We could then compute the cross-product of all of those options and execute and cache those queries. The problem with this approach is that cross-products quickly get huge and often we just need to pre-fetch the most commonly viewed combinations.
- Express each explicit combination of filter, transform and variable values to fetch.
Both approaches have value but for now we will focus on the latter.
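To give a feel for how quickly the cross-product in the first option grows, here is a small sketch. The filter names and option counts are invented for illustration and do not come from any real dashboard.

```python
import math

# Hypothetical number of distinct values per filter/variable (illustrative only).
option_counts = {"a": 20, "b": 15, "groupby": 4, "region": 30}

# The cross-product needs one cached query per combination of values.
total_queries = math.prod(option_counts.values())
print(total_queries)  # 36000
```

Four modest inputs already imply tens of thousands of queries, which is why listing only the commonly viewed combinations is the more practical starting point.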
Specification
The proposed specification for defining individual pre-cached values looks like this:
pipelines:
  sql_table:
    - source: sql_source
      table: sql_table
      filters:
        a:
          type: widget
        b:
          type: widget
      sql_transforms:
        - type: aggregate
          by: $variables.groupby
          aggregates:
            c: mean
      precache:
        - a: 1
          b: 2
          variables:
            groupby: day
        - a: 2
          b: 1
          variables:
            groupby: week
        ...
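As a rough sketch of how a loader might consume such a list, each entry can be split into filter values and variable values before the query is run. The helper name `split_precache_entry` is hypothetical and not part of any existing API; the step that actually runs and caches the query is left as a comment.

```python
def split_precache_entry(entry):
    """Split one explicit precache entry into (filter values, variable values)."""
    entry = dict(entry)  # shallow copy so the caller's spec is not mutated
    variables = entry.pop("variables", {})
    return entry, variables


# The two entries from the specification above, as parsed YAML would look.
precache = [
    {"a": 1, "b": 2, "variables": {"groupby": "day"}},
    {"a": 2, "b": 1, "variables": {"groupby": "week"}},
]

for entry in precache:
    filters, variables = split_precache_entry(entry)
    # A real implementation would apply these values to the pipeline
    # and execute the query here so that the result lands in the cache.
    print(filters, variables)
```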
To address option 1 we will eventually also support this syntax:
pipelines:
  sql_table:
    - source: sql_source
      table: sql_table
      filters:
        a:
          type: widget
        b:
          type: widget
      sql_transforms:
        - type: aggregate
          by: $variables.groupby
          aggregates:
            c: mean
      precache:
        a: [1, 2]
        b: [1, 2]
        variables:
          groupby: [day, week]
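One way to support this form is to expand it into the explicit list form and reuse the same caching logic. The sketch below assumes the parsed `precache` value is a dict of lists; `expand_precache` is a hypothetical helper, not an existing function.

```python
from itertools import product


def expand_precache(spec):
    """Expand a dict of value lists into a list of explicit precache entries."""
    spec = dict(spec)  # shallow copy so the caller's spec is not mutated
    variables = spec.pop("variables", {})
    # Treat filters and variables uniformly as (kind, name, values) axes.
    axes = [("filter", name, values) for name, values in spec.items()]
    axes += [("variable", name, values) for name, values in variables.items()]
    entries = []
    for combo in product(*(values for _, _, values in axes)):
        entry = {"variables": {}}
        for (kind, name, _), value in zip(axes, combo):
            if kind == "filter":
                entry[name] = value
            else:
                entry["variables"][name] = value
        entries.append(entry)
    return entries


spec = {"a": [1, 2], "b": [1, 2], "variables": {"groupby": ["day", "week"]}}
entries = expand_precache(spec)
print(len(entries))  # 8
```

With two values for each of `a`, `b` and `groupby`, the expansion yields 2 × 2 × 2 = 8 entries, each in the same shape as an item of the explicit list.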
This has been merged.