Pre-caching proposal

Question

philippjfr opened this issue 2 years ago · 2 comments

Pre-caching data is simple when the data is of a manageable size and a single query can fetch all of it. However, when the data is potentially huge and various filters and transforms are applied, we need a different approach that caches individual queries. Since we now express all filters and transforms at the level of a Pipeline, this is also the appropriate place to declare the pre-caching configuration. There are two possibilities for expressing the space of queries to cache:

1. Express the values for each filter and potentially also options for specific transforms. We could then compute the cross-product of all of those options, then execute and cache the resulting queries. The problem with this approach is that cross-products quickly get huge, and often we just need to pre-fetch the most commonly viewed combinations.
2. Express each explicit combination of filter, transform and variable values to fetch.

Both approaches have value but for now we will focus on the latter.
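
To make the size concern concrete, here is a small illustrative calculation. The option counts are hypothetical and are not taken from any real dashboard; the point is only that the number of queries is the product of the number of options per filter or variable.

```python
from math import prod

# Hypothetical number of candidate values per filter/variable.
option_counts = {"a": 10, "b": 10, "groupby": 4}


def cross_product_size(counts):
    """Number of queries needed to cache every combination."""
    return prod(counts.values())


print(cross_product_size(option_counts))  # 400

# Adding one more filter with 50 candidate values multiplies the total.
option_counts["c"] = 50
print(cross_product_size(option_counts))  # 20000
```

An explicit list of combinations, by contrast, grows only with the number of entries actually written down.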

Specification

The proposed specification for defining individual pre-cached values looks like this:

pipelines:
  sql_table:
    - source: sql_source
      table: sql_table
      filters:
        a:
          type: widget
        b:
          type: widget
      sql_transforms:
        - type: aggregate
          by: $variables.groupby
          aggregates:
            c: mean
      precache:
        - a: 1
          b: 2
          variables:
            groupby: day
        - a: 2
          b: 1
          variables:
            groupby: week
        ...
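
As a rough sketch of how such a list could be consumed, the snippet below separates each entry into filter values and variable values and hands them to a callback. This is not Lumen's implementation: `split_precache_entry`, `warm_cache` and the `run_query` callback are hypothetical names used only for illustration, and the entries mirror the two shown above.

```python
def split_precache_entry(entry):
    """Separate one precache entry into filter values and variable values."""
    variables = dict(entry.get("variables", {}))
    filters = {key: value for key, value in entry.items() if key != "variables"}
    return filters, variables


def warm_cache(entries, run_query):
    """Invoke run_query(filters, variables) once per explicit entry.

    run_query is a hypothetical callback standing in for whatever
    executes the query and stores its result in the cache.
    """
    for entry in entries:
        filters, variables = split_precache_entry(entry)
        run_query(filters, variables)


# The two explicit combinations from the specification above.
precache = [
    {"a": 1, "b": 2, "variables": {"groupby": "day"}},
    {"a": 2, "b": 1, "variables": {"groupby": "week"}},
]

calls = []
warm_cache(precache, lambda filters, variables: calls.append((filters, variables)))
print(calls)
```

Each entry results in exactly one query, so the cost of warming the cache is proportional to the length of the list.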

To address the first option (the cross-product of values), we will eventually also support this syntax:

pipelines:
  sql_table:
    - source: sql_source
      table: sql_table
      filters:
        a:
          type: widget
        b:
          type: widget
      sql_transforms:
        - type: aggregate
          by: $variables.groupby
          aggregates:
            c: mean
      precache:
        a: [1, 2]
        b: [1, 2]
        variables:
          groupby: [day, week]
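
One way to read this syntax is as shorthand for the explicit list form: every combination of the listed values becomes one entry. The sketch below shows that expansion under this assumption; `expand_precache` is a hypothetical helper, not part of any existing API.

```python
from itertools import product


def expand_precache(spec):
    """Expand a dict-of-lists precache spec into explicit entries."""
    variables = spec.get("variables", {})
    filters = {key: values for key, values in spec.items() if key != "variables"}
    filter_keys, var_keys = list(filters), list(variables)
    value_lists = [filters[k] for k in filter_keys] + [variables[k] for k in var_keys]

    entries = []
    for combo in product(*value_lists):
        # The first len(filter_keys) items are filter values, the rest variables.
        entry = dict(zip(filter_keys, combo[:len(filter_keys)]))
        entry["variables"] = dict(zip(var_keys, combo[len(filter_keys):]))
        entries.append(entry)
    return entries


spec = {"a": [1, 2], "b": [1, 2], "variables": {"groupby": ["day", "week"]}}
entries = expand_precache(spec)
print(len(entries))  # 8
print(entries[0])    # {'a': 1, 'b': 1, 'variables': {'groupby': 'day'}}
```

With two values for each of `a`, `b` and `groupby`, the expansion yields 2 × 2 × 2 = 8 queries to execute and cache.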

Answer 1 · 2022-11-16T13:33:15.000Z

This has been merged.
