Optimise clustering of event store
cortadocodes opened this issue · 0 comments
Feature request
Use Case
We need to decide which fields to cluster on in the BigQuery event store and whether to pull the event kind out as a column.
Current state
The event kind is stored in the event JSON field and is queryable but cannot be ordered by (I don't think we need to order by it). We're currently clustering on ["sender", "question_uuid"] in that order. Clustering is order-dependent: to take advantage of clustering on a given field, a query must also filter on every clustered field of higher priority (to its left).
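To make the ordering rule concrete, here is a minimal Python sketch. It is a simplified model of the rule as described above, not BigQuery's actual pruning logic, and `usable_cluster_prefix` is a hypothetical helper name used only for illustration.

```python
def usable_cluster_prefix(cluster_fields, filtered_fields):
    """Return the leading clustered fields that a query's filters can exploit.

    Simplified model: a clustered field only helps if every clustered field
    to its left is also filtered on, so we stop at the first unfiltered one.
    """
    filtered = set(filtered_fields)
    prefix = []
    for field in cluster_fields:
        if field not in filtered:
            break
        prefix.append(field)
    return prefix


# Current clustering order from this issue.
CLUSTER_FIELDS = ["sender", "question_uuid"]

print(usable_cluster_prefix(CLUSTER_FIELDS, ["sender"]))         # ['sender']
print(usable_cluster_prefix(CLUSTER_FIELDS, ["question_uuid"]))  # []
```

Under this model, a query that filters only on question_uuid gets no benefit from the current clustering, because sender sits to its left and is not filtered on.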
@thclark says: "We’d need to cluster on event_kind otherwise you’d have to process (for example) all the log rows every time you want to query for input or output values (remember it’s column based storage so the filters aren’t like conventional SQL, it’ll process all rows in order to apply a filter). Also, regardless of clustering I think (??) it may be more efficient to filter directly on a column than on a JSONField."
Proposed Solution
Discuss and choose:
- Whether to pull the event kind out as a field
- The fields to cluster on and in what order
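To help compare options, the sketch below builds the DDL for one candidate layout: the event kind pulled out into an event_kind column and used as the first clustering field. Everything specific here is an assumption for discussion: the table names are placeholders, the `$.kind` JSON path is a guess at where the kind lives inside the event field, `build_clustered_table_ddl` is a hypothetical helper, and the clustering order shown is one candidate rather than a recommendation.

```python
MAX_CLUSTER_FIELDS = 4  # BigQuery allows at most four clustering columns


def build_clustered_table_ddl(source_table, target_table, cluster_fields,
                              kind_json_path="$.kind"):
    """Build a CREATE TABLE ... AS SELECT statement for a candidate layout.

    Assumes the kind sits at `kind_json_path` inside the `event` JSON column
    (an assumption, not confirmed by the current schema).
    """
    if not 1 <= len(cluster_fields) <= MAX_CLUSTER_FIELDS:
        raise ValueError(
            f"Expected 1 to {MAX_CLUSTER_FIELDS} clustering fields, "
            f"got {len(cluster_fields)}."
        )
    return (
        f"CREATE TABLE `{target_table}`\n"
        f"CLUSTER BY {', '.join(cluster_fields)} AS\n"
        f"SELECT *, JSON_VALUE(event, '{kind_json_path}') AS event_kind\n"
        f"FROM `{source_table}`"
    )


# Placeholder table names; one candidate ordering to discuss.
ddl = build_clustered_table_ddl(
    source_table="my_dataset.events",
    target_table="my_dataset.events_clustered",
    cluster_fields=["event_kind", "sender", "question_uuid"],
)
print(ddl)
```

Running each candidate statement against a copy of the table and comparing bytes processed for typical queries would give a concrete basis for the choice.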