vega/vega-dataflow

Support sampling rate instead of target sampling size

domoritz opened this issue · 2 comments

E.g., keep 20% of the tuples:

```json
{"type": "sample", "rate": 0.2}
```

I guess that's just

```json
{"type": "filter", "expr": "random() < 0.2"}
```

jheer commented

Careful! A filter with a randomized predicate can cause your sample to change if re-run (e.g., due to upstream operator changes). A safer approach might be to calculate the desired sample window size in a signal (e.g., 0.2 * length(data('source'))) and use that to parameterize a sample transform.
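A minimal sketch of that suggestion as a Vega v5 spec. The dataset names `source` and `sampled`, the signal name `sampleSize`, and the ten inline values are hypothetical placeholders, and the 0.2 rate is the one from this issue. JSON does not allow comments, so the explanation lives here: the `sampleSize` signal recomputes 20% of the current length of `source`, and the `sample` transform reads its `size` parameter from that signal. The product is wrapped in `round()` so the size is a whole number; a bare `ceil()` would be risky because floating-point products can land just above an integer (in JavaScript, `0.7 * 10` is `7.000000000000001`).

```json
{
  "$schema": "https://vega.github.io/schema/vega/v5.json",
  "description": "Sketch: sample roughly 20% of a dataset by computing the sample size in a signal.",
  "signals": [
    {
      "name": "sampleSize",
      "update": "round(0.2 * length(data('source')))"
    }
  ],
  "data": [
    {
      "name": "source",
      "values": [
        {"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}, {"id": 5},
        {"id": 6}, {"id": 7}, {"id": 8}, {"id": 9}, {"id": 10}
      ]
    },
    {
      "name": "sampled",
      "source": "source",
      "transform": [
        {"type": "sample", "size": {"signal": "sampleSize"}}
      ]
    }
  ]
}
```

The design choice mirrors the comment above: Vega's documentation describes the `sample` transform as maintaining a reservoir sample as tuples are added and removed, rather than drawing a fresh random number per tuple each time the pipeline re-runs, so the sample is less likely to reshuffle than one produced by a re-evaluated `random()` filter predicate. The result is also always exactly `sampleSize` tuples (when the source has at least that many), whereas the random filter only yields about 20% on average.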