Support sampling rate instead of target sampling size
domoritz opened this issue · 2 comments
domoritz commented
E.g. keep 20% of the tuples.
{"type": "sample", "rate": 0.2}
domoritz commented
I guess that's just
{"type": "filter", "expr": "random() < 0.2"}
jheer commented
Careful! A filter with a randomized predicate can cause your sample to change if re-run (e.g., due to upstream operator changes). A safer approach might be to calculate the desired sample window size in a signal (e.g., 0.2 * length(data('source'))) and use that to parameterize a sample transform.
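A rough sketch of what that could look like in a spec, not a tested recipe. It assumes the sample transform accepts a signal reference for its size parameter and that data() and length() are available in signal expressions. The names source, sampled, and sampleSize, as well as the data URL, are hypothetical placeholders:

```json
{
  "description": "Sketch only: 'source', 'sampled', 'sampleSize' and the data URL are hypothetical names.",
  "signals": [
    {
      "name": "sampleSize",
      "update": "ceil(0.2 * length(data('source')))"
    }
  ],
  "data": [
    {"name": "source", "url": "data/tuples.json"},
    {
      "name": "sampled",
      "source": "source",
      "transform": [
        {"type": "sample", "size": {"signal": "sampleSize"}}
      ]
    }
  ]
}
```

With this setup the sample always holds 20% of the source tuples (rounded up), whereas the random() filter gives a sample whose size varies from run to run and only averages 20%.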