Feature Request: DO downloads should include sample option
DO Dataset downloads can be pretty big and oftentimes people don't want the whole thing. Here's an example:
To enable this, it'd be great to add filters and sampling options, like this:
# download 10 percent, randomly sampled
ds_sample = dataset.to_dataframe(sample_frac=0.1)
dataset.to_csv(sample_frac=0.1)
# download 1k records
ds_sample = dataset.to_dataframe(n_rows=1000)
dataset.to_csv(n_rows=1000)
Going further, it'd be really wonderful to apply filters just like we do for enrichment so that we could get only the data we need (e.g., by category, numeric range, geographic area, etc.)
Pandas has a method for sampling: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html
and filtering: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html
This could be extended to selecting only some columns in addition to applying filters to select only some rows.
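For reference, here is a minimal sketch of the pandas equivalents on a DataFrame that has already been downloaded. The DataFrame and its column names (`category`, `population`, `geoid`) are made up for illustration and do not come from any real DO dataset. Note that `DataFrame.filter` selects by row or column label, so the row filtering here uses a boolean mask instead.

```python
import pandas as pd

# Toy stand-in for a downloaded dataset; columns and values are hypothetical.
df = pd.DataFrame({
    "category": ["retail", "food", "retail", "health", "food", "retail"],
    "population": [120, 4500, 980, 15000, 300, 7200],
    "geoid": ["a1", "a2", "a3", "a4", "a5", "a6"],
})

# Random sample of 50 percent of the rows; random_state makes it repeatable.
sample_frac = df.sample(frac=0.5, random_state=0)

# Random sample of a fixed number of rows.
sample_n = df.sample(n=2, random_state=0)

# Row filter: one category and a numeric range (between() is inclusive).
filtered = df[(df["category"] == "retail") & df["population"].between(500, 10000)]

# Column selection by label, which is what DataFrame.filter actually does.
subset = filtered.filter(items=["geoid", "population"])
```

This only helps after the full download has happened, which is why doing the sampling and filtering at download time would be the real win.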
cc @cmongut
This can already be done by using the new sql_query param to filter the dataset.