datonic/datadex

Publish Static Datasets

Closed this issue · 9 comments

We should publish datasets in multiple places.

Also, publish via RoAPI.

Also, generate a Frictionless package (with Dagster) for the final datasets' Parquet files.
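For reference, a rough sketch of what the Frictionless package step could produce. It writes a minimal `datapackage.json` by hand using only the standard library rather than the `frictionless` package, and it is not wired into Dagster. The function names, the flat directory layout, and the choice of descriptor fields are assumptions made for illustration, not what the repo actually does.

```python
import json
from pathlib import Path


def build_datapackage(data_dir: Path, name: str) -> dict:
    """Build a minimal Data Package descriptor for the Parquet files in data_dir.

    Hypothetical helper: only `name`, `path` and `format` are filled in
    for each resource; a real package would likely add a schema too.
    """
    resources = []
    for path in sorted(data_dir.glob("*.parquet")):
        resources.append(
            {
                "name": path.stem.lower(),
                "path": path.name,
                "format": "parquet",
            }
        )
    return {"name": name, "resources": resources}


def write_datapackage(data_dir: Path, name: str) -> Path:
    """Write the descriptor next to the data files and return its path."""
    target = data_dir / "datapackage.json"
    target.write_text(json.dumps(build_datapackage(data_dir, name), indent=2))
    return target
```

In a Dagster setup, something like `write_datapackage` could be called from an asset that depends on the final dataset assets, so the descriptor is regenerated whenever the Parquet files change.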

It would be nice to expose a static data API (url.com/dataset/partition/data.json) and perhaps some custom graphs at url.com/dataset/partition/.
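A minimal sketch of the static API idea, assuming each partition's rows are already available as plain Python lists of dicts. It just writes one `data.json` per `dataset/partition/` directory so any static host can serve them. `export_static_api` and its arguments are hypothetical names, not something that exists in the repo.

```python
import json
from pathlib import Path


def export_static_api(rows_by_partition: dict, dataset: str, out_dir: Path) -> list:
    """Write <out_dir>/<dataset>/<partition>/data.json for every partition.

    rows_by_partition maps a partition key (for example a year) to a
    list of JSON-serializable row dicts. Returns the files written.
    """
    written = []
    for partition, rows in rows_by_partition.items():
        target = out_dir / dataset / str(partition) / "data.json"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(rows))
        written.append(target)
    return written
```

Publishing `out_dir` to any static host would then give URLs of the form `url.com/dataset/partition/data.json`, and a graph page could live as an `index.html` in the same partition directory.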

Wow, I didn't know RoAPI, awesome!

+1 for parquet files.

I would wait for DuckDB to reach at least 1.0 before using it as a file format.

I think the DuckDB database could be pushed to Hugging Face too!

https://huggingface.co/docs/huggingface_hub/en/guides/upload#upload-a-file
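Following that guide, the upload could look roughly like the sketch below. `HfApi.upload_file` is the call the linked page describes. The `data/` prefix, the helper names, and the idea of one call per file are assumptions for illustration. It also assumes you are already authenticated (for example with a token in the environment) and that the target dataset repo exists.

```python
from pathlib import Path


def path_in_repo(local_path: str, prefix: str = "data") -> str:
    """Map a local file to its location inside the Hub repo (hypothetical layout)."""
    return f"{prefix}/{Path(local_path).name}"


def upload_dataset_file(local_path: str, repo_id: str) -> None:
    """Upload one file (Parquet or a DuckDB database) to a Hugging Face dataset repo."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    api = HfApi()
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo(local_path),
        repo_id=repo_id,
        repo_type="dataset",
    )
```

The same call should work for a `.duckdb` file or a Parquet file, since the Hub treats both as opaque files.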

Maybe it is best to wait for the first stable release of DuckDB. I heard it will be soon. Meanwhile, I would upload a Parquet file.

So, datasets are now being pushed to Hugging Face and also exposed through an API in a Hugging Face Space with RoAPI!
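For anyone who wants to try the API, here is a small sketch of querying a RoAPI instance from Python using only the standard library. It relies on RoAPI's `/api/sql` endpoint, which accepts a SQL string as the POST body and returns JSON rows by default. The base URL and the table name in the usage note are placeholders, not the actual address of the Space.

```python
import json
import urllib.request


def build_sql_request(base_url: str, sql: str) -> urllib.request.Request:
    """Build a POST request that sends a SQL query to RoAPI's /api/sql endpoint."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/sql",
        data=sql.encode("utf-8"),
        method="POST",
    )


def run_sql(base_url: str, sql: str) -> list:
    """Send the query and decode the JSON response body."""
    with urllib.request.urlopen(build_sql_request(base_url, sql)) as response:
        return json.loads(response.read())
```

Usage would be something like `run_sql("http://localhost:8080", "SELECT * FROM my_table LIMIT 5")`, where both the URL and `my_table` are placeholders to swap for the real endpoint and a table registered in RoAPI.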