JuliaML/MLDatasets.jl

[Discussion] Moving to HuggingFace for some databases


Some of the (graph) datasets that we are trying to support might have any of the following problems:

  1. Hosted on university servers or a non-trusted source that cannot provide proper download speeds throughout the globe.
  2. Datasets that aren't hosted anywhere and come with a license.
  3. Datasets stored in Python-specific formats.

HuggingFace now has a good set of community-maintained graph datasets. If we run into any of the issues above for a dataset, we can upload it to HF, pull it from there, and process it as required. I believe this will significantly reduce the code needed to integrate and test new datasets. I am not sure what support is planned for https://github.com/FluxML/HuggingFaceApi.jl, but this seems like a better idea to me than relying on links that can fail without warning.
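To make the "pull from HF" step concrete, here is a rough sketch that uses only Julia's `Downloads` standard library and the Hub's public `resolve` URL pattern for dataset repos. It does not use HuggingFaceApi.jl. The helper names `hf_dataset_url` and `fetch_hf_file` are hypothetical, as are the repo id and file name in the usage comment. Treat it as an illustration of the idea, not a proposed implementation.

```julia
using Downloads

# Build the download URL for a single file in a dataset repo on the HuggingFace Hub.
# `revision` can be a branch name, a tag, or a commit hash.
function hf_dataset_url(repo_id::AbstractString, filename::AbstractString;
                        revision::AbstractString = "main")
    return "https://huggingface.co/datasets/$(repo_id)/resolve/$(revision)/$(filename)"
end

# Download one file into `dir` and return the local path.
# Hypothetical helper: no caching, checksum verification, or authentication.
function fetch_hf_file(repo_id::AbstractString, filename::AbstractString;
                       revision::AbstractString = "main",
                       dir::AbstractString = mktempdir())
    dest = joinpath(dir, basename(filename))
    Downloads.download(hf_dataset_url(repo_id, filename; revision = revision), dest)
    return dest
end

# Usage with placeholder names (not a real dataset):
# path = fetch_hf_file("some-org/some-graph-dataset", "data/edges.csv")
```

Passing a commit hash as `revision` would pin the exact file contents, which helps with the "links that can fail without warning" concern.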

cc: @CarloLucibello