openml/OpenML

OpenML API parquet migration - Phase 2

prabhant opened this issue · 1 comment

We already have dataset download support for parquet via MinIO. The next phase is uploading these datasets.

We need to allow parquet uploads directly to MinIO. This requires changes to three components:

  • OpenML client APIs (Python/R/Java): To convert dataset directly from dataframe to parquet and send an upload request.
  • OpenML API: Assign the uploaded dataset ID and then transfer it to the MinIO. (We already have scripts for the transfer.)
  • OpenML evaluation engine: To process the parquet datasets.

@PGijsbers @joaquinvanschoren @janvanrijn

I created openml/openml-python#1141.
Can you elaborate on the new sequence of communication for uploading the dataset from a client API?
Are the new endpoints already available?

"Assign the uploaded dataset ID and then transfer it to the MinIO."

seems to say that the server will put the dataset in the MinIO bucket, while

"To convert dataset directly from dataframe to parquet and send an upload request."

makes it sound as though the client is expected to upload directly to the MinIO server.