openml/OpenML

(Presumably) Bug in data split for tasks with row_identifier column

sebffischer opened this issue · 1 comment

I am not sure whether this is actually the problem but I suspect it is.

The API docs say that the data split is created according to the "row_id_attribute" if one is present.
So I looked for a dataset that has such a column and found 210 (cloud).

When I try to access the data splits of the tasks that belong to this dataset (I tested around 5), I always get errors:

Error: Error downloading 'https://www.openml.org/api_splits/get/145795/Task_145795_splits.arff' (http code: 412, oml code: NA, message: 'failed to perform action generate_folds. Evaluation Engine result send to EMAIL_API_LOG account.'

or alternatively:

Error: Error downloading 'https://www.openml.org/api_splits/get/190593/Task_190593_splits.arff' (http code: 412, oml code: NA, message: 'Task not providing datasplits.'

I presume that the problem is this row_identifier column, but I am not sure.
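For anyone who wants to reproduce this without a client library, here is a minimal Python sketch. It assumes the URL pattern seen in the two error messages above holds for every task; the helper names `splits_url` and `check_splits` are made up for illustration and are not part of any OpenML package.

```python
import urllib.error
import urllib.request

# Assumption: URL pattern copied from the error messages in this issue.
BASE = "https://www.openml.org/api_splits/get"


def splits_url(task_id: int) -> str:
    """Build the splits URL for a task (hypothetical helper)."""
    return f"{BASE}/{task_id}/Task_{task_id}_splits.arff"


def check_splits(task_id: int, timeout: float = 30.0) -> tuple[int, str]:
    """Request the splits file; return (HTTP status, first bytes of the body).

    HTTP errors such as the 412 reported above are returned rather than
    raised, so the server's message can be inspected. Other network
    failures (urllib.error.URLError) still propagate to the caller.
    """
    url = splits_url(task_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read(200).decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read(200).decode("utf-8", errors="replace")
```

Calling `check_splits(145795)` or `check_splits(190593)` should show whether the server still answers with 412 for these tasks. Comparing against a task on a dataset without a row identifier column would help tell whether the column is actually involved.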

No, this is a general error with the splits generation... working on it