openml/OpenML

Cannot create new benchmarking suite

mfeurer opened this issue · 0 comments

Hi, I tried to create a benchmarking suite that contains the "training datasets" from the Auto-sklearn 2.0 paper: 208 datasets that can be used for meta-learning and that have no overlap with the often-used AutoML benchmark suite. However, when doing so I receive an error saying the alias already exists, although I cannot find any existing suite with that alias, which makes me believe this is a bug. The error is:

  File "/home/feurerm/sync_dir/projects/openml/python/local/bug3.py", line 38, in <module>
    suite.publish()
  File "/home/feurerm/sync_dir/projects/openml/python/openml/base.py", line 131, in publish
    call, "post", file_elements=file_elements
  File "/home/feurerm/sync_dir/projects/openml/python/openml/_api_calls.py", line 65, in _perform_api_call
    response = _read_url_files(url, data=data, file_elements=file_elements)
  File "/home/feurerm/sync_dir/projects/openml/python/openml/_api_calls.py", line 197, in _read_url_files
    response = _send_request(request_method="post", url=url, data=data, files=file_elements,)
  File "/home/feurerm/sync_dir/projects/openml/python/openml/_api_calls.py", line 237, in _send_request
    __check_response(response=response, url=url, file_elements=files)
  File "/home/feurerm/sync_dir/projects/openml/python/openml/_api_calls.py", line 284, in __check_response
    raise __parse_server_exception(response, url, file_elements=file_elements)
openml.exceptions.OpenMLServerException: https://www.openml.org/api/v1/study/ returned code 1038: Study alias not unique - None

Process finished with exit code 1

which is triggered by:

import openml

automl_metadata = [
    232, 236, 241, 245, 253, 254, 256, 258, 260, 262, 267, 271, 273, 275, 279, 288, 336, 340, 2119, 2120, 2121, 2122,
    2123, 2125, 2356, 3044, 3047, 3048, 3049, 3053, 3054, 3055, 75089, 75092, 75093, 75098, 75100, 75108, 75109, 75112,
    75114, 75115, 75116, 75118, 75120, 75121, 75125, 75126, 75129, 75131, 75133, 75134, 75136, 75139, 75141, 75142,
    75143, 75146, 75147, 75148, 75149, 75153, 75154, 75156, 75157, 75159, 75161, 75163, 75166, 75169, 75171, 75173,
    75174, 75176, 75178, 75179, 75180, 75184, 75185, 75187, 75192, 75195, 75196, 75199, 75210, 75212, 75213, 75215,
    75217, 75219, 75221, 75223, 75225, 75232, 75233, 75234, 75235, 75236, 75237, 75239, 75250, 126021, 126024, 126028,
    126030, 126031, 146574, 146575, 146576, 146577, 146578, 146583, 146586, 146592, 146593, 146594, 146596, 146597,
    146600, 146601, 146602, 146603, 146679, 166859, 166866, 166872, 166875, 166882, 166897, 166905, 166906, 166913,
    166915, 166931, 166932, 166944, 166950, 166951, 166953, 166956, 166957, 166958, 166959, 166970, 166996, 167085,
    167086, 167087, 167088, 167089, 167090, 167094, 167096, 167097, 167099, 167100, 167101, 167103, 167105, 167106,
    167202, 167203, 167204, 167205, 168785, 168791, 189779, 189786, 189828, 189829, 189836, 189840, 189841, 189843,
    189844, 189845, 189846, 189858, 189859, 189863, 189864, 189869, 189870, 189875, 189878, 189880, 189881,
    189882, 189883, 189884, 189887, 189890, 189893, 189894, 189899, 189900, 189902, 190154, 190155, 190156, 190157,
    190158, 190159, 211720, 211721, 211722, 211723, 211724
]

suite = openml.study.create_benchmark_suite(
    name="AutoML Benchmark Training Datasets",
    description="""A complimentary set of tasks to the AutoML benchmark that can be used as a training set for meta-learning as suggested by Feurer et al. in the paper "Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning".
    
A full description of the inclusion criteria can be found at in the paper at https://arxiv.org/abs/2007.04074

If you use this work in a publication please cite:

@article{feurer-arxiv21a,
    title = {Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning},
    author = {Matthias Feurer and Katharina Eggensperger and Stefan Falkner and Marius Lindauer and Frank Hutter},
    journal = {arXiv:2007.04074 [cs.LG]},
    year = {2021}
}
""",
    task_ids=automl_metadata,
    alias="AutoML-Benchmark-Train",
)
suite.publish()
print(suite)

Not sure if this is related, but when trying to figure out which suites exist, I only managed to find four of them via the API: openml.org/api/v1/study/list
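
For the record, this is roughly how I tried to check for an existing suite with the same alias from the Python API (a sketch; I am assuming that list_suites accepts status="all" to also return deactivated suites and suites in preparation, and that get_suite accepts an alias as well as a numeric id):

import openml
from openml.exceptions import OpenMLServerException

# List suites in all states; the default listing seems to only return
# active suites, which might explain why only four show up.
suites = openml.study.list_suites(status="all")
print(len(suites))

# Probe the alias directly; the server should raise an error if no suite
# with this alias exists.
try:
    existing = openml.study.get_suite("AutoML-Benchmark-Train")
    print("Alias already taken by:", existing)
except OpenMLServerException as e:
    print("No suite with this alias found:", e)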