Innixma/autogluon-benchmarking

Question about the kaggle run setting

franchuterivera opened this issue · 2 comments

Hello!

Thanks a lot for providing this repository for the reproducibility of results. I have a question about the setup of the predictors.

In the paper, one can see that the machine employed for the runs has a pretty big memory in line with some of the big Kaggle datasets.

To ensure no AutoML framework is resource-limited, we ran the Kaggle benchmark for longer than the AutoML
datasets (4h and 8h time limits), and used more powerful AWS m5.24xlarge EC2 instances (384 GiB memory, 96 vCPU
cores).

If I look into the predictors, for instance the auto-sklearn one, it looks like it uses 4 cores and divides all of the available virtual memory among them, which makes me think each core got about 384/4 = 96 GiB for the 4h/8h runs.

Is this a correct assumption? Thanks a lot for the clarification!

I believe that is correct. This was the configuration recommended by the authors and was what AutoMLBenchmark used at the time. Since Auto-Sklearn was unstable with num_jobs = -1, we decided not to use that setting for the runs. I'm unsure whether this meant that only 1 core was used per thread or that 4 threads each used 1/4th of the cores.
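Concretely, the assumed split works out like this (a minimal sketch of the arithmetic, not the actual benchmark launch code; `memory_limit` in auto-sklearn is a per-job limit in MB, and the instance/core numbers are taken from the paper quote above):

```python
# Illustrative arithmetic for the assumed resource split: an m5.24xlarge
# has 384 GiB of RAM, and the auto-sklearn runs used 4 cores/jobs,
# so each job would get roughly 96 GiB.
TOTAL_MEM_GIB = 384
N_JOBS = 4

per_job_gib = TOTAL_MEM_GIB / N_JOBS   # 96.0 GiB per job
per_job_mb = int(per_job_gib * 1024)   # auto-sklearn's memory_limit is given in MB

print(per_job_gib, per_job_mb)  # 96.0 98304

# A hedged sketch of how such a run might be configured via auto-sklearn's
# public API (the class and parameters exist; the exact values here are
# assumptions based on the discussion above):
#
# from autosklearn.classification import AutoSklearnClassifier
# automl = AutoSklearnClassifier(
#     time_left_for_this_task=4 * 3600,  # 4h limit (8h in the longer setting)
#     n_jobs=N_JOBS,
#     memory_limit=per_job_mb,           # per-job limit in MB
# )
```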

That makes a lot of sense. Thanks for the quick reply!