Innixma/autogluon-benchmarking

Question about the kaggle run setting

franchuterivera opened this issue · 2 comments

Hello!

Thanks a lot for providing this repository for the reproducibility of results. I have a question about the setup of the predictors.

In the paper, one can see that the machine employed for the runs has a pretty big memory in line with some of the big Kaggle datasets.

To ensure no AutoML framework is resource-limited, we ran the Kaggle benchmark for longer than the AutoML
datasets (4h and 8h time limits), and used more powerful AWS m5.24xlarge EC2 instances (384 GiB memory, 96 vCPU
cores).

If I look into the predictors, for instance the auto-sklearn one, it looks like it uses 4 cores and divides all of the available virtual memory among them, which makes me think each core got about 384/4 = 96 GiB for the 4h/8h runs.

Is this a correct assumption? Thanks a lot for the clarification!

I believe that is correct. This was the configuration recommended by the authors and was what AutoMLBenchmark used at the time. Since Auto-Sklearn was unstable with num_jobs = -1, we decided not to use that setting for the runs. I'm unsure whether this meant that only 1 core was used per thread or that 4 threads each used 1/4th of the cores.
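Concretely, the assumed split works out like this (a minimal sketch of the arithmetic, not the actual benchmark launch code; `memory_limit` in auto-sklearn is a per-job limit in MB, and the instance/core numbers are taken from the paper quote above):

```python
# Illustrative arithmetic for the assumed resource split: an m5.24xlarge
# has 384 GiB of RAM, and the auto-sklearn runs used 4 cores/jobs,
# so each job would get roughly 96 GiB.
TOTAL_MEM_GIB = 384
N_JOBS = 4

per_job_gib = TOTAL_MEM_GIB / N_JOBS   # 96.0 GiB per job
per_job_mb = int(per_job_gib * 1024)   # auto-sklearn's memory_limit is given in MB

print(per_job_gib, per_job_mb)  # 96.0 98304

# A hedged sketch of how such a run might be configured via auto-sklearn's
# public API (the class and parameters exist; the exact values here are
# assumptions based on the discussion above):
#
# from autosklearn.classification import AutoSklearnClassifier
# automl = AutoSklearnClassifier(
#     time_left_for_this_task=4 * 3600,  # 4h limit (8h in the longer setting)
#     n_jobs=N_JOBS,
#     memory_limit=per_job_mb,           # per-job limit in MB
# )
```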

That makes a lot of sense. Thanks for the quick reply!