IBM/simulai

Batchwise optimization takes longer times than expected to run

Joao-L-S-Almeida opened this issue · 3 comments

  • Could the nested loop in simulai/optimization/_optimization.py:473 be responsible?
    • Could it be effective to replace it with a single loop?
    • Could it be effective to vectorize it via torch.vmap?
    • Could it be effective to compile the training step using torch.compile? (See the sketch after this list.)
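A minimal sketch of the last two ideas, assuming a plain `nn.Module` with an MSE loss (the model, data, and loss here are hypothetical; the actual loop lives in `simulai/optimization/_optimization.py` and may look different):

```python
import torch
import torch.nn as nn

# Hypothetical model and data, used only for illustration.
model = nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x_batch, y_batch = torch.randn(32, 8), torch.randn(32, 1)

# Idea 1: instead of a Python loop over samples, vectorize a per-sample
# loss over the leading (batch) dimension with torch.vmap.
def sample_loss(x, y):
    return torch.mean((model(x) - y) ** 2)

per_sample_losses = torch.vmap(sample_loss)(x_batch, y_batch)  # shape: (32,)

# Idea 2: compile the training step so the Python-level overhead of the
# loop body is reduced (requires PyTorch >= 2.0).
def train_step(x, y):
    optimizer.zero_grad()
    loss = torch.mean((model(x) - y) ** 2)
    loss.backward()
    optimizer.step()
    return loss

compiled_step = torch.compile(train_step)
print(compiled_step(x_batch, y_batch))
```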

torch.compile slightly reduces the execution time, but it still does not solve the problem.

The data transfer between CPU and GPU can be another bottleneck. During a batch-wise optimization loop the dataset is usually kept in main memory and transferred to the GPU in smaller chunks. That makes sense when a single GPU card has far less memory than the dataset requires; however, there are cases in which the dataset can be allocated entirely on the GPU, which saves transfer time. We just need to check this condition at the beginning of the fit method, as sketched below.
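A minimal sketch of that check, assuming the inputs and targets are plain `torch.Tensor`s; the helper name and the `safety_factor` margin are hypothetical and not part of simulai's API:

```python
import torch

def maybe_move_dataset_to_gpu(inputs, targets, device, safety_factor=0.8):
    # Hypothetical helper: move the whole dataset to the GPU once when it
    # fits, otherwise keep it on the CPU and keep transferring mini-batches
    # inside the training loop as before.
    if device.type == "cuda":
        free_bytes, _ = torch.cuda.mem_get_info(device)
        dataset_bytes = (inputs.element_size() * inputs.nelement()
                         + targets.element_size() * targets.nelement())
        # safety_factor leaves room for activations and optimizer state.
        if dataset_bytes < safety_factor * free_bytes:
            return inputs.to(device), targets.to(device)
    return inputs, targets

# Example: called once at the start of fit()
# inputs, targets = maybe_move_dataset_to_gpu(inputs, targets, torch.device("cuda:0"))
```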

All these tests were done, but the gain was marginal. I'm closing the issue for now.