Batchwise optimization takes longer than expected to run
Joao-L-S-Almeida opened this issue · 3 comments
Joao-L-S-Almeida commented
- Could the nested loop in simulai/optimization/_optimization.py:473 be responsible?
- Could it be effective to replace it with a single loop?
- Could it be effective to vectorize it via torch.vmap?
- Could it be effective to compile the training model using torch.compile?
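On the vectorization idea: a nested Python loop that applies a per-sample function to every pair in a batch can usually be replaced by a single torch.vmap call (PyTorch >= 2.0). This is only an illustrative sketch with a made-up per-sample function, not the actual loop at _optimization.py:473:

```python
import torch

# Hypothetical per-sample residual; stands in for whatever the
# nested loop computes for each (input, target) pair.
def per_sample_sq_error(x, y):
    return ((x - y) ** 2).sum()

xs = torch.randn(64, 10)
ys = torch.randn(64, 10)

# Explicit-loop version (one Python-level call per sample):
loop_out = torch.stack([per_sample_sq_error(x, y) for x, y in zip(xs, ys)])

# Vectorized version: vmap maps over the leading batch dimension
# in a single traced call, avoiding Python-loop overhead.
vmap_out = torch.vmap(per_sample_sq_error)(xs, ys)

assert torch.allclose(loop_out, vmap_out)
```

Whether this helps in practice depends on how much of the loop body is already vectorized tensor code; vmap mainly removes per-iteration Python and dispatch overhead.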
Joao-L-S-Almeida commented
torch.compile slightly reduces the execution time, but it still does not solve the problem.
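For reference, the usual pattern is to compile the whole training step (forward, loss, backward, and optimizer step) rather than just the model's forward pass, so the graph captures more work per call. A minimal sketch with a generic placeholder model, not simulai's actual training loop:

```python
import torch

# Placeholder model/optimizer/loss; illustrative only.
model = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1)
)
opt = torch.optim.Adam(model.parameters())
loss_fn = torch.nn.MSELoss()

@torch.compile  # PyTorch >= 2.0; compilation is lazy (first call)
def train_step(x, y):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss
```

If the per-step tensor work is small relative to data-transfer or Python overhead elsewhere in the loop, compilation alone will only give marginal gains, which matches what was observed here.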
Joao-L-S-Almeida commented
Data transfer between CPU and GPU can be another bottleneck. In a batch-wise optimization loop the dataset is usually kept in main memory and transferred to the GPU in smaller chunks. That makes sense when the GPU's memory is small compared to the dataset size, but there are cases in which the dataset fits entirely in GPU memory, and allocating it there up front can save execution time. We just need to check for this condition at the beginning of the fit method.
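A minimal sketch of such a check, assuming the dataset is a single tensor; the function name, the `safety` fraction, and the fallback behavior are all illustrative, not part of simulai's API:

```python
import torch

def maybe_preload_to_gpu(dataset: torch.Tensor, device: str = "cuda",
                         safety: float = 0.5) -> torch.Tensor:
    """Move the whole dataset to the GPU if it fits comfortably.

    `safety` caps the fraction of currently free GPU memory the
    dataset may occupy; the threshold is a hypothetical heuristic.
    """
    if not torch.cuda.is_available():
        return dataset  # no GPU: keep the data in main memory
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    needed_bytes = dataset.element_size() * dataset.nelement()
    if needed_bytes < safety * free_bytes:
        # One bulk transfer up front instead of per-batch copies.
        return dataset.to(device)
    return dataset  # too large: fall back to chunked transfers
```

Called once at the start of `fit`, this would replace many small host-to-device copies with a single bulk transfer whenever the dataset fits.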
Joao-L-S-Almeida commented
All these tests were done, but the gain was marginal. I'm closing the issue for now.