addtt/boiler-pytorch

Doesn't support multiple GPUs


I tried using torch.nn.DataParallel to train on multiple GPUs but got the following error:

AttributeError: 'DataParallel' object has no attribute 'global_step'
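For reference, the same kind of error can be reproduced outside this package. The snippet below is a minimal sketch, not the package's code: `ToyModel` is a hypothetical stand-in that stores `global_step` as a plain attribute on the model, which is an assumption about how the real model keeps it. The wrapper created by `nn.DataParallel` is itself a module and does not forward custom attributes, so the lookup fails on the wrapper while the value stays reachable through `.module`.

```python
from torch import nn


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model that tracks its own step counter."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.global_step = 0  # plain Python attribute stored on the model

    def forward(self, x):
        return self.linear(x)


model = ToyModel()
wrapped = nn.DataParallel(model)  # also constructs on a CPU-only machine

message = None
try:
    wrapped.global_step  # custom attributes are not forwarded by the wrapper
except AttributeError as err:
    message = str(err)
    print(message)

# The original model, and its attribute, is still available via `.module`.
inner_step = wrapped.module.global_step
print(inner_step)
```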

addtt commented

Hi! Sorry, unfortunately I stopped using this quite a while ago, and I never tried multi-GPU training with it. I am not going to add functionality to this package.

I think the problem in your case is that, back then, I stored global_step in the model itself (see here). So, as explained here, your model has to subclass BaseModel or one of the already available subclasses, which is not possible if your model is an nn.DataParallel module. You can probably find a workaround to use DataParallel anyway; otherwise I suggest either

  1. forking this repo and modifying the package directly, or

  2. using lightning or ignite for the training machinery.

In case (2), you can of course still take modules from boilr if you find them useful, but the training and experiment management parts were put together quickly for my personal projects and are not really meant for general use.
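As for a possible DataParallel workaround, one common pattern is a thin subclass that falls back to the wrapped module when an attribute is missing on the wrapper. This is an untested sketch with respect to this package: `AttrForwardingDataParallel` and `ToyModel` are hypothetical names, and storing `global_step` as a plain attribute is an assumption. It also does not make the wrapper an instance of BaseModel, so any `isinstance` check in the training code would still fail.

```python
from torch import nn


class AttrForwardingDataParallel(nn.DataParallel):
    """Hypothetical wrapper: falls back to the wrapped module for missing attributes."""

    def __getattr__(self, name):
        try:
            # Normal lookup first: parameters, buffers, and submodules (incl. `module`).
            return super().__getattr__(name)
        except AttributeError:
            if name == "module":
                # Avoid infinite recursion if the wrapped module is not set yet.
                raise
            return getattr(self.module, name)


class ToyModel(nn.Module):
    """Hypothetical stand-in for a model that tracks its own step counter."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        self.global_step = 0

    def forward(self, x):
        return self.linear(x)


model = ToyModel()
wrapped = AttrForwardingDataParallel(model)

step_seen = wrapped.global_step   # read is forwarded to the wrapped model
model.global_step += 1            # update the counter on the model itself
step_after = wrapped.global_step  # the wrapper sees the new value
print(step_seen, step_after)
```

Only reads are forwarded here. Assigning `wrapped.global_step = ...` would set a separate attribute on the wrapper and shadow the model's value, so updates should go through `wrapped.module`.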