ultralytics/xview-yolov3

using the model with four GPUs

abidmalikwaterloo opened this issue · 8 comments

I am trying to run the model with 4 GPUs and get the following error:

222 layers, 6.26582e+07 parameters, 6.26582e+07 gradients
     Epoch     Batch         x         y         w         h      conf       cls     total         P         R       nGT        TP        FP        FN      time
/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/functional.py:52: UserWarning: size_average and reduce args will be deprecated, please use reduction='sum' instead.
  warnings.warn(warning.format(ret))
Traceback (most recent call last):
  File "train.py", line 209, in <module>
    main(opt)
  File "train.py", line 131, in main
    weight=class_weights, epoch=epoch)
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 123, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/parallel/data_parallel.py", line 133, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 77, in parallel_apply
    raise output
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/parallel/parallel_apply.py", line 53, in _worker
    output = module(*input, **kwargs)
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/gpfshome01/u/amalik/OGA/Yolov3/xview-yolov3/models.py", line 231, in forward
    x, *losses = module[0](x, targets, requestPrecision, weight, epoch)
  File "/sdcc/u/amalik/.local/lib/python3.5/site-packages/torch/nn/modules/module.py", line 477, in __call__
    result = self.forward(*input, **kwargs)
  File "/gpfshome01/u/amalik/OGA/Yolov3/xview-yolov3/models.py", line 155, in forward
    requestPrecision)
  File "/gpfshome01/u/amalik/OGA/Yolov3/xview-yolov3/utils/utils.py", line 195, in build_targets
    inter_area = torch.min(box1, box2).prod(2)
RuntimeError: Expected object of type torch.cuda.FloatTensor but found type torch.FloatTensor for argument #2 'other'

However, the model works when I use one GPU. Any comments?

@abidmalikwaterloo the code does not support multi-GPU yet unfortunately. I only have a single-GPU machine so I have not been able to debug this issue. If you come up with a solution please advise me, or submit a pull request.

Also, see ultralytics/yolov3#21 for details. https://github.com/ultralytics/yolov3 is the base repo that this repo was built off of, so when the issue gets fixed there I can port the solution here.
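
For reference, the RuntimeError in the traceback says that the second argument of `torch.min` (`box2`) is a CPU tensor while the first (`box1`) is on a GPU. One generic way to avoid this kind of mismatch is to move the second operand to the first operand's device just before the operation. The sketch below illustrates that pattern only. The helper name `intersection_area` and the tensor shapes are hypothetical and are not taken from `utils/utils.py`.

```python
import torch


def intersection_area(box1, box2):
    """Overlap area of (width, height) pairs stored in the last dimension (dim 2)."""
    # Assumption: box2 may live on another device (e.g. CPU) or have another dtype.
    # Align it with box1 so torch.min receives two tensors of the same type.
    box2 = box2.to(device=box1.device, dtype=box1.dtype)
    return torch.min(box1, box2).prod(2)


# Hypothetical data: one anchor (w=2, h=3) compared with one target (w=1, h=5).
anchors = torch.tensor([[[2.0, 3.0]]])                       # shape (1, 1, 2)
targets = torch.tensor([[[1.0, 5.0]]], dtype=torch.float64)  # different dtype on purpose
area = intersection_area(anchors, targets)
print(area)  # min(2, 1) * min(3, 5) = 3
```

The same call works unchanged when `box1` is on a GPU, because the device is read from the tensor rather than hard-coded.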

@glenn-jocher Thanks. Working on it.

@glenn-jocher did you try distributing the training with an MPI library? I am thinking of using Horovod for this. Do you think the current problem is caused by torch's internal distributed model, and that it would go away if we used an MPI framework instead?

I was able to parallelize the model using Horovod, and it ran on 3 nodes. However, the data is not being distributed. It should be divided among the three nodes, which would reduce the number of iterations per epoch, but this is not happening.

In one of the Horovod examples, which trains ResNet-50 on ImageNet, the data is distributed as follows:

kwargs = {'num_workers': 4, 'pin_memory': True} if args.cuda else {}
train_dataset = \
    datasets.ImageFolder(args.train_dir,
                         transform=transforms.Compose([
                             transforms.RandomResizedCrop(224),
                             transforms.RandomHorizontalFlip(),
                             transforms.ToTensor(),
                             transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                                  std=[0.229, 0.224, 0.225])
                         ]))
# Horovod: use DistributedSampler to partition data among workers. Manually specify
# `num_replicas=hvd.size()` and `rank=hvd.rank()`.
train_sampler = torch.utils.data.distributed.DistributedSampler(
    train_dataset, num_replicas=hvd.size(), rank=hvd.rank())
train_loader = torch.utils.data.DataLoader(
    train_dataset, batch_size=args.batch_size, sampler=train_sampler, **kwargs)

I am doing the following:

# Get dataloader
dataloader = ListDataset(train_path, batch_size=opt.batch_size, img_size=opt.img_size, targets_path=targets_path)

# For Horovod
kwargs = {'num_workers': 1, 'pin_memory': True} if cuda else {}
train_sampler = torch.utils.data.distributed.DistributedSampler(dataloader, num_replicas=hvd.size(), rank=hvd.rank())
train_loader = torch.utils.data.DataLoader(dataloader, batch_size=opt.batch_size, sampler=train_sampler, **kwargs)

Do you think this makes sense?
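
As a rough check of this approach, the sketch below shows what `DistributedSampler` does when `num_replicas` and `rank` are passed explicitly: each rank sees roughly `1/num_replicas` of the samples, so the number of batches per epoch shrinks accordingly. It uses a toy `TensorDataset` and hard-coded `num_replicas=3`, `rank=0` as stand-ins for the real dataset, `hvd.size()` and `hvd.rank()`. The sampler expects a map-style dataset (one with `__len__` and a `__getitem__` that returns single samples). Whether `ListDataset`, which already takes a `batch_size`, behaves that way is not clear from the snippet and is worth checking.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Toy stand-in for the real dataset: 10 samples with one feature each.
dataset = TensorDataset(torch.arange(10, dtype=torch.float32).unsqueeze(1))

# Stand-ins for hvd.size() and hvd.rank().
num_replicas, rank = 3, 0

sampler = DistributedSampler(dataset, num_replicas=num_replicas, rank=rank, shuffle=False)
sharded_loader = DataLoader(dataset, batch_size=2, sampler=sampler)
full_loader = DataLoader(dataset, batch_size=2)

print(len(full_loader))     # batches per epoch without sharding
print(len(sharded_loader))  # batches per epoch on this rank

for epoch in range(2):
    # Needed when shuffle=True so that each epoch uses a different ordering.
    sampler.set_epoch(epoch)
    for (batch,) in sharded_loader:
        pass  # the training step would go here
```

If the per-rank batch count does not drop in the real script, one thing to check is whether the training loop iterates over `train_loader` or still over the original `dataloader` object.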

I have not tried using those packages. There should be a way to natively use multi-GPU within pytorch. I have not had access to multi-GPU machines to debug however.
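
A minimal sketch of PyTorch's built-in single-process option, `torch.nn.DataParallel`, which the traceback shows `train.py` already goes through. The `TinyHead` module and its `anchors` buffer are hypothetical. The point illustrated is that tensors registered with `register_buffer` are replicated to each GPU together with the parameters, while tensors created on the CPU inside `forward` are not moved automatically. Without multiple GPUs the model is left unwrapped, so the snippet also runs on a CPU-only machine.

```python
import torch
import torch.nn as nn


class TinyHead(nn.Module):
    """Hypothetical stand-in for a detection layer that needs constant anchors."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 2, kernel_size=1)
        # Buffers follow the module across devices (model.to(...), DataParallel replicas).
        self.register_buffer("anchors", torch.tensor([1.0, 2.0]))

    def forward(self, x):
        out = self.conv(x)
        # self.anchors is on the same device as out in every replica.
        return out * self.anchors.view(1, 2, 1, 1)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = TinyHead().to(device)
if torch.cuda.device_count() > 1:
    # Splits each input batch along dim 0 across the visible GPUs.
    model = nn.DataParallel(model)

x = torch.randn(4, 3, 8, 8, device=device)
y = model(x)
print(y.shape)
```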

@glenn-jocher Any progress? I have some ideas and would like to work on this, but first I would like to know where the effort stands. I do not want to spend time on things you have already done.

Sorry, still got the same 1 GPU machine here, I simply can't debug multi-GPU currently. If you come up with a solution, let me know! Thanks.

@abidmalikwaterloo this issue is resolved in our main YOLOv3 repository:
https://github.com/ultralytics/yolov3

Be advised that the https://github.com/ultralytics/xview-yolov3 repository is not under active development anymore. We recommend you use https://github.com/ultralytics/yolov3 instead.