cleinc/bts

ValueError: batch_size should be a positive integer value, but got batch_size=0

rnlee1998 opened this issue · 3 comments

When I run python bts.main.py arguments_train_nyu.txt, I get ValueError: batch_size should be a positive integer value, but got batch_size=0. What should I do?

  File "bts_main_FAM.py", line 613, in <module>
    main()
  File "bts_main_FAM.py", line 607, in main
    mp.spawn(main_worker, nprocs=ngpus_per_node, args=(ngpus_per_node, args))
  File "/data2/liran/anaconda3/envs/torch1.4/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 171, in spawn
    while not spawn_context.join():
  File "/data2/liran/anaconda3/envs/torch1.4/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 118, in join
    raise Exception(msg)
Exception: 

-- Process 7 terminated with the following error:
Traceback (most recent call last):
  File "/data2/liran/anaconda3/envs/torch1.4/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/data2/liran/workspace/bts/pytorch/bts_main_FAM.py", line 405, in main_worker
    dataloader = BtsDataLoader(args, 'train')
  File "/data2/liran/workspace/bts/pytorch/bts_dataloader.py", line 56, in __init__
    sampler=self.train_sampler)
  File "/data2/liran/anaconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 219, in __init__
    batch_sampler = BatchSampler(sampler, batch_size, drop_last)
  File "/data2/liran/anaconda3/envs/torch1.4/lib/python3.7/site-packages/torch/utils/data/sampler.py", line 190, in __init__
    "but got batch_size={}".format(batch_size))
ValueError: batch_size should be a positive integer value, but got batch_size=0

In bts_main.py you'll notice a line:
args.batch_size = int(args.batch_size / ngpus_per_node)

My guess is that your batch_size is smaller than ngpus_per_node. Since int() truncates the quotient toward zero, the per-GPU batch_size ends up as 0.
For example:
batch_size = 3
ngpus_per_node = 4
int(3/4) = 0

Try increasing your batch_size to at least ngpus_per_node, preferably to a multiple of it so that it divides evenly.
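
If it helps, here is a minimal sketch of how that division could be guarded so the problem shows up before the DataLoader is built. The helper name per_gpu_batch_size is hypothetical and is not part of the bts code; it only mirrors the int(args.batch_size / ngpus_per_node) line quoted above and assumes both values are plain integers.

```python
def per_gpu_batch_size(batch_size: int, ngpus_per_node: int) -> int:
    """Hypothetical helper (not in the bts repo): split a global batch size across GPUs.

    Mirrors int(batch_size / ngpus_per_node), but fails early with a clearer
    message instead of silently producing 0.
    """
    if ngpus_per_node < 1:
        raise ValueError(f"ngpus_per_node must be >= 1, got {ngpus_per_node}")

    # Integer division gives the same result as int(a / b) for positive inputs.
    per_gpu = batch_size // ngpus_per_node

    if per_gpu < 1:
        raise ValueError(
            f"batch_size={batch_size} is smaller than ngpus_per_node={ngpus_per_node}; "
            "the per-GPU batch size would be 0"
        )
    return per_gpu


if __name__ == "__main__":
    print(per_gpu_batch_size(8, 4))   # 8 samples over 4 GPUs
    try:
        per_gpu_batch_size(3, 4)      # the case from the example above
    except ValueError as err:
        print("caught:", err)
```

Printing args.batch_size and ngpus_per_node just before that line in your own script would confirm whether this is really what is happening in your run.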

Thank you for your advice, I solved it!

I am using a single GPU but I get the same error.

ValueError: batch_size should be a positive integer value, but got batch_size=0

Can anyone help?