chainer/chainer-chemistry

Sample script (train_qm9.py) raises ZeroDivisionError in GPU mode

jo7ueb opened this issue · 2 comments

Hi,

I got a ZeroDivisionError when running train_qm9.py on GPU #0.

Exception in main training loop: float divmod()
Traceback (most recent call last):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
    while not stop_trigger(self):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
    self.count = epoch_detail // self.period
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_qm9.py", line 303, in <module>
    main()
  File "train_qm9.py", line 287, in main
    trainer.run()
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
    while not stop_trigger(self):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
    self.count = epoch_detail // self.period
ZeroDivisionError: float divmod()
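
The failing expression can be reproduced outside the trainer. This is only a minimal sketch, assuming the trigger's period ends up as 0 (the trace further down shows the script being run with `--epoch 0`); `interval_count` and `reproduce` are hypothetical helper names, not part of Chainer.

```python
# Minimal reproduction of `epoch_detail // self.period` from the traceback.
# Assumption: the period is 0 because the script was started with --epoch 0.


def interval_count(epoch_detail, period):
    """Mirror the floor division performed in interval_trigger.py."""
    return epoch_detail // period


def reproduce():
    """Return the name of the exception raised when the period is zero."""
    try:
        interval_count(0.0, 0)
    except ZeroDivisionError as exc:
        return type(exc).__name__
    return None


print(reproduce())
```

With a non-zero period the same expression works as expected, for example `interval_count(2.5, 1)` evaluates to `2.0`.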

Here's the full output.
I noticed that gpu=-1 is used even though a GPU was selected.

+ cd ..
+ echo --- Testing QM9 ---
+ cd qm9
+ bash -x test_qm9.sh 0
+ set -e
+ methods=(nfp ggnn schnet weavenet rsgcn)
+ epoch=0
+ gpu=-1
+ for method in '${methods[@]}'
+ '[' -d input ']'
+ rm -rf input
+ python train_qm9.py --method nfp --label A --conv-layers 1 --gpu -1 --epoch 0 --unit-num 10 --num-data 100
/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
  import imp
100%|██████████| 100/100 [00:00<00:00, 1889.18it/s]
Exception in main training loop: float divmod()
Traceback (most recent call last):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
    while not stop_trigger(self):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
    self.count = epoch_detail // self.period
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
  File "train_qm9.py", line 303, in <module>
    main()
  File "train_qm9.py", line 287, in main
    trainer.run()
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
    six.reraise(*sys.exc_info())
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/six.py", line 693, in reraise
    raise value
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
    while not stop_trigger(self):
  File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
    self.count = epoch_detail // self.period
ZeroDivisionError: float divmod()

How can I fix this ZeroDivisionError?
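
One possible guard, sketched under the assumption that the zero comes from the `--epoch 0` argument seen in the trace: reject non-positive epoch counts while parsing arguments. `positive_int` and `build_parser` are hypothetical names, and the default value is only illustrative; this is not the parser actually used in train_qm9.py.

```python
import argparse


def positive_int(value):
    """Hypothetical argparse type that accepts only integers >= 1."""
    number = int(value)
    if number < 1:
        raise argparse.ArgumentTypeError(
            "must be a positive integer, got %d" % number)
    return number


def build_parser():
    """Build an illustrative parser with a validated --epoch option."""
    parser = argparse.ArgumentParser()
    # The default of 1 is an assumption made for this sketch.
    parser.add_argument("--epoch", type=positive_int, default=1)
    return parser


args = build_parser().parse_args(["--epoch", "5"])
print(args.epoch)
```

With this in place, passing `--epoch 0` makes argparse exit with a usage error before the trainer is ever started.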

Thanks.

Thank you for the PR! We will check it.

I ran into the same problem as well. Thank you for this.