Sample script (train_qm9.py) raises ZeroDivisionError in GPU mode
jo7ueb opened this issue · 2 comments
jo7ueb commented
Hi,
I got a ZeroDivisionError with train_qm9.py using GPU #0.
Exception in main training loop: float divmod()
Traceback (most recent call last):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
while not stop_trigger(self):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
self.count = epoch_detail // self.period
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_qm9.py", line 303, in <module>
main()
File "train_qm9.py", line 287, in main
trainer.run()
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
six.reraise(*sys.exc_info())
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
while not stop_trigger(self):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
self.count = epoch_detail // self.period
ZeroDivisionError: float divmod()
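The failing line in the traceback is `self.count = epoch_detail // self.period`. The sketch below mimics that line in plain Python, assuming the trigger's period is taken from the `--epoch` option (the shell trace shows `--epoch 0` being passed). `PeriodicCheck` and `make_stop_check` are hypothetical names, not Chainer's API. They only illustrate that a period of 0 makes the float floor division raise, and how validating the epoch count up front would report the misconfiguration before training starts.

```python
class PeriodicCheck:
    """Hypothetical stand-in for an interval trigger: it stores a period and
    computes how many whole periods have elapsed via floor division."""

    def __init__(self, period):
        self.period = period
        self.count = 0

    def __call__(self, epoch_detail):
        # Mirrors the failing line in the traceback: floor division by the period.
        self.count = epoch_detail // self.period
        return self.count


def make_stop_check(epoch):
    # Hypothetical guard: reject a non-positive epoch count up front instead of
    # letting the floor division fail later inside the training loop.
    if epoch <= 0:
        raise ValueError(f"epoch must be a positive integer, got {epoch}")
    return PeriodicCheck(epoch)


if __name__ == "__main__":
    # With a period of 0 (as with `--epoch 0`), float floor division raises.
    try:
        PeriodicCheck(0)(0.5)
    except ZeroDivisionError as exc:
        print("ZeroDivisionError:", exc)

    # With the guard, the bad value is reported before any training happens.
    try:
        make_stop_check(0)
    except ValueError as exc:
        print("ValueError:", exc)
```

The exact wording of the ZeroDivisionError message depends on the Python version; Python 3.7, as used in the trace, reports `float divmod()`.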
Here's the full message. I noticed that gpu=-1 is set even though a GPU is selected.
+ cd ..
+ echo --- Testing QM9 ---
+ cd qm9
+ bash -x test_qm9.sh 0
+ set -e
+ methods=(nfp ggnn schnet weavenet rsgcn)
+ epoch=0
+ gpu=-1
+ for method in '${methods[@]}'
+ '[' -d input ']'
+ rm -rf input
+ python train_qm9.py --method nfp --label A --conv-layers 1 --gpu -1 --epoch 0 --unit-num 10 --num-data 100
/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/sklearn/externals/joblib/externals/cloudpickle/cloudpickle.py:47: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
100%|██████████| 100/100 [00:00<00:00, 1889.18it/s]
Exception in main training loop: float divmod()
Traceback (most recent call last):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
while not stop_trigger(self):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
self.count = epoch_detail // self.period
Will finalize trainer extensions and updater before reraising the exception.
Traceback (most recent call last):
File "train_qm9.py", line 303, in <module>
main()
File "train_qm9.py", line 287, in main
trainer.run()
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 329, in run
six.reraise(*sys.exc_info())
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/trainer.py", line 312, in run
while not stop_trigger(self):
File "/home/s1830001/works/chainer-chemistry/lib/python3.7/site-packages/chainer/training/triggers/interval_trigger.py", line 60, in __call__
self.count = epoch_detail // self.period
ZeroDivisionError: float divmod()
How can I fix this ZeroDivisionError?
Thanks.
corochann commented
Thank you for the PR! We will check it.
TakashiKusachi commented
I also ran into the same problem. Thank you.