Hang when calculate validation accuracy
soonchangAI opened this issue · 0 comments
soonchangAI commented
Hi, I ran a .sh script to calculate validation accuracy for few models.
The code hangs after calculating validation accuracy for a model ( the hang lasts for more than 30 minutes before). I have to use CTRL+C to break the hang, so the script continues calculate validation accuracy for the rest models (Hang occurs for each of the subsequent calculation too). How can I fix this ?
The print out on Terminal after CTRL + C :
Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:37<00:00, 3.95s/it]
2022-06-06T00:21:54 INFO: Key current_iteration is not present in registry, returning default value of None
2022-06-06T00:21:54 INFO: m4c_textvqa: full val:, 0/4000, val/total_loss: 38.3987, val/m4c_textvqa/m4c_decoding_bce_with_mask: 38.3987, val/m4c_textvqa/textvqa_accuracy: 0.2572
^CTraceback (most recent call last):
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
main()
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 239, in main
process.wait()
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1477, in wait
(pid, sts) = self._try_wait(0)
File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1424, in _try_wait
(pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt