Hang when calculating validation accuracy

Question

soonchangAI opened this issue 2 years ago · 0 comments

Hi, I ran a .sh script to calculate the validation accuracy of a few models.
The code hangs after calculating the validation accuracy for a model (the hang has lasted more than 30 minutes before). I have to press CTRL+C to break out of the hang so that the script continues calculating validation accuracy for the remaining models (the hang also occurs after each subsequent calculation). How can I fix this?

The terminal output after pressing CTRL+C:

Token indices sequence length is longer than the specified maximum sequence length for this model (599 > 512). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [02:37<00:00,  3.95s/it]
2022-06-06T00:21:54 INFO: Key current_iteration is not present in registry, returning default value of None
2022-06-06T00:21:54 INFO: m4c_textvqa: full val:, 0/4000, val/total_loss: 38.3987, val/m4c_textvqa/m4c_decoding_bce_with_mask: 38.3987, val/m4c_textvqa/textvqa_accuracy: 0.2572
^CTraceback (most recent call last):
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 246, in <module>
    main()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/site-packages/torch/distributed/launch.py", line 239, in main
    process.wait()
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/home/cybertron/anaconda3/envs/tap/lib/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
KeyboardInterrupt