Process gets killed when training
50417 opened this issue · 1 comment
50417 commented
I am training the smallest GPT-2 model (117M parameters).
Loading dataset...
100%|██████████| 1/1 [00:00<00:00, 109.93it/s]
dataset has 42736 tokens
Training...
Killed
However, the process gets killed, as shown above. Any help is appreciated.
50417 commented
Upon investigation, the cause turned out to be an out-of-memory (OOM) error:
Resource exhausted: OOM when allocating tensor with shape[1024,50257] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Killed
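For anyone hitting the same message, a rough size estimate for the tensor named in the log may help. This is only a sketch: it assumes "type float" means 32-bit floats (4 bytes per element), and `tensor_bytes` is a hypothetical helper written for this illustration, not part of any library. The shape [1024,50257] looks like GPT-2's context length times its vocabulary size, so this is probably one set of output logits for a single sequence.

```python
from math import prod

# Assumption: "type float" in the log means float32, i.e. 4 bytes per element.
BYTES_PER_FLOAT32 = 4


def tensor_bytes(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Return the memory needed for one dense tensor of the given shape."""
    return prod(shape) * bytes_per_element


# Shape copied from the OOM message above.
failed_shape = (1024, 50257)
size = tensor_bytes(failed_shape)
print(f"{size} bytes = {size / 2**20:.1f} MiB")
```

Under that assumption, this single tensor needs roughly 196 MiB, and training keeps several tensors of this size alive at once (activations, gradients), on top of the model weights and optimizer state. Because the allocator in the log is the CPU one, the limit being hit is host RAM, so a shorter sequence length, a smaller batch, or more RAM/swap are the usual things to try.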