nshepperd/gpt-2

Process gets killed when training

50417 opened this issue · 1 comment

50417 commented

I am training the smallest GPT-2 model (117M parameters).

Loading dataset...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 109.93it/s]
dataset has 42736 tokens
Training...
Killed

However, the process gets killed as shown above, with no Python traceback. Any help is appreciated.

50417 commented

Upon investigation, it turned out to be an out-of-memory (OOM) error:

Resource exhausted: OOM when allocating tensor with shape[1024,50257] and type float on /job:localhost/replica:0/task:0/device:CPU:0 by allocator cpu
Killed
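
For context, the shape in the error message is presumably the per-sequence logits tensor: 1024 is the GPT-2 context length and 50257 is its vocabulary size. A single float32 tensor of that shape is already close to 200 MiB, and TensorFlow keeps many intermediates of this size alive during the forward and backward pass, so a machine with limited RAM runs out (note the allocation is on CPU:0, i.e. training is on CPU here). A minimal sketch of the arithmetic:

# Approximate size of one float32 tensor of shape [1024, 50257].
# 1024 = GPT-2 context length, 50257 = GPT-2 vocabulary size.
n_ctx, n_vocab = 1024, 50257
bytes_per_float32 = 4
tensor_bytes = n_ctx * n_vocab * bytes_per_float32
print(f"{tensor_bytes / 2**20:.1f} MiB")  # ~196.3 MiB

If I remember correctly, train.py in this repo accepts --batch_size and --memory_saving_gradients flags (the latter enables gradient checkpointing, trading extra compute for lower peak memory); setting the batch size to 1 and enabling checkpointing may get training under your RAM limit. Please verify the exact flag names against python train.py --help.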