nshepperd/gpt-2

Error when calculating the validation loss - indices[0,1200] = 1200 is not in [0, 1024)

yijunzhouzoey opened this issue · 1 comment

Hi! Thanks for your excellent repo and instructions. The model trained well, but I got an error when calculating the validation loss on my own validation data with val_batch_count=4000.

The error is as follows:

Loading checkpoint models/medium/model-570000
Loading dataset...
Training...
Calculating validation loss...
 37%|██████████████▎                        | 1471/4000 [13:49<20:24,  2.07it/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,1200] = 1200 is not in [0, 1024)
	 [[{{node model_1/GatherV2_1}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/wpe/read, model_1/Tile, model_1/h23/attn/range/start)]]
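For context, the message says position index 1200 was looked up in a table that only has rows 0 to 1023. In GPT-2, `model/wpe` is the positional-embedding matrix with one row per position up to the context length (1024 for the released models), so a sequence longer than 1024 tokens makes this gather fail. The NumPy sketch below mimics that lookup; the table size, the toy embedding width and the `position_embeddings` helper are illustrative assumptions, not code from the repo.

```python
import numpy as np

# Assumption: context length 1024, matching the "[0, 1024)" range in the error.
N_CTX = 1024
N_EMBD = 8  # toy embedding width, purely illustrative

# Stand-in for the positional-embedding table (model/wpe in the traceback).
wpe = np.zeros((N_CTX, N_EMBD), dtype=np.float32)


def position_embeddings(seq_len):
    """Look up one positional embedding per token position, like a gather on wpe."""
    positions = np.arange(seq_len)
    # np.take raises IndexError for out-of-range indices (mode="raise" is the default).
    return np.take(wpe, positions, axis=0)


print(position_embeddings(1024).shape)  # (1024, 8)

try:
    # A 1201-token sequence needs positions 0..1200; 1024..1200 do not exist.
    position_embeddings(1201)
except IndexError as err:
    print("IndexError:", err)
```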

I found a similar issue here: minimaxir/gpt-2-simple#38
It says this may be caused by a prefix that is too long, but I'm not sure how to solve that.
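A general workaround, sketched here under the assumption that the validation text is already tokenized into a flat list of token ids, is to cut it into windows of at most 1024 tokens and check each batch before it reaches the model. `split_into_windows` and `check_batch` are hypothetical helpers for illustration, not part of nshepperd/gpt-2.

```python
from typing import List

N_CTX = 1024  # assumption: context length of the released GPT-2 checkpoints


def split_into_windows(tokens: List[int], n_ctx: int = N_CTX) -> List[List[int]]:
    """Split a flat list of token ids into consecutive windows of at most n_ctx tokens."""
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    return [tokens[i:i + n_ctx] for i in range(0, len(tokens), n_ctx)]


def check_batch(batch: List[List[int]], n_ctx: int = N_CTX) -> None:
    """Fail early in Python instead of deep inside the TensorFlow session run."""
    for row, seq in enumerate(batch):
        if len(seq) > n_ctx:
            raise ValueError(
                f"sequence {row} has {len(seq)} tokens; the model accepts at most {n_ctx}"
            )


tokens = list(range(2500))  # hypothetical tokenized validation text
windows = split_into_windows(tokens)
print([len(w) for w in windows])  # [1024, 1024, 452]
check_batch(windows)  # passes: every window fits the context
```

Non-overlapping windows keep the sketch simple; overlapping (strided) windows would give each token more left context at the cost of extra forward passes.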

Does anyone know how?

Hi!

This issue should be solved by pulling the image specified in the Dockerfile in the OpenAI GPT-2 repo.

Thanks!