Error when calculating the validation loss - indices[0,1200] = 1200 is not in [0, 1024)
yijunzhouzoey opened this issue · 1 comment
yijunzhouzoey commented
Hi! Thanks for your excellent repo and instructions. The model trained well, but I ran into an error when calculating the validation loss on my own validation data with val_batch_count=4000.
The error is as follows:
Loading checkpoint models/medium/model-570000
Loading dataset...
Training...
Calculating validation loss...
37%|██████████████▎ | 1471/4000 [13:49<20:24, 2.07it/s]Traceback (most recent call last):
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,1200] = 1200 is not in [0, 1024)
[[{{node model_1/GatherV2_1}} = GatherV2[Taxis=DT_INT32, Tindices=DT_INT32, Tparams=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"](model/wpe/read, model_1/Tile, model_1/h23/attn/range/start)]]
I found a similar issue here: minimaxir/gpt-2-simple#38
It says this may be caused by a long prefix, but I am not sure how to solve that.
Does anyone know how?
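For context, the failing op is a GatherV2 on model/wpe, which is GPT-2's position embedding table, and the range [0, 1024) matches its 1024-position context length. One possible reading is that a sequence longer than 1024 tokens reached the model. The following is only a minimal sketch of keeping each sample within that limit by splitting it into windows. The names N_CTX and split_into_windows are hypothetical and not part of the repo, and the 1201-token sample is made up for illustration.

```python
from typing import List

# Assumed context length; it matches the [0, 1024) range in the error message.
N_CTX = 1024


def split_into_windows(tokens: List[int], n_ctx: int = N_CTX) -> List[List[int]]:
    """Split a token sequence into consecutive chunks of at most n_ctx tokens.

    Hypothetical helper for illustration; not a function from the repo.
    """
    if n_ctx <= 0:
        raise ValueError("n_ctx must be positive")
    return [tokens[i:i + n_ctx] for i in range(0, len(tokens), n_ctx)]


# Made-up validation sample of 1201 tokens, so position index 1200 exists,
# which is the index reported in the error.
sample = list(range(1201))
windows = split_into_windows(sample)
print([len(w) for w in windows])  # [1024, 177]
```

Each window can then be fed to the model separately, so no position index exceeds 1023. Whether this applies depends on how the validation batches are sampled in the training script.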
yijunzhouzoey commented
Hi!
This issue should be solved by pulling the image referred to in the Dockerfile in the OpenAI GPT-2 repo.
Thanks!