bad argument #2 to '?' (end index out of bound) error
hanskrupakar opened this issue · 5 comments
I followed all the steps specified in your README.md to implement a baseline RNN attention encoder-decoder model (Luong et al., 2015). I had no problems until I reached the actual training part.
When I run the train.lua script, I get this error. How do I fix it so the model runs as it should?
I have an Nvidia GeForce 650M 2 GB GPU with 384 cores, CUDA 7.5, and cuDNN 4. Please help.
th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model
using CUDA on GPU 1...
loading data...
done!
Source vocab size: 50004, Target vocab size: 50004
Source max sent len: 50, Target max sent len: 52
Number of additional features on source side: 0
Switching on memory preallocation
Number of parameters: 54338004 (active: 54338004)
/home/hans/torch/install/bin/luajit: bad argument #2 to '?' (end index out of bound)
stack traceback:
[C]: at 0x7f558238c530
[C]: in function '__index'
train.lua:394: in function 'train_batch'
train.lua:745: in function 'train'
train.lua:1071: in function 'main'
train.lua:1074: in main chunk
[C]: in function 'dofile'
...hans/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00405d50
can you try passing in -max_batch_l when you run train.lua?
The same error persists when I pass -max_batch_l 32; only the first line of the stack traceback is different:
stack traceback:
[C]: at 0x7faa6b4f2530
I should also mention that I have added custom word2vec embeddings for both the encoder and the decoder side, and the indices of those arrays start from 0 in the hdf5 file. But even when I remove these files from the pre_word_vecs_enc and pre_word_vecs_dec options, I get the same error, with only this line different:
stack traceback:
[C]: at 0x7f0395f28530
Should I make changes here? This is the way I created the 2 hdf5 files:
import os
import h5py
import numpy as np

# modeleng / modeltam are the pre-trained word2vec lookups for English and Tamil
embed_size = 300

vec = []
with open("demo.src.dict", 'r') as f:
    for line in f:
        t = np.array([modeleng[line.split(' ')[0].strip()]])
        vec.append(t)
english = np.reshape(np.array(vec), (-1, embed_size))
# gives (50004, 300) numpy array, embed_size = 300
# (I reduced it to run a minimal example first)

vec = []
with open("demo.targ.dict", 'r') as f:
    for line in f:
        # Python 2: lines are byte strings, hence the .decode for the Tamil words
        t = np.array([modeltam[line.split(' ')[0].strip().decode('utf-8')]])
        vec.append(t)
tamil = np.reshape(np.array(vec), (-1, embed_size))
# gives (50004, 500) numpy array

if not os.path.isfile('src_wv_%d.hdf5' % embed_size):
    with h5py.File('src_wv_%d.hdf5' % embed_size, 'w') as hf:
        hf.create_dataset('word_vecs', data=english)
if not os.path.isfile('targ_wv_%d.hdf5' % embed_size):
    with h5py.File('targ_wv_%d.hdf5' % embed_size, 'w') as hf:
        hf.create_dataset('word_vecs', data=tamil)
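For completeness, here is a rough sanity check I can run on the generated files (the 'word_vecs' dataset name is the one used above, and 50004 is the vocab size from the training log):

# Rough sanity check: each embedding file should have one row per entry
# of the corresponding dictionary, in the same order.
import h5py

with h5py.File('src_wv_%d.hdf5' % embed_size, 'r') as hf:
    src_vecs = hf['word_vecs'][:]
with open("demo.src.dict", 'r') as f:
    n_src_words = sum(1 for _ in f)

print(src_vecs.shape)                    # expect (50004, embed_size)
assert src_vecs.shape[0] == n_src_words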
hmm, did you use --batchsize 32 when running preprocess.py?
Okay, I forgot to change it in preprocess.py as well. I changed it there too and it works now. Silly mistake. Thanks so much. I will close the issue.
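For anyone hitting the same thing: in my case the batch size passed to preprocess.py and to train.lua had to agree. These are the two commands that work for me now (other preprocess.py arguments omitted):

python preprocess.py ... --batchsize 32
th train.lua -data_file data/demo-train.hdf5 -val_data_file data/demo-val.hdf5 -savefile demo-model -max_batch_l 32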
Can you explain the reason behind the error, and why setting the batch size fixed it?
I also wanted to ask: is there any way to save the model to file in units smaller than one epoch (preferably in terms of batches)? Training is slow for me, about an hour per epoch, and I would like to save once every 50 batches.
@hanskrupakar - we are working on intermediate saving. It will be based on a time period (every N minutes), with a corresponding turnkey "restart" option so that runs can be restarted easily from any checkpoint. We will be pushing the feature soon!
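Roughly, the idea is something like the sketch below (illustration only, in Python rather than the actual train.lua code; the setting name is hypothetical):

# Minimal sketch of time-based checkpointing: save whenever N minutes have
# elapsed, so a run can be resumed from the most recent checkpoint.
import time
import pickle

SAVE_EVERY_MINUTES = 30          # hypothetical "every N minutes" setting

def train(num_batches, state):
    last_save = time.time()
    for i in range(num_batches):
        state['batches_seen'] += 1                 # placeholder for one training step
        if time.time() - last_save >= SAVE_EVERY_MINUTES * 60:
            with open('checkpoint.pkl', 'wb') as f:
                pickle.dump(state, f)              # resumable training state
            last_save = time.time()

train(1000, {'batches_seen': 0})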