Training Language model for some other language data-set

Question


Opened this issue 5 years ago · 0 comments

How can we use your model to train a language model on a dataset in another language?

When we ran RL_train.py, we got the error below.
Our data has a vocabulary size of 169449.
-----Initialized all dataset.-----
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
-----Constructing training graph.-----
-----Constructed training graph.-----
-----Constructing validating graph.-----
-----Constructed validating graph.-----
-----Constructing testing graph.-----
-----Constructed testing graph.-----
---Created and initialized fresh model. Size: 230862154
-----Start training model-----
W tensorflow/core/framework/op_kernel.cc:993] Invalid argument: indices[0,12] = 191814 is not in [0, 169449)
[[Node: Model/Embedding/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@Model/Embedding/word_embedding"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Model/Embedding/word_embedding/read, _recv_Model/input_0)]]
W tensorflow/core/kernels/queue_base.cc:294] _0_TrainInput/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _1_TestInput/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
W tensorflow/core/kernels/queue_base.cc:294] _2_ValidInput/input_producer/fraction_of_32_full/fraction_of_32_full: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
File "RL_train.py", line 321, in <module>
tf.app.run()
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "RL_train.py", line 206, in main
train_model.initial_rnn_state2: rnn_state2
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 767, in run
run_metadata_ptr)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 965, in _run
feed_dict_string, options, run_metadata)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1015, in _do_run
target_list, options, run_metadata)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1035, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,12] = 191814 is not in [0, 169449)
[[Node: Model/Embedding/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@Model/Embedding/word_embedding"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Model/Embedding/word_embedding/read, _recv_Model/input_0)]]

Caused by op u'Model/Embedding/embedding_lookup', defined at:
File "RL_train.py", line 321, in <module>
tf.app.run()
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "RL_train.py", line 100, in main
lamda = FLAGS.lamda
File "/home/pratikkumarpandey/Projects/DynamicLSTM-master/LanguageModeling/RL_model.py", line 70, in inference_graph
input_embedded = tf.nn.embedding_lookup(embedding, input_word)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/ops/embedding_ops.py", line 111, in embedding_lookup
validate_indices=validate_indices)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1359, in gather
validate_indices=validate_indices, name=name)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 763, in apply_op
op_def=op_def)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 2395, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/home/pratikkumarpandey/nlpenv/local/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1264, in __init__
self._traceback = _extract_stack()

InvalidArgumentError (see above for traceback): indices[0,12] = 191814 is not in [0, 169449)
[[Node: Model/Embedding/embedding_lookup = Gather[Tindices=DT_INT32, Tparams=DT_FLOAT, _class=["loc:@Model/Embedding/word_embedding"], validate_indices=true, _device="/job:localhost/replica:0/task:0/cpu:0"](Model/Embedding/word_embedding/read, _recv_Model/input_0)]]

Could you please help us?
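
For reference, the error message says that a word id fed to the embedding lookup (191814) is outside the range of the embedding table, which has 169449 rows. That suggests the ids in the data were produced with a larger vocabulary than the one the model was built with. A minimal sketch of a check that could be run on the data before training is shown here. It assumes the data can be loaded as lists of integer word ids; the function name and the toy data are hypothetical and are not part of RL_train.py.

```python
def find_out_of_range_ids(sequences, vocab_size):
    """Return (sequence_index, position, word_id) for every id outside [0, vocab_size)."""
    bad = []
    for i, seq in enumerate(sequences):
        for j, word_id in enumerate(seq):
            if not 0 <= word_id < vocab_size:
                bad.append((i, j, word_id))
    return bad


# Toy data standing in for the real dataset (hypothetical values).
vocab_size = 169449
sequences = [[1, 5, 42], [7, 191814, 3]]

bad = find_out_of_range_ids(sequences, vocab_size)
print("out-of-range ids:", bad)

# The embedding table needs at least max_id + 1 rows to cover every id.
max_id = max(max(seq) for seq in sequences)
required_vocab_size = max_id + 1
print("minimum vocab size needed:", required_vocab_size)
```

If the check reports any ids, either the vocabulary size passed to the model is too small or the ids were built from a different vocabulary file than the one the model uses.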