Not working anymore on TensorFlow 1.0
bingoko opened this issue · 2 comments
The first error is that tf.nn has no attribute rnn_cell, raised at models/dual_encoder.py, line 45.
I fixed it by changing tf.nn.rnn_cell to tf.contrib.rnn.core_rnn_cell.
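For reference, a minimal sketch of that rename; the cell class and constructor arguments below are illustrative assumptions, not copied from dual_encoder.py:
import tensorflow as tf
# Before (TF < 1.0): the RNN cells lived under tf.nn.rnn_cell
# cell = tf.nn.rnn_cell.LSTMCell(256, forget_bias=2.0, state_is_tuple=True)
# After (TF 1.0): the same classes are also exposed directly under tf.contrib.rnn
cell = tf.contrib.rnn.LSTMCell(256, forget_bias=2.0, state_is_tuple=True)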
The second error is TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
Some argument types no longer match.
Can anyone fix this?
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcudnn.so.5. LD_LIBRARY_PATH: /usr/local/cuda-8.0/lib64:
I tensorflow/stream_executor/cuda/cuda_dnn.cc:3517] Unable to load cuDNN DSO
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:135] successfully opened CUDA library libcurand.so.8.0 locally
INFO:tensorflow:Using config: {'_tf_random_seed': None, '_task_id': 0, '_save_summary_steps': 100, '_keep_checkpoint_max': 5, '_save_checkpoints_secs': 600, '_master': '', '_environment': 'local', '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x7f0aa666a358>, '_evaluation_master': '', '_task_type': None, '_num_ps_replicas': 0, '_keep_checkpoint_every_n_hours': 10000, '_tf_config': gpu_options {
per_process_gpu_memory_fraction: 1
}
, '_save_checkpoints_steps': None, '_is_chief': True}
WARNING:tensorflow:From /home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/monitors.py:267: BaseMonitor.__init__ (from tensorflow.contrib.learn.python.learn.monitors) is deprecated and will be removed after 2016-12-05.
Instructions for updating:
Monitors are deprecated. Please use tf.train.SessionRunHook.
INFO:tensorflow:No glove/vocab path specificed, starting with random embeddings.
Traceback (most recent call last):
File "udc_train.py", line 64, in
tf.app.run()
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 44, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "udc_train.py", line 61, in main
estimator.fit(input_fn=input_fn_train, steps=None, monitors=[eval_monitor])
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
return func(*args, **kwargs)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
loss = self._train_model(input_fn=input_fn, hooks=hooks)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 934, in _train_model
model_fn_ops = self._call_legacy_get_train_ops(features, labels)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1003, in _call_legacy_get_train_ops
train_ops = self._get_train_ops(features, labels)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1162, in _get_train_ops
return self._call_model_fn(features, labels, model_fn_lib.ModeKeys.TRAIN)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 1133, in _call_model_fn
model_fn_results = self._model_fn(features, labels, **kwargs)
File "/home/ucl/chatbot-retrieval/udc_model.py", line 39, in model_fn
targets)
File "/home/ucl/chatbot-retrieval/models/dual_encoder.py", line 54, in dual_encoder_model
tf.concat(0, [context_embedded, utterance_embedded]),
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/ops/array_ops.py", line 1029, in concat
dtype=dtypes.int32).get_shape(
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 637, in convert_to_tensor
as_ref=False)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 702, in internal_convert_to_tensor
ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 110, in _constant_tensor_conversion_function
return constant(v, dtype=dtype, name=name)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/constant_op.py", line 99, in constant
tensor_util.make_tensor_proto(value, dtype=dtype, shape=shape, verify_shape=verify_shape))
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 367, in make_tensor_proto
_AssertCompatible(values, dtype)
File "/home/ucl/.local/lib/python3.5/site-packages/tensorflow/python/framework/tensor_util.py", line 302, in _AssertCompatible
(dtype.name, repr(mismatch), type(mismatch).__name__))
TypeError: Expected int32, got list containing Tensors of type '_Message' instead.
You need to fix all instances of tf.concat and tf.split as follows.
For tf.concat, e.g. in dual_encoder.py, line 54, change:
tf.concat(0, [context_embedded, utterance_embedded]),
to this:
tf.concat([context_embedded, utterance_embedded], 0),
(i.e., swap the order of the arguments)
Do the same for all the tf.split calls, e.g. line 57 in the same file; change:
encoding_context, encoding_utterance = tf.split(0, 2, rnn_states.h)
to this:
encoding_context, encoding_utterance = tf.split(rnn_states.h, 2, 0)
by switching the first and last arguments.
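A quick standalone way to sanity-check the new argument order; the tensors here are dummies, not the ones from dual_encoder.py:
import tensorflow as tf
a = tf.ones([2, 3])
b = tf.zeros([2, 3])
# TF 1.0 signatures: tf.concat(values, axis) and tf.split(value, num_or_size_splits, axis)
stacked = tf.concat([a, b], 0)           # shape (4, 3)
first, second = tf.split(stacked, 2, 0)  # two tensors of shape (2, 3)
with tf.Session() as sess:
    print(sess.run(tf.shape(first)))     # prints [2 3]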
There were a few more things I had to change to get training running, too.
Thanks, I have updated all of these, as well as
tf.histogram_summary -> tf.summary.histogram
and
tf.scalar_summary -> tf.summary.scalar
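For anyone else porting, here is a minimal sketch of the summary rename; the tags and tensors are placeholders, not taken from the repo:
import tensorflow as tf
values = tf.random_normal([32])
loss = tf.reduce_mean(tf.square(values))
# Before (TF < 1.0):
# tf.histogram_summary("values", values)
# tf.scalar_summary("loss", loss)
# After (TF 1.0):
tf.summary.histogram("values", values)
tf.summary.scalar("loss", loss)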
However, there is a new error:
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla M60, pci bus id: 88f8:00:00.0)
W tensorflow/core/framework/op_kernel.cc:993] Out of range: Reached limit of 1
[[Node: read_batch_features_eval/file_name_queue/limit_epochs/CountUpTo = CountUpTo[T=DT_INT64, _class=["loc:@read_batch_features_eval/file_name_queue/limit_epochs/epochs"], limit=1, _device="/job:localhost/replica:0/task:0/cpu:0"]]]
Any idea?