TypeError when running `train.sh` on a new dataset

Question

yrahul3910 opened this issue 4 years ago · 1 comment

I'm using the TensorFlow 2.1 version of the code (Kolkir's implementation), and training fails with a TypeError on my own preprocessed C++ dataset. The full training log is below. Could you please help me solve this?

[ryedida@c4 code2seq]$ ./train.sh
2021-06-29 00:08:20.602096: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-29 00:08:24.251376: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-29 00:08:24.252532: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2021-06-29 00:08:25.455508: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-06-29 00:08:25.455583: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-29 00:08:25.460725: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-29 00:08:25.460816: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-29 00:08:25.462037: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-29 00:08:25.462340: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-29 00:08:25.466201: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-29 00:08:25.467140: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-29 00:08:25.467309: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-29 00:08:25.468240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
Created model
Num training samples: 11949
Dictionaries loaded.
Loaded subtoken vocab. size: 5091
Loaded target word vocab. size: 3626
Loaded nodes vocab. size: 87
2021-06-29 00:08:25.538455: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX512F
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-06-29 00:08:25.542117: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-06-29 00:08:25.542921: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:af:00.0 name: GeForce RTX 2060 computeCapability: 7.5
coreClock: 1.68GHz coreCount: 30 deviceMemorySize: 5.79GiB deviceMemoryBandwidth: 312.97GiB/s
2021-06-29 00:08:25.543073: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-29 00:08:25.543152: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-29 00:08:25.543209: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2021-06-29 00:08:25.543256: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcufft.so.10
2021-06-29 00:08:25.543301: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcurand.so.10
2021-06-29 00:08:25.543346: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusolver.so.10
2021-06-29 00:08:25.543390: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcusparse.so.11
2021-06-29 00:08:25.543434: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2021-06-29 00:08:25.544364: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1862] Adding visible gpu devices: 0
2021-06-29 00:08:25.545630: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2021-06-29 00:08:27.603706: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-06-29 00:08:27.603754: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-06-29 00:08:27.603763: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-06-29 00:08:27.606355: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1406] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5360 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2060, pci bus id: 0000:af:00.0, compute capability: 7.5)
Starting training
Training batch size:			 128
Dataset path:				 ../data/sequences/qemu/dataset/dataset
Training file path:			 ../data/sequences/qemu/dataset/dataset.train.c2s
Validation path:			 ../data/sequences/qemu/dataset/dataset.val.c2s
Taking max contexts from each example:	 100
Random path sampling:			 True
Embedding size:				 128
Using BiLSTMs, each of size:		 128
Decoder size:				 320
Decoder layers:				 1
Max path lengths:			 9
Max subtokens in a token:		 5
Max target length:			 6
Embeddings dropout keep_prob:		 0.75
LSTM dropout keep_prob:			 0.5
============================================
2021-06-29 00:08:27.779231: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
Number of trainable params: 1126912
Start training loop...
  0%|                                                                                                                               | 0/11949 [00:00<?, ?it/s]2021-06-29 00:08:29.936588: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:116] None of the MLIR optimization passes are enabled (registered 2)
2021-06-29 00:08:29.991827: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 2100000000 Hz
2021-06-29 00:08:38.120806: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2021-06-29 00:08:40.715900: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
Traceback (most recent call last):
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 351, in _flatten_module
    prop, expand_composites=expand_composites)
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 1425, in flatten_with_tuple_paths
    flatten(structure, expand_composites=expand_composites)))
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/util/nest.py", line 341, in flatten
    return _pywrap_utils.Flatten(structure, expand_composites)
TypeError: '<' not supported between instances of 'WhileBodyFuncGraph' and 'FuncGraph'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "code2seq.py", line 29, in <module>
    model.train()
  File "/home/ryedida/vulnerability/our-approach/code2seq/modelrunner.py", line 129, in train
    gradients = tape.gradient(loss, self.model.trainable_variables)
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 176, in trainable_variables
    self._flatten(predicate=_is_trainable_variable, expand_composites=True))
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 390, in _flatten_module
    for subvalue in subvalues:
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 390, in _flatten_module
    for subvalue in subvalues:
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 390, in _flatten_module
    for subvalue in subvalues:
  File "/home/ryedida/.local/lib/python3.6/site-packages/tensorflow/python/module/module.py", line 356, in _flatten_module
    cause)
  File "<string>", line 3, in raise_from
ValueError: Error processing property '_dropout_mask_cache' of <ContextValueCache at 0x7f99a00b3da0>
  0%|                                                                                                                               | 0/11949 [00:14<?, ?it/s]
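
A note on reading the traceback: the inner `TypeError` is the message Python produces when it tries to order two objects that define no `<` comparison. The sketch below reproduces only that message shape in plain Python. It assumes (this is not confirmed anywhere in the thread) that `_dropout_mask_cache` behaves like a dict keyed by graph objects and that flattening it sorts those keys. The class and function names are hypothetical stand-ins, not TensorFlow classes.

```python
# Hypothetical stand-ins for two unrelated graph types.
# These are NOT TensorFlow classes; they only mimic "objects with no ordering".
class FuncGraphStandIn:
    pass


class WhileBodyGraphStandIn:
    pass


def sorted_keys(mapping):
    """Return the mapping's keys in sorted order, as a dict-flattening
    routine might do (assumption: the real routine sorts dict keys)."""
    return sorted(mapping)


# A dict-like cache keyed by graph objects (assumed shape of the real cache).
cache = {FuncGraphStandIn(): "mask_a", WhileBodyGraphStandIn(): "mask_b"}

try:
    sorted_keys(cache)
    error_message = None
except TypeError as exc:
    # Sorting fails because neither class implements __lt__.
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the failure would come from traversing the cache attribute while collecting `trainable_variables`, rather than from the dataset contents themselves.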

Answer 1 · 2021-07-04T12:14:41.000Z

Hi @yrahul3910,
Thank you for your interest in code2seq and sorry for the late reply.

Unfortunately, I cannot provide support for Kolkir's implementation.
If you prefer to use our implementation (based on TF 1.x), I'll be glad to help.

Best,
Uri