batchSize > 1时,INVALID_ARGUMENT: required broadcastable shapes
Closed this issue · 2 comments
BobH233 commented
batchsize测试下来只能设置1,不管是什么gpu,一旦设置大于1就会报错如下
-----------------
模式1:搭建模型
模式2:训练模型
模式3:开启Coolq接口
模式4:进行对话
-----------------
输入工作模式:2
输入循环轮数:2000
输入batch size:2
Tue May 10 09:35:46 2022 正在处理训练数据...
Tue May 10 09:35:46 2022 循环轮数:2000 batch size:2
2022-05-10 09:35:46.823820: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:46.833743: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:46.834556: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:46.835683: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-05-10 09:35:46.836013: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:46.836818: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:46.837657: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:47.446699: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:47.447592: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:47.448356: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:936] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-05-10 09:35:47.449143: W tensorflow/core/common_runtime/gpu/gpu_bfc_allocator.cc:39] Overriding allow_growth setting because the TF_FORCE_GPU_ALLOW_GROWTH environment variable is set. Original config value was 0.
2022-05-10 09:35:47.449235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 10794 MB memory: -> device: 0, name: Tesla K80, pci bus id: 0000:00:04.0, compute capability: 3.7
Epoch 1/2000
2022-05-10 09:36:03.719284: I tensorflow/stream_executor/cuda/cuda_dnn.cc:368] Loaded cuDNN version 8100
2022-05-10 09:36:04.314831: W tensorflow/core/framework/op_kernel.cc:1733] INVALID_ARGUMENT: required broadcastable shapes
Traceback (most recent call last):
File "/content/drive/MyDrive/Ossas_ChatBot/main.py", line 113, in <module>
rnn.train_model(bat, epo)
File "/content/drive/MyDrive/Ossas_ChatBot/Seq2Seq.py", line 79, in train_model
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
File "/usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 67, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.9/site-packages/tensorflow/python/eager/execute.py", line 54, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: Graph execution error:
Detected at node 'Equal' defined at (most recent call last):
File "/content/drive/MyDrive/Ossas_ChatBot/main.py", line 113, in <module>
rnn.train_model(bat, epo)
File "/content/drive/MyDrive/Ossas_ChatBot/Seq2Seq.py", line 79, in train_model
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
File "/usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1384, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function
return step_function(self, iterator)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 864, in train_step
return self.compute_metrics(x, y, y_pred, sample_weight)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 957, in compute_metrics
self.compiled_metrics.update_state(y, y_pred, sample_weight)
File "/usr/local/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 459, in update_state
metric_obj.update_state(y_t, y_p, sample_weight=mask)
File "/usr/local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 70, in decorated
update_op = update_state_fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 178, in update_state_fn
return ag_update_state(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 729, in update_state
matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 4043, in categorical_accuracy
tf.equal(
Node: 'Equal'
Detected at node 'Equal' defined at (most recent call last):
File "/content/drive/MyDrive/Ossas_ChatBot/main.py", line 113, in <module>
rnn.train_model(bat, epo)
File "/content/drive/MyDrive/Ossas_ChatBot/Seq2Seq.py", line 79, in train_model
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
File "/usr/local/lib/python3.9/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
return fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1384, in fit
tmp_logs = self.train_function(iterator)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1021, in train_function
return step_function(self, iterator)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1010, in step_function
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 1000, in run_step
outputs = model.train_step(data)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 864, in train_step
return self.compute_metrics(x, y, y_pred, sample_weight)
File "/usr/local/lib/python3.9/site-packages/keras/engine/training.py", line 957, in compute_metrics
self.compiled_metrics.update_state(y, y_pred, sample_weight)
File "/usr/local/lib/python3.9/site-packages/keras/engine/compile_utils.py", line 459, in update_state
metric_obj.update_state(y_t, y_p, sample_weight=mask)
File "/usr/local/lib/python3.9/site-packages/keras/utils/metrics_utils.py", line 70, in decorated
update_op = update_state_fn(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 178, in update_state_fn
return ag_update_state(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 729, in update_state
matches = ag_fn(y_true, y_pred, **self._fn_kwargs)
File "/usr/local/lib/python3.9/site-packages/keras/metrics.py", line 4043, in categorical_accuracy
tf.equal(
Node: 'Equal'
2 root error(s) found.
(0) INVALID_ARGUMENT: required broadcastable shapes
[[{{node Equal}}]]
[[broadcast_weights_1/assert_broadcastable/AssertGuard/pivot_f/_15/_71]]
(1) INVALID_ARGUMENT: required broadcastable shapes
[[{{node Equal}}]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_21896]
BobH233 commented
python版本:Python 3.9.12
cudnn版本:8.1.0.77
环境:Google Colab + GPU
BobH233 commented
已解决,需要python版本3.7,tensorflow版本 2.1.0