python train.py: error during training. Why?

Question

Running python train.py fails during training. Why?

jiangxinufo opened this issue 4 years ago · 1 comment

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 13:50:14.338255: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Epoch 1/20
2021-04-24 13:50:57.964647: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]]
2021-04-24 13:50:57.975653: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
1/Unknown - 13s 13s/step
WARNING:tensorflow:Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
W0424 13:50:57.973832 2388 callbacks.py:1934] Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are:
W0424 13:50:57.973832 2388 callbacks.py:1286] Early stopping conditioned on metric val_loss which is not available. Available metrics are:

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
1/Unknown - 17s 17s/step
Traceback (most recent call last):
  File "train.py", line 195, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 190, in main
    validation_data=val_dataset)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_47459]

Function call stack:
distributed_function

WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 13:51:03.144867 2388 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 13:51:03.144867 2388 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>
(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>
(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 14:16:12.964344: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Traceback (most recent call last):
  File "train.py", line 197, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 189, in main
    step_per_epoch=x.shape[0]//Batchsize,
NameError: name 'x' is not defined
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 14:16:31.858063 5304 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 14:16:31.859061 5304 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 14:16:31.860061 5304 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

(yolov3-tf2-cpu) (venv) M:\MachineLearning\yolov3-tf2>python train.py --dataset ./data/voc2007_train_stone.tfrecord --val_dataset ./data/voc2007_val_stone.tfrecord --classes ./data/stone.names --num_classes 1 --mode fit --transfer darknet --batch_size 4 --epochs 20 --weights ./checkpoints/yolov3.tf --weights_num_classes 80
2021-04-24 14:18:51.426349: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX
Epoch 1/20
2021-04-24 14:19:34.278458: W tensorflow/core/common_runtime/base_collective_executor.cc:217] BaseCollectiveExecutor::StartAbort Invalid argument: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]]
2021-04-24 14:19:34.290370: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
1/Unknown - 13s 13s/step
WARNING:tensorflow:Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
W0424 14:19:34.282967 1632 callbacks.py:1934] Reduce LR on plateau conditioned on metric val_loss which is not available. Available metrics are: lr
WARNING:tensorflow:Early stopping conditioned on metric val_loss which is not available. Available metrics are:
W0424 14:19:34.282967 1632 callbacks.py:1286] Early stopping conditioned on metric val_loss which is not available. Available metrics are:

Epoch 00001: saving model to checkpoints/yolov3_train_1.tf
1/Unknown - 17s 17s/step
Traceback (most recent call last):
  File "train.py", line 196, in <module>
    app.run(main)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 303, in run
    _run_main(main, args)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\absl\app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "train.py", line 191, in main
    validation_data=val_dataset)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 819, in fit
    use_multiprocessing=use_multiprocessing)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 342, in fit
    total_epochs=epochs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2.py", line 128, in run_one_epoch
    batch_outs = execution_function(iterator)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\keras\engine\training_v2_utils.py", line 98, in execution_function
    distributed_function(input_fn))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 568, in __call__
    result = self._call(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\def_function.py", line 632, in _call
    return self._stateless_fn(*args, **kwds)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 2363, in __call__
    return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1611, in _filtered_call
    self.captured_inputs)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 1692, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\function.py", line 545, in call
    ctx=ctx)
  File "C:\Users\jx-dl.conda\envs\yolov3-tf2-cpu\lib\site-packages\tensorflow_core\python\eager\execute.py", line 67, in quick_execute
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Paddings must be non-negative: 0 -12
[[{{node Pad}}]]
[[IteratorGetNext]] [Op:__inference_distributed_function_47459]

Function call stack:
distributed_function

WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-8
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-8
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-9
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-9
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-10
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-10
WARNING:tensorflow:Unresolved object in checkpoint: (root).layer-11
W0424 14:19:38.904387 1632 util.py:144] Unresolved object in checkpoint: (root).layer-11
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.
W0424 14:19:38.904387 1632 util.py:152] A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details.

Answer 1 · 2022-04-18T03:40:59.000Z

I have the same problem. How did you solve it?
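
A possible reading of the log, offered as a hypothesis and not a confirmed diagnosis: the failing op is a Pad node reached through IteratorGetNext, so the error seems to come from the input pipeline, not from the model. "0 -12" looks like a padding pair whose second value went negative. That would happen if the dataset code pads each image's list of boxes up to a fixed maximum and at least one image in the tfrecord has 12 more boxes than that maximum. The plain-Python sketch below only illustrates that arithmetic. The names pad_amount and pad_or_truncate, the box layout, and the limit of 100 are hypothetical and are not taken from train.py or from the log.

```python
from typing import List, Tuple

# Hypothetical box layout: xmin, ymin, xmax, ymax, class id.
Box = Tuple[float, float, float, float, int]

FILLER: Box = (0.0, 0.0, 0.0, 0.0, 0)


def pad_amount(num_boxes: int, max_boxes: int) -> int:
    """Rows of zero padding needed to reach max_boxes.

    The result is negative when an image has more boxes than max_boxes,
    which is the situation a pad op rejects.
    """
    return max_boxes - num_boxes


def pad_or_truncate(boxes: List[Box], max_boxes: int) -> List[Box]:
    """Return exactly max_boxes rows: drop extras first, then zero-pad."""
    kept = boxes[:max_boxes]
    return kept + [FILLER] * (max_boxes - len(kept))


if __name__ == "__main__":
    # Assumed example: one image annotated with 112 boxes, limit of 100.
    crowded = [(0.1, 0.1, 0.2, 0.2, 0)] * 112
    print("padding rows:", pad_amount(len(crowded), 100))
    print("rows after guard:", len(pad_or_truncate(crowded, 100)))
```

If this is the cause, two options are worth trying: raise the project's maximum-boxes setting (if it exposes one) above the largest box count in the dataset, or filter or truncate over-annotated images before writing the tfrecord. The val_loss and "Unresolved object in checkpoint" warnings are probably side effects, since the run aborts on the first training step. The NameError in the second run looks unrelated: the edited line in train.py refers to x, which is never defined.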