david8862/keras-YOLOv3-model-set

Training problem

Opened this issue · 10 comments

Which TF1.x version do you use for training?
I installed tensorflow-gpu 1.15.0 and ran:
python3 train.py --model_type=tiny_yolo3_darknet --model_input_shape=256x256 --anchors_path=configs/car-yolo3_anchors.txt --annotation_file=trainval.txt --classes_path=configs/car.txt --eval_online --save_eval_checkpoint --freeze_level=0

The terminal then printed the following:

WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    from tensorflow_model_optimization.sparsity import keras as sparsity
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/__init__.py", line 86, in <module>
    from tensorflow_model_optimization.python.core.api import clustering
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/__init__.py", line 19, in <module>
    from tensorflow_model_optimization.python.core.api import sparsity
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/sparsity/__init__.py", line 16, in <module>
    from tensorflow_model_optimization.python.core.api.sparsity import keras
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/sparsity/keras/__init__.py", line 18, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras.prune import prune_low_magnitude
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune.py", line 22, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras import pruning_wrapper
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/pruning_wrapper.py", line 33, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras import prune_registry
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune_registry.py", line 26, in <module>
    class PruneRegistry(object):
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune_registry.py", line 96, in PruneRegistry
    layers.experimental.preprocessing.Rescaling.__class__: [],
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow.python.keras.api._v1.keras.layers' has no attribute 'experimental'

How can I fix this?
Thanks.

This looks like a compatibility problem in the tensorflow_model_optimization library. You could try commenting out the line that fails:

  File "train.py", line 11, in <module>
    from tensorflow_model_optimization.sparsity import keras as sparsity

Hello, I tried commenting out
  File "train.py", line 11, in <module>
    from tensorflow_model_optimization.sparsity import keras as sparsity

and got the following error instead:
  File "train.py", line 13, in <module>
    from yolo5.model import get_yolo5_train_model
  File "/home/hank/keras-YOLOv3-model-set-master/yolo5/model.py", line 22, in <module>
    from common.model_utils import add_metrics, get_pruning_model
  File "/home/hank/keras-YOLOv3-model-set-master/common/model_utils.py", line 7, in <module>
    from tensorflow_model_optimization.sparsity import keras as sparsity
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/__init__.py", line 86, in <module>
    from tensorflow_model_optimization.python.core.api import clustering
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/__init__.py", line 19, in <module>
    from tensorflow_model_optimization.python.core.api import sparsity
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/sparsity/__init__.py", line 16, in <module>
    from tensorflow_model_optimization.python.core.api.sparsity import keras
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/api/sparsity/keras/__init__.py", line 18, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras.prune import prune_low_magnitude
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune.py", line 22, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras import pruning_wrapper
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/pruning_wrapper.py", line 33, in <module>
    from tensorflow_model_optimization.python.core.sparsity.keras import prune_registry
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune_registry.py", line 26, in <module>
    class PruneRegistry(object):
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_model_optimization/python/core/sparsity/keras/prune_registry.py", line 96, in PruneRegistry
    layers.experimental.preprocessing.Rescaling.__class__: [],
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow.python.keras.api._v1.keras.layers' has no attribute 'experimental'

I then kept commenting out the failing lines one after another, but that only produced more errors.

@Hank880223 This looks like a problem caused by a version update of tensorflow-model-optimization. You could try installing version 0.5.0, which should work:

pip install tensorflow-model-optimization==0.5.0

Thank you for your help. It was indeed caused by the tensorflow-model-optimization version update; after switching to tensorflow-model-optimization==0.5.0 everything runs normally.
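For anyone hitting the same import failure, here is a minimal sketch of a guarded import that turns the deep traceback into a short hint. The helper name `import_or_hint` and the wording of the hint are made up for illustration and are not part of train.py; the only thing confirmed in this thread is that tensorflow-model-optimization 0.5.0 works with tensorflow-gpu 1.15.0.

```python
import importlib

# Pin reported as working with tensorflow-gpu 1.15.0 in this thread.
KNOWN_GOOD_PIN = "tensorflow-model-optimization==0.5.0"


def import_or_hint(module_name, pin=KNOWN_GOOD_PIN):
    """Import module_name; on failure raise a RuntimeError naming the pin to try.

    Hypothetical helper, not part of the repository.
    """
    try:
        return importlib.import_module(module_name)
    except (ImportError, AttributeError) as exc:
        # AttributeError is what the incompatible release raised above
        # ("... has no attribute 'experimental'").
        raise RuntimeError(
            "Could not import {!r} ({}: {}). Try: pip install {}".format(
                module_name, type(exc).__name__, exc, pin)
        ) from exc
```

In train.py this could stand in for the plain import, for example `sparsity = import_or_hint("tensorflow_model_optimization.sparsity.keras")`.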

Is it normal for training to be slower on TF1 than on TF2?
The GPU memory usage is also very different between the two.

TF1
Screenshot from 2022-01-11 20-39-59

TF2
Screenshot from 2022-01-11 20-41-11

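@Hank880223 It looks like the TF1 environment is not actually using the GPU for training. You may need to check whether your CUDA/cuDNN versions are compatible with that TF version.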

@david8862 I changed my CUDA/cuDNN versions to
cudatoolkit-10.0.
cudnn-7.6.5
and the GPU now works properly. Thank you for your help.
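A quick way to confirm whether TensorFlow can see the GPU at all, before comparing training speed, is to list the physical devices. The sketch below is an assumption about how one might check this, not something taken from the repository, and the function name `visible_gpus` is hypothetical. It falls back to `tf.config.experimental.list_physical_devices`, which is where this call lives in TF 1.15, and returns None when TensorFlow is not installed.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TF is missing.

    Hypothetical helper for a quick environment check.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # TF 2.1+ exposes list_physical_devices directly on tf.config;
    # TF 1.14/1.15 only has it under tf.config.experimental.
    lister = getattr(tf.config, "list_physical_devices", None)
    if lister is None:
        lister = tf.config.experimental.list_physical_devices
    return [device.name for device in lister("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not installed in this environment")
    elif not gpus:
        print("No GPU visible: check CUDA/cuDNN against the TF version")
    else:
        print("Visible GPUs:", gpus)
```

An empty list in the TF1 environment would match the symptom in the screenshots above, where training ran without really using the GPU.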

After training for 10 epochs, I ran into this problem:

run
python3 train.py --model_type=tiny_yolo3_shufflenetv2 --model_input_shape=320x320 --anchors_path=configs/bdd100k_anchor6.txt --annotation_file=configs/bdd100k_trainval.txt --classes_path=configs/car.txt --eval_online --save_eval_checkpoint --freeze_level=0 --weights_path=model.h5

Error message:
WARNING:root:Limited tf.compat.v2.summary API due to missing TensorBoard installation.
WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:32: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:32: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:35: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:35: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:38: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

WARNING:tensorflow:From /home/hank/keras-YOLOv3-model-set-master/common/utils.py:38: The name tf.keras.backend.set_session is deprecated. Please use tf.compat.v1.keras.backend.set_session instead.

WARNING:tensorflow:period argument is deprecated. Please use save_freq to specify the frequency in number of samples seen.
WARNING:tensorflow:period argument is deprecated. Please use save_freq to specify the frequency in number of samples seen.
WARNING:tensorflow:From /home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
backbone layers number: 205
backbone layers number: 205
Create Tiny tiny_yolo3_shufflenetv2 model with 6 anchors and 3 classes.
model layer number: 221
Traceback (most recent call last):
  File "train.py", line 340, in <module>
    main(args)
  File "train.py", line 181, in main
    model = get_train_model(args.model_type, anchors, num_classes, weights_path=args.weights_path, freeze_level=freeze_level, optimizer=optimizer, label_smoothing=args.label_smoothing, elim_grid_sense=args.elim_grid_sense, model_pruning=args.model_pruning, pruning_end_step=pruning_end_step)
  File "/home/hank/keras-YOLOv3-model-set-master/yolo3/model.py", line 275, in get_yolo3_train_model
    model_body.load_weights(weights_path, by_name=True)#, skip_mismatch=True)
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/training.py", line 182, in load_weights
    return super(Model, self).load_weights(filepath, by_name)
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/keras/engine/network.py", line 1371, in load_weights
    saving.load_weights_from_hdf5_group_by_name(f, self.layers)
  File "/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 712, in load_weights_from_hdf5_group_by_name
    original_keras_version = f.attrs['keras_version'].decode('utf8')
AttributeError: 'str' object has no attribute 'decode'

@Hank880223 This seems to be some kind of Keras compatibility problem in TF1. What I do is edit the Python code directly at the point where it fails:

"/home/hank/anaconda3/envs/keras-tf1/lib/python3.6/site-packages/tensorflow_core/python/keras/saving/hdf5_format.py", line 712, in load_weights_from_hdf5_group_by_name

original_keras_version = f.attrs['keras_version'].decode('utf8')
to
original_keras_version = f.attrs['keras_version']

@david8862 Thank you for the workaround. After I removed .decode('utf8'), training on TF1 runs normally.
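A likely cause, stated here as an assumption since the thread does not confirm it, is a newer h5py: from h5py 3.0 on, string attributes are returned as `str` rather than `bytes`, so calling `.decode` on them fails. Instead of deleting the call, a tolerant helper can handle both cases. The name `as_text` is hypothetical, the version string is only an example value, and the snippet is a sketch of the idea rather than the code TensorFlow ships.

```python
def as_text(value, encoding="utf8"):
    """Return value as str, decoding only when it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both forms of the attribute give the same string.
print(as_text(b"2.2.4-tf"))
print(as_text("2.2.4-tf"))
```

With this, the edited line in hdf5_format.py could read `original_keras_version = as_text(f.attrs['keras_version'])`, which keeps working whether the attribute comes back as bytes or as str. Installing an h5py release below 3.0 in the TF1 environment is a commonly reported alternative that avoids editing site-packages.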