wind91725/gpt2-ml-finetune-

[Discussion] gpt2-ml, 30G, 220k-step model: solution to the fine-tuning error

NLPIG opened this issue · 1 comments

NLPIG commented

TensorFlow 2.x keeps failing because `contrib` was removed in 2.x. After downgrading to 1.x (1.14 or 1.15) the script runs,
but a pile of warnings appears once training starts:

【Start trainning.............................................
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0616 06:45:01.207822 140262356580224 deprecation.py:506] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.init (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
WARNING:tensorflow:From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
W0616 06:45:01.208348 140262356580224 deprecation.py:323] From /usr/local/lib/python3.7/dist-packages/tensorflow_core/python/training/training_util.py:236: Variable.initialized_value (from tensorflow.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Use Variable.read_value. Variables in 2.X are initialized automatically both in eager and graph (inside tf.defun) contexts.
WARNING:tensorflow:From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:63: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
W0616 06:45:01.223439 140262356580224 deprecation.py:323] From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:63: parallel_interleave (from tensorflow.python.data.experimental.ops.interleave_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.data.Dataset.interleave(map_func, cycle_length, block_length, num_parallel_calls=tf.data.experimental.AUTOTUNE) instead. If sloppy execution is desired, use tf.data.Options.experimental_determinstic.
WARNING:tensorflow:From /content/drive/MyDrive/gpt2-ml/train/dataloader.py:81: map_and_batch (from tensorflow.python.data.experimental.ops.batching) is deprecated and will be removed in a future version.

Once the training loop starts, a fatal error appears:

【ERROR:tensorflow:Error recorded from training_loop: module 'tensorflow._api.v1.compat.v1' has no attribute 'contrib'
E0616 06:45:48.644567 140262356580224 error_handling.py:75] Error recorded from training_loop: module 'tensorflow._api.v1.compat.v1' has no attribute 'contrib'
INFO:tensorflow:training_loop marked as finished】

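I searched Google and found no solution. My guess is that the main problem is `'tensorflow._api.v1.compat.v1' has no attribute 'contrib'`, so presumably changing that API call would fix it? But I cannot find where this code lives.
I am a beginner, so any pointers would be much appreciated.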
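For locating where `contrib` is referenced, a minimal sketch like the one below may help. It walks a directory and lists every `.py` line that mentions `contrib`. The function name `find_contrib_references` is hypothetical, and the project path in the usage section is taken from the log above. The failing call may sit in an installed package rather than in the project, in which case the same function can be pointed at the `dist-packages` directory instead.

```python
import os
import re

# Matches "contrib" as a whole word, e.g. tf.contrib or tf.compat.v1.contrib.
CONTRIB_PATTERN = re.compile(r"\bcontrib\b")


def find_contrib_references(root):
    """Return (path, line_number, stripped_line) for each .py line mentioning contrib."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            # errors="ignore" keeps the scan going on files with odd encodings.
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if CONTRIB_PATTERN.search(line):
                        hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Project path as it appears in the warnings above; adjust as needed.
    for path, lineno, text in find_contrib_references("/content/drive/MyDrive/gpt2-ml"):
        print(f"{path}:{lineno}: {text}")
```

Each reported line is a candidate for the call that raises the error. Which replacement API fits depends on the specific `contrib` symbol being used.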

NLPIG commented

The environment is Colab Pro. I have already run `!pip uninstall -y tensorflow` and `!pip install tensorflow==1.15.2`; Python is 3.7.
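It may also be worth confirming which TensorFlow the running interpreter actually imports, since a runtime that already imported TensorFlow before the reinstall keeps the old module until it is restarted. The sketch below is only a diagnostic: the helper name `describe_tensorflow` is hypothetical, and it reports the version and whether `contrib` is reachable from the top-level module and from `compat.v1`, without assuming any particular result.

```python
import importlib


def describe_tensorflow():
    """Report the imported TensorFlow version and where `contrib` is reachable."""
    try:
        tf = importlib.import_module("tensorflow")
    except ImportError:
        return {"installed": False}
    # compat.v1 is looked up defensively in case the module layout differs.
    compat_v1 = getattr(getattr(tf, "compat", None), "v1", None)
    return {
        "installed": True,
        "version": getattr(tf, "__version__", "unknown"),
        "has_contrib": hasattr(tf, "contrib"),
        "compat_v1_has_contrib": compat_v1 is not None and hasattr(compat_v1, "contrib"),
    }


if __name__ == "__main__":
    print(describe_tensorflow())
```

If the reported version is not the 1.15.2 that was installed, restarting the runtime and running the check again would be a reasonable next step.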