Does steps_per_epoch need to be changed for different training sets?

Question

Does steps_per_epoch need to be changed for different training sets?

Opened this issue 4 years ago · 5 comments

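I trained on ArT, LSVT and ReCTS for 180,000 steps. The loss has mostly stopped decreasing and sits at around 1.2. My test results are noticeably worse than those of the two pb models you provide: yours reach roughly 85%, mine only about 72%. Could you share the checkpoint that corresponds to your pb models so that I can fine-tune from it? Or are there any other training tricks?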

Answer 1 · 2020-09-17T01:36:27.000Z

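The three datasets used in the source code contain 396,733 samples in total. The config sets steps_per_epoch=500, gpus=4, batch_size=10, so the number of samples trained per epoch is 500 * 4 * 10 = 20,000. With these settings a single epoch cannot traverse the whole dataset. I am confused about this as well.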
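The arithmetic above can be checked with a minimal sketch. It assumes, as the calculation in this answer does, that batch_size is the per-GPU batch size; the function names are hypothetical and are not taken from the repository. It also assumes that an "epoch" here is simply a block of steps_per_epoch training steps rather than a full pass over the data, in which case the dataset is covered over several epochs.

```python
import math


def samples_per_epoch(steps_per_epoch, gpus, batch_size):
    """Samples consumed in one epoch, assuming batch_size is per GPU."""
    return steps_per_epoch * gpus * batch_size


def epochs_to_cover(dataset_size, steps_per_epoch, gpus, batch_size):
    """Number of epochs needed before every sample could have been seen once."""
    per_epoch = samples_per_epoch(steps_per_epoch, gpus, batch_size)
    return math.ceil(dataset_size / per_epoch)


def steps_for_full_pass(dataset_size, gpus, batch_size):
    """steps_per_epoch value that would make one epoch cover the dataset."""
    return math.ceil(dataset_size / (gpus * batch_size))


if __name__ == "__main__":
    dataset_size = 396733  # total samples quoted in this answer
    print(samples_per_epoch(500, 4, 10))
    print(epochs_to_cover(dataset_size, 500, 4, 10))
    print(steps_for_full_pass(dataset_size, 4, 10))
```

Under these assumptions, changing the training set changes how many epochs one pass over the data takes, so steps_per_epoch only needs adjusting if one epoch is meant to correspond to one full pass.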

Answer 2 · 2020-12-01T14:55:37.000Z

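Hello, may I ask how your final training results turned out? I would like to try the pb model provided by the author, but I don't know how to get files out of the docker image. Could you send me a copy? Training is very slow on my side, about 30 minutes per epoch, and I don't know why.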

Answer 3 · 2020-12-15T09:28:36.000Z

Hello, I ran into two problems while using this and would like to ask about them:

1. When running test.py with the model text_recognition_5435.pb from the author's docker image, the call _ = tf.import_graph_def(graph_def, name='') fails with: InvalidArgumentError (see above for traceback): The second input must be a scalar, but it has shape [1,33]
2. train.py fails with:
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 119, in __init__
    assert_type(model, ModelDescBase, 'model')
  File "/usr/local/lib/python3.5/dist-packages/tensorpack/train/config.py", line 107, in assert_type
    name, tp.__name__, v.__class__.__name__)
AssertionError: model has to be type 'ModelDescBase', but an object of type 'AttentionOCR' found.

Answer 4 · 2020-12-15T09:39:43.000Z

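1. I have not used the author's docker image. I set up a local virtual environment directly from the requirements, and I have not used the author's model either.
2. It is probably a version problem?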
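To check the version hypothesis, it can help to print the installed package versions and compare them with the repository's requirements. The following is only a sketch: the package names listed are assumptions about what the project depends on, and importlib.metadata requires Python 3.8 or newer (on the Python 3.5/3.6 environments shown in the tracebacks, pip list gives the same information).

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-gpu", "tensorpack")):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, version if version is not None else "not installed")
```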

Answer 5 · 2020-12-15T09:44:02.000Z

OK, thank you. Many people seem to be asking about the problem below, and I have run into it as well. Have you encountered it? If possible, could I add you on WeChat to ask you about it (xianzhe741)?

Traceback (most recent call last):
  File "test.py", line 121, in <module>
    test(args)
  File "test.py", line 91, in test
    model = TextRecognition(args.pb_path, cfg.seq_len+1)
  File "test.py", line 23, in __init__
    self.init_model()
  File "test.py", line 37, in init_model
    self.label_ph = self.sess.graph.get_tensor_by_name('label:0')
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3972, in get_tensor_by_name
    return self.as_graph_element(name, allow_tensor=True, allow_operation=False)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3796, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/home/anaconda3/envs/p3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3838, in _as_graph_element_locked
    "graph." % (repr(name), repr(op_name)))
KeyError: "The name 'label:0' refers to a Tensor which does not exist. The operation, 'label', does not exist in the graph."