## Not found: Key bert/embeddings/LayerNorm/beta/AdamWeightDecayOptimizer not found in checkpoint
KelvinBull opened this issue · 10 comments
W tensorflow/core/framework/op_kernel.cc:1502] OP_REQUIRES failed at save_restore_v2_ops.cc:184 : Not found: Key bert/embeddings/LayerNorm/beta/AdamWeightDecayOptimizer not found in checkpoint
This happened when I was modifying run_squad.py for multi-GPU training. I'm not sure what went wrong. Can you help me fix the error? Many thanks.
The `Key ... not found in checkpoint` error means that the variable exists in your in-memory model but not in the serialized checkpoint file on disk, so the problem most likely occurs when you load checkpoints.
REF: https://stackoverflow.com/questions/45179556/key-variable-name-not-found-in-checkpoint-tensorflow
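One way to debug this is to compare the model's variable names against the keys actually stored on disk (with TensorFlow, `tf.train.list_variables(checkpoint_path)` lists them) and restore only the intersection, e.g. via the assignment map passed to `tf.train.init_from_checkpoint`. A minimal sketch of the filtering step in plain Python, assuming a hypothetical helper `build_assignment_map`; the variable names are taken from the error message above:

```python
def build_assignment_map(model_var_names, checkpoint_keys):
    """Map each model variable to itself only if the checkpoint contains it.

    Optimizer slot variables (e.g. ".../AdamWeightDecayOptimizer") that were
    never saved are silently skipped instead of raising NotFoundError.
    """
    ckpt = set(checkpoint_keys)
    return {name: name for name in model_var_names if name in ckpt}


model_vars = [
    "bert/embeddings/LayerNorm/beta",
    "bert/embeddings/LayerNorm/beta/AdamWeightDecayOptimizer",  # optimizer slot
]
checkpoint_keys = ["bert/embeddings/LayerNorm/beta"]  # slots were never saved

print(build_assignment_map(model_vars, checkpoint_keys))
# {'bert/embeddings/LayerNorm/beta': 'bert/embeddings/LayerNorm/beta'}
```

The resulting dictionary restores only the weights that actually exist in the checkpoint, leaving the optimizer slots to be freshly initialized.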
Thanks a lot. It turns out that every time I run a complete program (session), I just need to create a new output directory. My guess is that tf.estimator saves only trainable_variables rather than global_variables, so the moving-average variables are missing. #Happy National Day
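The fresh-output-directory workaround can be sketched as follows: give each full run its own timestamped `model_dir` so the Estimator never tries to restore optimizer slot variables from a stale checkpoint. The helper name and naming scheme below are my own assumptions, not part of run_squad.py:

```python
import os
import time


def fresh_output_dir(base="output"):
    """Create and return a new, timestamped output directory for one run."""
    path = "{}_{}".format(base, time.strftime("%Y%m%d-%H%M%S"))
    os.makedirs(path, exist_ok=True)
    return path
```

The returned path would then be passed as the Estimator's `model_dir` (or `--output_dir` for run_squad.py), so training starts from the pretrained BERT weights instead of resuming from an incompatible checkpoint.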
Happy National Day! ;D
I modified run_squad.py following your approach, and all the GPUs show they are being used, but after the terminal prints loss=3.6, step=0, training cannot continue: NaN loss during training. Lowering the learning rate doesn't help either.
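For reference, BERT's `optimization.py` clips gradients with `tf.clip_by_global_norm(grads, clip_norm=1.0)` before applying them; a common cause of NaN loss in hand-rolled multi-GPU loops is averaging the per-tower gradients without reapplying that clipping. A pure-Python sketch of what global-norm clipping computes (function name and list-of-lists gradient layout are assumptions for illustration):

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """Scale all gradients so their combined L2 norm is at most clip_norm.

    Mirrors the behavior of TensorFlow's tf.clip_by_global_norm for plain
    Python lists of floats; returns (clipped_grads, original_global_norm).
    """
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    scale = clip_norm / max(global_norm, clip_norm)  # <= 1.0, never amplifies
    clipped = [[g * scale for g in grad] for grad in grads]
    return clipped, global_norm
```

If the modified training loop skips this step, a single large gradient can blow up the weights on the first update, which matches the symptom of NaN immediately after step 0.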
There is too little information to go on, so I can't figure out where the problem is.
@KelvinBull Do you have any opinions or ideas?
@KelvinBull @haoyuhu I ran into the same problem. Could you share how you solved it? Thanks a lot!
I'm not sure about the specifics of this problem; you could ask @KelvinBull.
Same question here, thanks!
Please provide a complete, reproducible code sample along with example data, and I'll take a look at what the problem is.