rockingdingo/deepnlp

NotFoundError (see above for traceback): Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint


tagger = ner_tagger.load_model(lang = 'zh')
Loading the model with the named entity recognition (NER) module raises an error:
NotFoundError (see above for traceback): Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint
In other words, the bias parameter cannot be found.
When TensorFlow loads, it reports:
Not found: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint
Not found: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel not found in checkpoint
There is no ner.ckpt file in anaconda3\lib\site-packages\deepnlp\ner\ckpt\zh.
The path recorded in the checkpoint file does not exist:
model_checkpoint_path: "/mnt/python/pypi/deepnlp/deepnlp/ner/ckpt/zh/ner.ckpt"
all_model_checkpoint_paths: "/mnt/python/pypi/deepnlp/deepnlp/ner/ckpt/zh/ner.ckpt"
Does this file need to be loaded separately, or where can I find it?
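The `checkpoint` file quoted above is a small text file recording where the model was saved, here an absolute path on the author's build machine. The following is a minimal sketch using only the standard library that reads that file and reports whether each recorded path exists locally. `read_checkpoint_state` is a hypothetical helper, not part of deepnlp or TensorFlow (TensorFlow's own reader is `tf.train.get_checkpoint_state`). Because ner_tagger.py passes an explicit path to `Saver.restore()` (see the traceback further down), a stale path here is probably harmless on its own. What matters is whether the model files themselves are present.

```python
import os
import re

# Matches lines such as: model_checkpoint_path: "/some/dir/ner.ckpt"
_LINE_RE = re.compile(
    r'^\s*(model_checkpoint_path|all_model_checkpoint_paths)\s*:\s*"(.*)"\s*$'
)


def read_checkpoint_state(ckpt_dir):
    """Return (field, path, exists_locally) tuples from <ckpt_dir>/checkpoint.

    Hypothetical helper. Relative paths are resolved against ckpt_dir.
    """
    entries = []
    with open(os.path.join(ckpt_dir, "checkpoint"), encoding="utf-8") as f:
        for line in f:
            m = _LINE_RE.match(line)
            if not m:
                continue
            field, path = m.group(1), m.group(2)
            full = path if os.path.isabs(path) else os.path.join(ckpt_dir, path)
            # A V2 checkpoint has a <prefix>.index file; a V1 one is a single file.
            exists = os.path.exists(full + ".index") or os.path.isfile(full)
            entries.append((field, path, exists))
    return entries
```

For a pip-installed package, calling it on the `deepnlp/ner/ckpt/zh` directory under site-packages should show the `/mnt/python/pypi/...` path from above flagged as missing.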

Hey, where are the checkpoint files for the ner and pos models, and what are they for? @rockingdingo

@michaelwangtd I was just about to try ner too 😂. I checked the checkpoint file and found the same problem; my error is identical to yours.

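@dyllanwli @rockingdingo What is certain is that the ner.ckpt file at the path the author loads from does not exist, and I don't know what this file is for. So I suspect that switching TensorFlow to version 1.0.0 won't fix the problem either. Any help from the experts would be appreciated!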

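@michaelwangtd @dyllanwli Hi, because of the Python package size limit, the pretrained ner.ckpt model files are not published on PyPI, so if you installed via pip, the corresponding directory is empty. You need to download the model before you can use it. Try the command deepnlp.download('ner'), which was written precisely to download the model from GitHub into the matching local install directory. If that still doesn't work, try downloading the files manually and putting them there. The checkpoint file is used by TensorFlow's saver.restore() to locate the corresponding model.
The model is on GitHub: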
https://github.com/rockingdingo/deepnlp/tree/master/deepnlp/ner/ckpt/zh
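To check whether `deepnlp.download('ner')` (or a manual copy) actually put the model in place, one can look for the files that a checkpoint in TensorFlow's V2 format consists of, the format indicated by the `ner.ckpt.data-00000-of-00001` filename mentioned in this thread: `<prefix>.index` plus one or more `<prefix>.data-XXXXX-of-XXXXX` shards. A minimal sketch; `missing_checkpoint_files` is a hypothetical helper, and `prefix="ner.ckpt"` is an assumption based on the path in the `checkpoint` file. `<prefix>.meta` (the saved graph) is not checked, since deepnlp rebuilds the graph in code before restoring.

```python
import glob
import os


def missing_checkpoint_files(ckpt_dir, prefix="ner.ckpt"):
    """List what is missing for a V2 checkpoint named `prefix` in ckpt_dir.

    Hypothetical helper: restoring needs <prefix>.index plus at least one
    <prefix>.data-*-of-* shard; an empty list means both are present.
    """
    base = os.path.join(ckpt_dir, prefix)
    missing = []
    if not os.path.exists(base + ".index"):
        missing.append(prefix + ".index")
    # glob.escape guards against glob metacharacters in the directory path.
    if not glob.glob(glob.escape(base) + ".data-*-of-*"):
        missing.append(prefix + ".data-*-of-*")
    return missing
```

If the list is still non-empty after running `deepnlp.download('ner')`, the files can be copied by hand from the GitHub directory above, as suggested.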

@rockingdingo You're right: the model files are stored in deepnlp/ner/ckpt/zh/, and the model you trained earlier should be the ner.ckpt.data-00000-of-00001 file. But after loading the model with tf.train.Saver(), it reports:
Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias not found in checkpoint
so the parameter cannot be found. Here are all the variables, as printed:
[<tf.Variable 'ner_var_scope/embedding:0' shape=(60000, 128) dtype=float32_ref>,
 <tf.Variable 'ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel:0' shape=(256, 512) dtype=float32_ref>,
 <tf.Variable 'ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/bias:0' shape=(512,) dtype=float32_ref>,
 <tf.Variable 'ner_var_scope/softmax_w:0' shape=(128, 8) dtype=float32_ref>,
 <tf.Variable 'ner_var_scope/softmax_b:0' shape=(8,) dtype=float32_ref>,
 <tf.Variable 'ner_var_scope/Variable:0' shape=() dtype=float32_ref>]

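In your view, is this a problem with the model or with the TF version? I switched from TF 1.2.1 to 1.0.0 and the problem seems to persist; apparently 1.0.0 has a bug. Thanks for your guidance!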
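One possible cause worth checking, which this thread never confirms (and which would not by itself explain the failure reported on 1.0.0): TensorFlow 1.2 renamed RNN cell variables from `weights`/`biases` to `kernel`/`bias`, so a checkpoint written by an older TF stores keys like `.../basic_lstm_cell/weights` while a 1.2 graph asks for `.../basic_lstm_cell/kernel`. The keys actually stored in ner.ckpt can be listed with `tf.train.NewCheckpointReader(path).get_variable_to_shape_map()`. The pure-Python sketch below (hypothetical helpers `legacy_key` and `build_restore_map`, not part of deepnlp) pairs each graph variable with a checkpoint key, falling back to the legacy name.

```python
# Hypothetical mapping from TF >= 1.2 RNN cell variable names back to the
# older names that a checkpoint written by an earlier TF may contain.
_RENAMES = {"kernel": "weights", "bias": "biases"}


def legacy_key(graph_name):
    """Drop the ':0' output suffix and rename a trailing kernel/bias under an RNN cell scope."""
    name = graph_name.split(":")[0]
    scope, _, leaf = name.rpartition("/")
    if leaf in _RENAMES and any(p.endswith("_cell") for p in scope.split("/")):
        return scope + "/" + _RENAMES[leaf]
    return name


def build_restore_map(graph_names, ckpt_keys):
    """Return {checkpoint_key: graph_name}, trying the exact name, then the legacy one."""
    ckpt_keys = set(ckpt_keys)
    mapping, unresolved = {}, []
    for graph_name in graph_names:
        name = graph_name.split(":")[0]
        if name in ckpt_keys:
            mapping[name] = graph_name
        elif legacy_key(name) in ckpt_keys:
            mapping[legacy_key(name)] = graph_name
        else:
            unresolved.append(graph_name)
    if unresolved:
        raise KeyError("not in checkpoint: " + ", ".join(unresolved))
    return mapping
```

If the mapping resolves, replacing each graph name with its `tf.Variable` object gives a dict of the form `tf.train.Saver(var_list=...)` accepts (checkpoint key to variable). If it raises `KeyError`, the checkpoint and the graph differ in more than this rename.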

Has anyone solved this? I'd rather not reinvent the wheel; if this works, many thanks!

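@dyllanwli Could you tell me about your environment? I tested it on my side and couldn't reproduce the problem. Alternatively, email me your environment details and I'll try to reproduce it first.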

@rockingdingo Thanks. I'm using TensorFlow 1.2.1 on a Mac; I had also tried 1.0.0 earlier and got the same error.
Here is the error output:

Traceback (most recent call last):
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1139, in _do_call
    return fn(*args)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1121, in _run_fn
    status, run_metadata)
  File "/Users/Dylan/anaconda/lib/python3.6/contextlib.py", line 89, in __exit__
    next(self.gen)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel not found in checkpoint
	 [[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/Dylan/Documents/GitHub/tempudabsynoim.py", line 10, in <module>
    tagger = ner_tagger.load_model(lang = 'zh') # Loading Chinese NER model
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 100, in load_model
    return ModelLoader(lang, data_path, ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 35, in __init__
    self.model = self._init_ner_model(self.session, self.ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 62, in _init_ner_model
    tf.train.Saver(model_vars).restore(session, ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1548, in restore
    {self.saver_def.filename_tensor_name: save_path})
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 789, in run
    run_metadata_ptr)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 997, in _run
    feed_dict_string, options, run_metadata)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1132, in _do_run
    target_list, options, run_metadata)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel not found in checkpoint
	 [[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]

Caused by op 'save/RestoreV2_3', defined at:
  File "/Users/Dylan/Documents/GitHub/tempudabsynoim.py", line 10, in <module>
    tagger = ner_tagger.load_model(lang = 'zh') # Loading Chinese NER model
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 100, in load_model
    return ModelLoader(lang, data_path, ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 35, in __init__
    self.model = self._init_ner_model(self.session, self.ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/deepnlp/ner_tagger.py", line 62, in _init_ner_model
    tf.train.Saver(model_vars).restore(session, ckpt_path)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1139, in __init__
    self.build()
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1170, in build
    restore_sequentially=self._restore_sequentially)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 691, in build
    restore_sequentially, reshape)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 407, in _AddRestoreOps
    tensors = self.restore_op(filename_tensor, saveable, preferred_shard)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 247, in restore_op
    [spec.tensor.dtype])[0])
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/ops/gen_io_ops.py", line 640, in restore_v2
    dtypes=dtypes, name=name)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
    op_def=op_def)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2506, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/Users/Dylan/anaconda/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1269, in __init__
    self._traceback = _extract_stack()

NotFoundError (see above for traceback): Key ner_var_scope/ner_lstm/multi_rnn_cell/cell_0/basic_lstm_cell/kernel not found in checkpoint
	 [[Node: save/RestoreV2_3 = RestoreV2[dtypes=[DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/RestoreV2_3/tensor_names, save/RestoreV2_3/shape_and_slices)]]

@rockingdingo Solved it, thanks a lot!

@michaelwangtd Could you tell me how you solved it?

@michaelwangtd Could you share how you solved it? Many thanks.

@michaelwangtd Could you explain the detailed steps? I'd be very grateful.

@lstmabc @yugenlgy @diyali03 Taking ner as an example, in ner_tagger.py change
ckpt_path = os.path.join(pkg_path, "ner/ckpt", lang, "ner.ckpt")
to:
ckpt_path = os.path.join(pkg_path, "ner/ckpt", lang, "ner.ckptner.ckpt.data-00000-of-00001")
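Before hardcoding a filename into ner_tagger.py as above, it may help to see which checkpoint prefixes actually exist in `ner/ckpt/<lang>` on your machine. The following is a minimal sketch assuming TensorFlow's V2 checkpoint format, where a checkpoint named `P` consists of `P.index` plus `P.data-*-of-*` shards, so each `*.index` file marks one prefix that `Saver.restore()` can be given. `find_checkpoint_prefixes` is a hypothetical helper, not part of deepnlp.

```python
import os


def find_checkpoint_prefixes(ckpt_dir):
    """Return sorted checkpoint prefixes (full paths) found in ckpt_dir.

    Hypothetical helper: every <prefix>.index file marks one V2 checkpoint.
    """
    prefixes = []
    for fname in os.listdir(ckpt_dir):
        if fname.endswith(".index"):
            prefixes.append(os.path.join(ckpt_dir, fname[: -len(".index")]))
    return sorted(prefixes)
```

Running it on the installed `deepnlp/ner/ckpt/zh` directory lists the candidate values for `ckpt_path`. An empty list means the model files have not been downloaded yet.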