Error loading the English dependency parsing model
zsrainbow opened this issue · 1 comments
Describe the bug
When running the example at https://github.com/hankcs/HanLP/blob/doc-zh/plugins/hanlp_demo/hanlp_demo/zh/dep_stl.ipynb, replacing the load statement with:
dep = hanlp.load(hanlp.pretrained.dep.PTB_BIAFFINE_DEP_EN)
produces the following error:
Loading word2vec from cache ...Failed to load https://file.hankcs.com/hanlp/dep/ptb_dep_biaffine_20200101_174624.zip
If the problem still persists, please submit an issue to https://github.com/hankcs/HanLP/issues
When reporting an issue, make sure to paste the FULL ERROR LOG below.
================================ERROR LOG BEGINS================================
OS: Windows-10-10.0.22631-SP0
Python: 3.8.19
PyTorch: 2.1.2+cpu
TensorFlow: 2.13.0
HanLP: 2.1.0-beta.58
Traceback (most recent call last):
  File "", line 1, in
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\__init__.py", line 43, in load
    return load_from_meta_file(save_dir, 'meta.json', verbose=verbose, **kwargs)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\utils\component_util.py", line 186, in load_from_meta_file
    raise e from None
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\utils\component_util.py", line 106, in load_from_meta_file
    obj.load(save_dir, verbose=verbose, **kwargs)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\common\keras_component.py", line 215, in load
    self.build(**merge_dict(self.config, training=False, logger=logger, **kwargs, overwrite=True, inplace=True))
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\common\keras_component.py", line 225, in build
    self.model = self.build_model(**merge_dict(self.config, training=kwargs.get('training', None),
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\components\parsers\biaffine_parser_tf.py", line 42, in build_model
    pretrained: tf.keras.layers.Embedding = build_embedding(pretrained_embed, self.transform.form_vocab,
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\hanlp\layers\embeddings\util_tf.py", line 44, in build_embedding
    layer: tf.keras.layers.Embedding = tf.keras.utils.deserialize_keras_object(embeddings)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\keras\src\saving\serialization_lib.py", line 704, in deserialize_keras_object
    instance = cls.from_config(inner_config)
  File "C:\Users\Lenovo\.conda\envs\HanLP\lib\site-packages\keras\src\engine\base_layer.py", line 870, in from_config
    raise TypeError(
TypeError: Error when deserializing class 'Word2VecEmbeddingTF' using config={'trainable': False, 'embeddings_initializer': 'zero', 'filepath': 'https://nlp.stanford.edu/data/glove.6B.zip', 'expand_vocab': True, 'lowercase': False, 'unk': 'unk', 'normalize': True, 'name': 'glove.6B.100d', 'vocab': <hanlp.common.vocab_tf.VocabTF object at 0x0000025860082F10>}.
Exception encountered: C:\Users\Lenovo\AppData\Roaming\hanlp\thirdparty\nlp.stanford.edu\data/glove.6B
=================================ERROR LOG ENDS=================================
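The "Exception encountered" line of the log consists only of a path into the local HanLP cache. One quick check is to see what actually exists at that location. The sketch below uses only the standard library; `describe_cache` is a hypothetical helper name, and the idea that the GloVe download or extraction is missing or incomplete is an unconfirmed assumption, not something the log states.

```python
from pathlib import Path


def describe_cache(path):
    """Summarize what exists at a cached resource path (hypothetical helper)."""
    p = Path(path)
    if not p.exists():
        return {"exists": False, "is_dir": False, "files": []}
    if p.is_dir():
        # List regular files directly inside the directory.
        files = sorted(f.name for f in p.iterdir() if f.is_file())
        return {"exists": True, "is_dir": True, "files": files}
    return {"exists": True, "is_dir": False, "files": [p.name]}


if __name__ == "__main__":
    # Path copied verbatim from the "Exception encountered" line of the log.
    cache = r"C:\Users\Lenovo\AppData\Roaming\hanlp\thirdparty\nlp.stanford.edu\data/glove.6B"
    print(describe_cache(cache))
```

If the path does not exist or the directory is empty, that would point toward a download problem rather than a serialization problem; if the expected files are present, the deserialization step remains the more likely suspect.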
Code to reproduce the issue
import hanlp
dep = hanlp.load(hanlp.pretrained.dep.PTB_BIAFFINE_DEP_EN)
Describe the current behavior
The English dependency parsing model fails to load.
Expected behavior
The English dependency parsing model loads normally.
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows-10-10.0.22631-SP0
- Python version: 3.8.19
- HanLP version: 2.1.0-beta.58
Other info / logs
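Note: a preliminary analysis suggests that the serialized pretrained PTB_BIAFFINE_DEP_EN model fails to load under TensorFlow.
The statement that raises the error is on line 44 of hanlp\layers\embeddings\util_tf.py, shown below: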
layer: tf.keras.layers.Embedding = tf.keras.utils.deserialize_keras_object(embeddings)
For the deserialization above, if the object is a custom model, its naming should follow the required format. See: https://www.tensorflow.org/api_docs/python/tf/keras/utils/deserialize_keras_object
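To illustrate the API referenced above, here is a minimal sketch of how a custom Keras layer can be registered so that `tf.keras.utils.deserialize_keras_object` can resolve it by name. The `Scale` layer and the `demo` package name are hypothetical and exist only for this example; this is not HanLP's `Word2VecEmbeddingTF`, and whether naming is the actual cause of the error in this issue is not confirmed.

```python
import tensorflow as tf


# Hypothetical layer for illustration; not HanLP's Word2VecEmbeddingTF.
@tf.keras.utils.register_keras_serializable(package="demo")
class Scale(tf.keras.layers.Layer):
    """Multiplies its input by a constant factor."""

    def __init__(self, factor=1.0, **kwargs):
        super().__init__(**kwargs)
        self.factor = factor

    def call(self, inputs):
        return inputs * self.factor

    def get_config(self):
        # Include constructor arguments so from_config can rebuild the layer.
        config = super().get_config()
        config["factor"] = self.factor
        return config


layer = Scale(factor=0.5, name="scale_demo")
# Serialize to a plain config dict, then rebuild the layer from that dict.
config = tf.keras.utils.serialize_keras_object(layer)
restored = tf.keras.utils.deserialize_keras_object(config)
```

Registration is one way to make a custom class discoverable; passing `custom_objects={...}` to `deserialize_keras_object` is another. Note that the `TypeError` in the log is raised inside `from_config`, which suggests the class itself was found and the failure happened while constructing it.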
- I've completed this form and searched the web for solutions.