hankcs/HanLP

Import error caused by phraseTree

oasis-0927 opened this issue · 2 comments

Describe the bug
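cgi.escape (deprecated since Python 3.2) was removed in Python 3.8 in favor of html.escape. Newer releases of nltk have already been updated for this. This project, however, depends on phraseTree, which has not received the corresponding change, so calling the pretty_print method fails on these newer Python versions (here, 3.10).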

Would it be possible to replace all uses of phraseTree with nltk.tree to resolve this?
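For reference, html.escape is not an exact drop-in for cgi.escape: cgi.escape defaulted to quote=False, while html.escape defaults to quote=True and escapes the single quote as well. A minimal sketch of a compatible replacement follows. The helper name `cgi_escape_compat` is hypothetical, and the actual fix committed to phraseTree is not shown here and may differ.

```python
import html


def cgi_escape_compat(s: str, quote: bool = False) -> str:
    # Hypothetical helper reproducing the semantics of the removed cgi.escape.
    # html.escape(..., quote=False) escapes only &, < and >, exactly like
    # cgi.escape's default behavior.
    s = html.escape(s, quote=False)
    if quote:
        # cgi.escape(quote=True) escaped the double quote only, not the single quote.
        s = s.replace('"', "&quot;")
    return s
```

By contrast, `html.escape('"\'')` returns `'&quot;&#x27;'`, so code that compares escaped output character by character may need this helper rather than a bare `html.escape` call.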

Code to reproduce the issue

import hanlp
from hanlp_common.document import Document


def merge_pos_into_con(doc: Document):
    # Copy the POS tags from doc['pos'] into the '_' preterminal labels of doc['con'].
    flat = isinstance(doc['pos'][0], str)  # a single sentence rather than a batch
    if flat:
        doc = Document((k, [v]) for k, v in doc.items())
    for tree, tags in zip(doc['con'], doc['pos']):
        offset = 0
        # Preterminals (the POS level) are the subtrees of height 2.
        for subtree in tree.subtrees(lambda t: t.height() == 2):
            tag = subtree.label()
            if tag == '_':
                subtree.set_label(tags[offset])
            offset += 1
    if flat:
        doc = doc.squeeze()
    return doc


con = hanlp.load('CTB9_CON_FULL_TAG_ELECTRA_SMALL')
tok = hanlp.load(hanlp.pretrained.tok.COARSE_ELECTRA_SMALL_ZH)
pos = hanlp.load(hanlp.pretrained.pos.CTB9_POS_ELECTRA_SMALL)
nlp = hanlp.pipeline() \
    .append(pos, input_key='tok', output_key='pos') \
    .append(con, input_key='tok', output_key='con')
doc = nlp(tok=["2021年", "HanLPv2.1", "带来", "最", "先进", "的", "多", "语种", "NLP", "技术", "。"])['con']
doc.pretty_print()  # raises the import error on Python 3.10
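The repro above needs three downloaded models. Assuming the failure comes from a `from cgi import escape` inside the tree printer (consistent with the import error in the title, but not confirmed from phraseTree's source here), the root cause can be checked without HanLP using this sketch. The function name `cgi_escape_available` is hypothetical.

```python
import sys
import warnings


def cgi_escape_available() -> bool:
    """Return True if `from cgi import escape` still works on this interpreter."""
    with warnings.catch_warnings():
        # Python 3.11/3.12 warn on `import cgi` (deprecated by PEP 594); silence that here.
        warnings.simplefilter("ignore", DeprecationWarning)
        try:
            from cgi import escape  # noqa: F401  (escape removed in 3.8; cgi removed in 3.13)
        except ImportError:
            # Covers both "cannot import name 'escape'" and, on 3.13+, a missing cgi module.
            return False
    return True


if __name__ == "__main__":
    print(sys.version_info[:2], "cgi.escape available:", cgi_escape_available())
```

On Python 3.8 and later, including the 3.10 environment reported here, this should print `False`.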

Describe the current behavior
Calling pretty_print() on the constituency tree raises an import error on Python 3.10, because phraseTree still relies on cgi.escape.

Expected behavior
pretty_print() prints the constituency tree without errors.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
  • Python version: 3.10
  • HanLP version: 2.1.0b56

Other info / logs
[Screenshot attached]

    • I've completed this form and searched the web for solutions.

hankcs commented

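Thanks for the report. This has been fixed; please check whether the commit above resolves the problem.
If it persists, feel free to reopen the issue.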

Regarding replacing it with nltk.tree: phrasetree supports serialization and is more lightweight.

oasis-0927 commented

Tested and confirmed fixed. Thanks.