- This is an NER toolkit that helps you train better Chinese NER models when facing noisy data.
- This project aims to provide a training toolbox for Chinese NER on noisy data.
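Named entity recognition (NER) is a foundational NLP task, yet now that neural networks dominate the field, it seems close to being forgotten. Admittedly, BiLSTM-CRF has become the standard model for this kind of task, and in general it solves about 90% of problems. But how to comfortably handle the field's remaining problems, such as noisy data, too little data, nested entities, discontinuous entities, and joint relation extraction, does not yet seem to be settled.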
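This project aims to give researchers and developers a simple, easy-to-use toolbox for dealing with data quality problems.
This project implements the following related methods: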
NAME | PAPER | STATUS |
---|---|---|
BiLSTM/BiLSTM-CRF | (Baseline Model) (2016 NAACL) Neural Architectures for Named Entity Recognition | Done |
Partial-CRF/Fuzzy-CRF | (2007 AAAI) Learning extractors from unlabeled text using relevant databases | Done |
MentorNet | (2018 ICML) MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels | Done |
Positive-Unlabeled Learning | (2019 ACL) Distantly Supervised Named Entity Recognition using Positive-Unlabeled Learning | Coming Soon |
CrossWeigh | (2019 EMNLP) CrossWeigh: Training Named Entity Tagger from Imperfect Annotations | Done |
Marginal Likelihood CRF | (2018 EMNLP) Marginal Likelihood Training of BiLSTM-CRF for Biomedical Named Entity Recognition from Disjoint Label Sets | Coming Soon |
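
To give a feel for the Partial-CRF/Fuzzy-CRF idea in the table above, here is a minimal pure-Python sketch of the loss. It is not this project's implementation. It assumes a linear-chain CRF with no special start/end transitions, and the names `constrained_log_score`, `partial_crf_nll`, and the three-tag scheme (O, B, I) are hypothetical, chosen only for illustration. The loss is the negative log of the probability mass assigned to all tag paths consistent with a partial annotation, so tokens with unknown labels simply allow every tag.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list of floats."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def constrained_log_score(emissions, transitions, allowed):
    """Forward algorithm restricted to the tags in allowed[t] at each position.

    emissions[t][k]   : score of tag k at position t
    transitions[j][k] : score of moving from tag j to tag k
    allowed[t]        : non-empty set of tag indices permitted at position t
    Returns the log of the summed exp-scores of all permitted tag paths.
    """
    alpha = {k: emissions[0][k] for k in allowed[0]}
    for t in range(1, len(emissions)):
        alpha = {
            k: emissions[t][k]
            + logsumexp([a + transitions[j][k] for j, a in alpha.items()])
            for k in allowed[t]
        }
    return logsumexp(list(alpha.values()))


def partial_crf_nll(emissions, transitions, allowed):
    """Negative log marginal likelihood of the paths consistent with `allowed`."""
    num_tags = len(transitions)
    every_tag = [set(range(num_tags))] * len(emissions)
    log_z = constrained_log_score(emissions, transitions, every_tag)
    log_consistent = constrained_log_score(emissions, transitions, allowed)
    return log_z - log_consistent


# Toy example with made-up scores. Assumed tag indices: 0 = O, 1 = B, 2 = I.
emissions = [[0.1, 1.2, -0.3], [0.5, -0.2, 0.9], [1.0, 0.0, -1.0]]
transitions = [[0.2, 0.1, -2.0], [-0.5, -0.1, 0.8], [0.3, -0.4, 0.6]]

# Token 1 is annotated B, token 2 is unlabeled (any tag), token 3 is annotated O.
allowed = [{1}, {0, 1, 2}, {0}]

loss = partial_crf_nll(emissions, transitions, allowed)
print(f"partial-CRF loss: {loss:.4f}")
```

When every position allows exactly one tag, this reduces to the ordinary CRF negative log-likelihood. When every position allows all tags, the loss is zero, because the constraint carries no information.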