
Bi-LSTM-CRF named entity recognition model for Korean (Keras)

Primary language: Python

KoreaNER / 한국어 개체명 (Korean named entities)
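The README contains no code, so the following is only a sketch of the step that separates a Bi-LSTM-CRF from a plain Bi-LSTM tagger: Viterbi decoding over a linear-chain CRF. It assumes the Bi-LSTM has already produced per-token emission scores and that the CRF has learned a tag-to-tag transition matrix. The function name `viterbi_decode`, the three-tag set and every score are hypothetical, start/end transition scores are omitted for brevity, and this is not KoreaNER's own implementation.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][j]: score of tag j at token t (e.g. from a Bi-LSTM).
    transitions[i][j]: score of moving from tag i to tag j (learned by the CRF).
    Start/end transition scores are omitted in this sketch.
    """
    n_tags = len(transitions)
    # best[j]: score of the best path that ends in tag j at the current token
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # previous tag i that maximizes best[i] + transitions[i][j]
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # follow the back-pointers from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical example: 3 tokens, tags O / B-PS / I-PS
tags = ["O", "B-PS", "I-PS"]
emissions = [
    [0.1, 2.0, 0.0],
    [0.5, 0.0, 1.0],
    [2.0, 0.0, 0.3],
]
# A large negative score makes O -> I-PS effectively forbidden
transitions = [
    [0.0, 0.0, -1e4],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
path, score = viterbi_decode(emissions, transitions)
print([tags[i] for i in path], score)  # ['B-PS', 'I-PS', 'O'] 5.0
```

Transition entries such as the `-1e4` above are how a CRF layer can discourage invalid tag sequences (for example an I- tag directly after O) that independent per-token predictions may still produce.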


Evaluation & Comparison:

Corpus: National Institute of Korean Language (ROK) - NER Corpus / 국립국어원 - 개체명 인식용 말뭉치

Category          | KoNER/코너 (2016)     | Annie (2016)           | KoreaNER
                  | P      R      F       | P       R       F      | P      R     F
DT (date)         | 0.894  0.880  0.887   | 0.6373  0.7785  0.7009 | 0.94   0.94  0.94
LC (location)     | 0.793  0.853  0.822   | 0.5822  0.8782  0.7002 | 0.71   0.76  0.73
OG (organization) | 0.824  0.772  0.797   | 0.7624  0.7087  0.7346 | 0.73   0.63  0.68
PS (person)       | 0.915  0.885  0.899   | 0.8834  0.6127  0.7236 | 0.80   0.75  0.78
TI (time)         | 0.872  0.810  0.840   | 0.5441  0.8810  0.6727 | 0.98   0.89  0.93

P = precision, R = recall, F = F-score.

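The table reports precision, recall and F-score per category. Assuming the F-score is the usual F1 (the harmonic mean of P and R), it can be recomputed from the other two columns. The helper name `f_score` is hypothetical; this is a sketch for checking the numbers, not the evaluation script used for the table.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-beta score; beta = 1 gives F1, the harmonic mean of P and R."""
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when nothing was predicted or found
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# KoNER DT row from the table: P = 0.894, R = 0.880
print(round(f_score(0.894, 0.880), 3))  # 0.887
```

Most rows reproduce to the printed precision this way (e.g. Annie DT gives 0.7009). A few, such as KoreaNER PS, do not match exactly when recomputed from the printed values, presumably because P and R were rounded before publication.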


Future improvements:

  • Add a gazetteer
  • Add features specific to the PS and LC categories
  • Provide a web API
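One possible shape for the planned gazetteer feature is a dictionary lookup that marks each character (syllable) covered by a known entity name, so the marks can be fed to the tagger as an extra input feature. This is a hypothetical sketch: the function `gazetteer_features`, the greedy left-to-right longest-match strategy and the example entries are assumptions, not part of KoreaNER; the category labels reuse the tags from the table.

```python
from typing import Dict, List


def gazetteer_features(text: str, gazetteer: Dict[str, str]) -> List[str]:
    """Hypothetical feature: tag each character B-/I-<category> if it lies
    inside a gazetteer match (greedy longest match, left to right), else 'O'."""
    max_len = max((len(name) for name in gazetteer), default=0)
    feats = ["O"] * len(text)
    i = 0
    while i < len(text):
        # try the longest candidate substring starting at i first
        for length in range(min(max_len, len(text) - i), 0, -1):
            category = gazetteer.get(text[i:i + length])
            if category is not None:
                feats[i] = "B-" + category
                for k in range(i + 1, i + length):
                    feats[k] = "I-" + category
                i += length
                break
        else:
            i += 1  # no entry starts here; move to the next character
    return feats


# Hypothetical gazetteer entries: Seoul (LC), National Institute of Korean Language (OG)
gazetteer = {"서울": "LC", "국립국어원": "OG"}
feats = gazetteer_features("서울에 있는 국립국어원", gazetteer)
print(feats)
# ['B-LC', 'I-LC', 'O', 'O', 'O', 'O', 'O', 'B-OG', 'I-OG', 'I-OG', 'I-OG', 'I-OG']
```

Working at the character level avoids having to strip Korean particles (such as 에 after 서울) before lookup, at the cost of possible false matches inside longer words.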

References:

Character-Aware Neural Language Models

Boosting Named Entity Recognition with Neural Character Embeddings

Attending To Characters In Neural Sequence Labeling Models

Neural Architectures for Named Entity Recognition

Bidirectional LSTM-CRF Models for Sequence Tagging

Character-level Convolutional Networks for Text Classification

A Syllable-based Technique for Word Embeddings of Korean Words

Open-source projects (GitHub):

CharCNN Pytorch

word2vec-keras-in-gensim

anago

annie

DeepSequenceClassification

autumn_ner

kchar

deep-named-entity-recognition

Sequence Tagging with Tensorflow