Named Entity Recognition system

CursWork2

Named Entity Recognition system. Requires Tomita-parser (https://github.com/yandex/tomita-parser or https://tech.yandex.ru/tomita). The Python scripts were written for Python 3.4.

Abstract of the paper

Named Entity Recognition (NER) is a key subtask of information extraction and NLP. Its purpose is to detect atomic pieces of text and label them with tags from a predefined set. This paper introduces the problem of named entity recognition and describes the main approaches to building systems that solve it. The study also includes an implementation of an NER system for Russian-language news articles. The distinctive feature of the algorithm is that it combines information extracted from Wikipedia articles (that is, it treats Wikipedia as a knowledge base) with labelling based on context-free grammars. Accordingly, the paper also describes the concept of a knowledge base and its relation to information extraction tasks. Tomita-parser by Yandex was used as the text parsing tool.
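
The general idea of combining rule-based candidate extraction with a knowledge-base lookup can be illustrated with a minimal sketch. This is not the code from this repository and it does not call Tomita-parser: the regular expression is only a crude stand-in for context-free grammar rules, the KNOWLEDGE_BASE dictionary is a hypothetical stand-in for data extracted from Wikipedia, and the tag names (PER, ORG, LOC) and the function name label_entities are assumptions made for illustration. English text is used to keep the pattern simple.

```python
import re

# Hypothetical stand-in for a knowledge base built from Wikipedia articles:
# maps an entity name to an assumed tag.
KNOWLEDGE_BASE = {
    "Yandex": "ORG",
    "Moscow": "LOC",
    "Leo Tolstoy": "PER",
}

# One or more consecutive capitalized words. This is a crude substitute
# for grammar rules; a real grammar would be far richer.
CANDIDATE = re.compile(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*")


def label_entities(text):
    """Return (span, tag) pairs for candidates found in the knowledge base."""
    entities = []
    for match in CANDIDATE.finditer(text):
        span = match.group(0)
        tag = KNOWLEDGE_BASE.get(span)
        if tag is not None:
            # Keep only candidates that the knowledge base confirms.
            entities.append((span, tag))
    return entities


if __name__ == "__main__":
    sample = "Leo Tolstoy never visited the Yandex office in Moscow."
    for span, tag in label_entities(sample):
        print(span, tag)
```

In this sketch the pattern proposes candidate spans and the lookup decides whether a span is kept and which tag it receives. A candidate that merges an entity with a neighbouring capitalized word (for example a sentence-initial "The") is simply dropped, which is one reason a real system needs richer rules than a single pattern.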