KyrgyzNER

Welcome to our Named-Entity Recognition (NER) project for the Kyrgyz language! This repository aims to provide an efficient and accurate solution for identifying named entities within Kyrgyz texts.

Usage

The dataset and code will be released soon!

Model on HuggingFace
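
Once the trained model is published on the Hugging Face Hub, it should be usable with the transformers token-classification pipeline. The snippet below is a minimal sketch under that assumption; the model ID "your-org/kyrgyz-ner" is a placeholder, not the actual repository name.

```python
# Minimal sketch of loading a token-classification model from the Hugging Face Hub.
# NOTE: "your-org/kyrgyz-ner" is a placeholder model ID, not the real repository name.
from transformers import pipeline

ner = pipeline(
    "token-classification",
    model="your-org/kyrgyz-ner",
    aggregation_strategy="simple",  # merge word pieces into whole entity spans
)

text = "Бишкек — Кыргызстандын борбору."  # "Bishkek is the capital of Kyrgyzstan."
for entity in ner(text):
    print(entity["word"], entity["entity_group"], round(entity["score"], 3))
```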

What is Named-Entity Recognition?

Named-Entity Recognition is a core natural language processing (NLP) task that involves identifying and classifying entities such as names of people, locations, organizations, dates, and more within a given text. For example, in the sentence "Bishkek is the capital of Kyrgyzstan," a NER system would tag both "Bishkek" and "Kyrgyzstan" as locations. This task plays a vital role in various NLP applications such as information retrieval, sentiment analysis, and machine translation.

Our Goal

The primary objective of this project is to develop a gold-standard NER dataset for Kyrgyz that enables precise recognition and categorization of named entities in Kyrgyz texts. By creating a manually annotated NER dataset for Kyrgyz, we aim to improve the performance of various Kyrgyz NLP applications and foster the growth of Kyrgyz language technology.

Key Features

  • Data Collection: The dataset contains 1,500 news articles from 24.kg and serves as the foundation for training and evaluating our NER model. The annotated texts are saved in a format similar to CoNLL-2003 (a small reading sketch follows this list).
  • Preprocessing: Texts in Kyrgyz often come with specific challenges, including complex morphology and diverse linguistic structures. Our preprocessing pipeline handles these intricacies to ensure the best possible model performance.
  • Model Selection: We explore and experiment with various state-of-the-art NER architectures and techniques to identify the most suitable model for Kyrgyz.
  • Evaluation: The model's performance is evaluated with standard metrics, cross-validation, and comparisons with existing NER systems to demonstrate its improvement over baseline models.
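
For reference, CoNLL-2003-style files store one token per line together with its tag, with blank lines separating sentences. The reader below is a minimal sketch under that assumption; the file name and the exact column layout of the released files may differ.

```python
# Minimal sketch of reading a CoNLL-2003-style file: one "TOKEN ... TAG" line per token,
# blank lines separate sentences. File name and exact column layout are assumptions.
def read_conll(path):
    sentences, tokens, tags = [], [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:                      # blank line marks a sentence boundary
                if tokens:
                    sentences.append((tokens, tags))
                    tokens, tags = [], []
                continue
            parts = line.split()
            tokens.append(parts[0])           # token is the first column
            tags.append(parts[-1])            # NER tag is the last column
    if tokens:                                # flush the final sentence
        sentences.append((tokens, tags))
    return sentences

# Hypothetical usage once the dataset is released:
# data = read_conll("kyrgyz_ner_train.conll")
```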

Contribution

We express our deep gratitude to all volunteers of the project and the students of KSTU who helped us label and annotate all the texts. A list of contributors can be found here. We also thank Dr. Gulnara Kabaeva and Dr. Gulira Zhumalieva for their support in the development of this project.

We welcome contributions from the NLP community and from researchers interested in advancing the field. Feel free to raise issues, submit pull requests, or provide feedback on how we can improve the model and its implementation.