/NER-studies

Primary language: Jupyter Notebook. License: MIT.

NER classification

Named Entity Recognition (NER) is a natural language processing (NLP) task that identifies spans of text referring to entities and classifies them into types such as people, organizations, locations, and dates. This README gives an overview of building NER models with TensorFlow, an open-source machine learning framework. NER automates the extraction of structured information from unstructured text, which makes it useful across many NLP applications. The project explores how to build NER models with TensorFlow and adapt them to domain-specific requirements.
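To make "identifying and classifying entities" concrete, here is a minimal sketch of how per-token tags can be grouped into entities. It assumes the common BIO tagging scheme (`B-` begins an entity, `I-` continues it, `O` is outside any entity); the README does not say which scheme the project uses, and the function name `extract_entities` and the sample sentence are purely illustrative.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes the BIO scheme; this is an illustration, not the project's code.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity currently being built, if there is one.
        nonlocal current_tokens, current_type
        if current_tokens:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts here; finish any open one first.
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the open entity of the same type.
            current_tokens.append(token)
        else:
            # "O" tag, or an I- tag that does not continue the open entity.
            flush()
    flush()
    return entities


tokens = ["Ada", "Lovelace", "worked", "in", "London", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(extract_entities(tokens, tags))
```

In this sketch a stray `I-` tag with no matching `B-` is simply dropped; other decoders choose to treat it as the start of a new entity instead.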

Directory Structure

The project directory is organized as follows:

  • Makefile: Targets for project setup and data retrieval.
  • models/: Directory where pre-trained models are stored.
    • BiLSTM.h5: A pre-trained BiLSTM model.
  • notebooks/: Jupyter notebooks for exploring and using the project.
    • 01_BiLSTM_model.ipynb: A Jupyter notebook with a BiLSTM model.
  • poetry.lock: Lock file generated by Poetry for package management.
  • pyproject.toml: Poetry configuration file for managing project dependencies.
  • README.md: This file, which provides an overview of the project.
  • src/: Source code directory containing project code.
    • crflayer.py: Code for the CRF layer.
    • get_data.sh: Script that retrieves the data (note: the corresponding Makefile command contains a typo).
    • __init__.py: Python package initializer.
    • nermodel.py: Code related to NER model implementation.

Getting Started

Create the environment with Poetry and set up the project:

make init

Download the data and save it in a data folder:

make get_data

Usage

This project can be used to study NER with TensorFlow.
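As one illustration of the preprocessing such a study usually involves, the sketch below maps tokens to integer ids and pads each sentence to a fixed length, which is the kind of input a BiLSTM typically expects. Everything here is an assumption made for illustration: the helper names `build_vocab` and `encode`, the `<PAD>`/`<UNK>` symbols, and the sample sentences are hypothetical, and the notebook may handle this step differently (for example with TensorFlow's own utilities).

```python
from typing import Dict, List

# Hypothetical reserved symbols: padding gets id 0, unknown tokens get id 1.
PAD, UNK = "<PAD>", "<UNK>"


def build_vocab(sentences: List[List[str]]) -> Dict[str, int]:
    """Assign an integer id to every distinct token, in order of first appearance."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for token in sentence:
            # setdefault only inserts the token if it is not already present.
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence: List[str], vocab: Dict[str, int], max_len: int) -> List[int]:
    """Convert tokens to ids, truncating or right-padding to exactly max_len."""
    ids = [vocab.get(token, vocab[UNK]) for token in sentence[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


train = [["John", "lives", "in", "Paris"], ["Paris", "is", "big"]]
vocab = build_vocab(train)

# "Mary" was never seen, so it is mapped to the <UNK> id.
print(encode(["Mary", "lives", "in", "Paris"], vocab, max_len=6))
```

The label sequences would be encoded and padded the same way, so that each token id lines up with one tag id.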

Models

The trained BiLSTM model is saved in the models/ folder for later deployment or inference.