yulinchen99/basic-supervised-ner

Basic Supervised NER

This repository implements a basic supervised NER model built on a cased BERT model. The model consists of a BERT encoder followed by a linear classifier.
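The encoder-plus-classifier architecture can be sketched as below. This is a minimal illustration, not the code in this repository: the class name BertLinearTagger, the dropout value, and the tiny randomly initialised configuration are all assumptions made so that the sketch runs without downloading weights. It assumes torch and transformers are installed.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertLinearTagger(nn.Module):
    """Hypothetical sketch: a BERT encoder followed by a per-token linear classifier."""

    def __init__(self, encoder: BertModel, num_labels: int, dropout: float = 0.1):
        super().__init__()
        self.encoder = encoder
        self.dropout = nn.Dropout(dropout)  # dropout rate is an assumption
        self.classifier = nn.Linear(encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # last_hidden_state has shape (batch, seq_len, hidden_size)
        hidden = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state
        # One score per label for every token: (batch, seq_len, num_labels)
        return self.classifier(self.dropout(hidden))


# Tiny random encoder so the sketch runs offline.
# For real use one would load pretrained weights instead, e.g.
# encoder = BertModel.from_pretrained("bert-base-cased")
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
)
model = BertLinearTagger(BertModel(tiny_config), num_labels=5)
model.eval()

with torch.no_grad():
    logits = model(torch.randint(0, 100, (2, 8)))  # 2 sentences, 8 tokens each
print(logits.shape)
```

Taking the argmax over the last dimension of the logits gives one predicted label id per token.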

Structure

utils_ner.py
- Contains the utility functions and classes for data loading. It provides one reader for each of the two supported data formats:
  - read_cluener_example_from_file
    - for json datatype like cluener
  - read_examples_from_file
    - for standard two-column data like CoNLL, where each row is "word label"
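To make the two input formats concrete, here is a minimal sketch of readers for each. It is not the code in utils_ner.py: the names Example, read_two_column, and read_cluener_line are hypothetical, the BIO tagging scheme is an assumption, and the cluener layout (one JSON object per line with "text" and "label" fields, entity spans with inclusive end indices) is assumed from the public CLUENER data.

```python
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Example:
    """Hypothetical container: one sentence with one label per token."""
    words: List[str]
    labels: List[str]


def read_two_column(lines: Iterable[str]) -> List[Example]:
    """Parse 'word label' rows; a blank line ends the current sentence."""
    examples, words, labels = [], [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if words:
                examples.append(Example(words, labels))
                words, labels = [], []
            continue
        parts = line.split()
        words.append(parts[0])
        # Unlabelled rows (e.g. in a test file) fall back to "O".
        labels.append(parts[-1] if len(parts) > 1 else "O")
    if words:
        examples.append(Example(words, labels))
    return examples


def read_cluener_line(line: str) -> Example:
    """Convert one cluener-style JSON line into character-level BIO labels."""
    record = json.loads(line)
    chars = list(record["text"])
    labels = ["O"] * len(chars)
    for entity_type, mentions in record.get("label", {}).items():
        for spans in mentions.values():
            for start, end in spans:  # end index assumed to be inclusive
                labels[start] = f"B-{entity_type}"
                for i in range(start + 1, end + 1):
                    labels[i] = f"I-{entity_type}"
    return Example(chars, labels)


conll_examples = read_two_column(
    ["EU B-ORG", "rejects O", "German B-MISC", "", "Peter B-PER", "Blackburn I-PER"]
)
cluener_example = read_cluener_line(
    '{"text": "叶老桂在浙商银行工作", '
    '"label": {"name": {"叶老桂": [[0, 2]]}, "company": {"浙商银行": [[4, 7]]}}}'
)
print(conll_examples)
print(cluener_example.labels)
```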
run_ner.py
- The main Python script for training and running the NER model
- Important parameters
  - data_dir: the directory containing the training, dev, and test data
  - labels: path to the file that lists all labels
  - model_name_or_path: the backbone model to use (bert-base-cased or bert-base-chinese)
  - output_dir: the directory where training and test results are written
  - sample_type: which data loading function to use ('conll' or 'cluener')
  - max_seq_length: maximum input sequence length
  - num_train_epochs: number of training epochs
  - per_gpu_train_batch_size: training batch size per GPU
  - seed: global random seed
  - do_train: whether to train
  - do_eval: whether to evaluate on dev set
  - do_predict: whether to predict on test set
  - overwrite_output_dir: whether to overwrite the contents of output_dir
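As an illustration of how the parameters above fit together, the following sketch declares them with argparse. It is not taken from run_ner.py: every default value, the choice of which flags are required, and the example paths are assumptions, so check run_ner.py and run.sh for the real definitions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the parameters listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the run_ner.py options")
    parser.add_argument("--data_dir", required=True)
    parser.add_argument("--labels", default="")
    parser.add_argument("--model_name_or_path", default="bert-base-cased")
    parser.add_argument("--output_dir", required=True)
    parser.add_argument("--sample_type", choices=["conll", "cluener"], default="conll")
    # All numeric defaults below are assumptions, not the script's values.
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--num_train_epochs", type=float, default=3.0)
    parser.add_argument("--per_gpu_train_batch_size", type=int, default=8)
    parser.add_argument("--seed", type=int, default=42)
    # Boolean switches: present means True, absent means False.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    parser.add_argument("--do_predict", action="store_true")
    parser.add_argument("--overwrite_output_dir", action="store_true")
    return parser


# Example invocation with hypothetical paths.
args = build_parser().parse_args([
    "--data_dir", "data/cluener",
    "--labels", "data/cluener/labels.txt",
    "--model_name_or_path", "bert-base-chinese",
    "--output_dir", "output/cluener",
    "--sample_type", "cluener",
    "--do_train",
    "--do_eval",
])
print(args.sample_type, args.do_train, args.do_predict)
```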

Usage

Run pip install -r requirements
Refer to run.sh for an example training script; it trains on the cluener dataset.
When you use the CoNLL data loading function, make sure data_dir contains files named train.txt, dev.txt, and test.txt.
When you use the cluener data loading function, make sure data_dir contains files named train.json, dev.json, and test.json.
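A quick way to verify the file layout before training is sketched below. The helper missing_data_files is hypothetical and not part of this repository; it only checks the file names described above.

```python
import tempfile
from pathlib import Path

# File extension expected for each sample_type, as described above.
EXPECTED_SUFFIX = {"conll": ".txt", "cluener": ".json"}


def missing_data_files(data_dir: str, sample_type: str) -> list:
    """Return the expected split files that are absent from data_dir."""
    suffix = EXPECTED_SUFFIX[sample_type]
    return [
        split + suffix
        for split in ("train", "dev", "test")
        if not (Path(data_dir) / (split + suffix)).is_file()
    ]


# Demonstration on a temporary directory holding only two of the three files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "train.txt").write_text("EU B-ORG\n")
    (Path(tmp) / "dev.txt").write_text("EU B-ORG\n")
    missing_conll = missing_data_files(tmp, "conll")
    missing_cluener = missing_data_files(tmp, "cluener")

print(missing_conll)
print(missing_cluener)
```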