This is an implementation of a basic supervised NER model based on the bert-cased model. The model consists of a BERT encoder and a linear classifier.
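As a rough illustration of the architecture described above, the following is a minimal sketch of a BERT encoder followed by a per-token linear classifier. The class name `BertLinearTagger`, the dropout layer, and the tiny randomly initialized configuration are assumptions made so the example runs without downloading weights; the actual model code in this repository may differ.

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class BertLinearTagger(nn.Module):
    """Hypothetical sketch: BERT encoder plus a linear layer applied to every token."""

    def __init__(self, config: BertConfig, num_labels: int):
        super().__init__()
        # The pooled [CLS] output is not needed for token classification.
        self.encoder = BertModel(config, add_pooling_layer=False)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)  # assumed, not confirmed by the README
        self.classifier = nn.Linear(config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None):
        # Index 0 is the sequence of hidden states: (batch, seq_len, hidden_size).
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask)[0]
        # One score per label for every token: (batch, seq_len, num_labels).
        return self.classifier(self.dropout(hidden))


# Tiny random configuration for demonstration only (not bert-base-cased).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64)
model = BertLinearTagger(config, num_labels=5).eval()
input_ids = torch.randint(0, 100, (2, 8))
with torch.no_grad():
    logits = model(input_ids, attention_mask=torch.ones_like(input_ids))
```

In practice the encoder would be loaded from pretrained weights, for example with `BertModel.from_pretrained("bert-base-cased")`, and the logits would be trained with a token-level cross-entropy loss.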
- utils_ner.py
  - Contains the utility functions and classes for data loading. Specifically, it provides functions for two kinds of data:
    - `read_cluener_example_from_file`: for JSON data such as `cluener`
    - `read_examples_from_file`: for standard two-column data such as `CONLL`, where each row is "word label"
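To make the two input formats concrete, here is a minimal sketch of how each could be parsed. It is not the code from `utils_ner.py`: the function names `read_two_column` and `read_cluener_line` are hypothetical, and several details are assumptions, namely that a blank line separates sentences in the two-column format, that each cluener line is a JSON object with `text` and `label` fields holding inclusive character offsets, and that entities are converted to BIO tags.

```python
import json
from typing import Iterable, List, Tuple

Sentence = Tuple[List[str], List[str]]  # (tokens, labels)


def read_two_column(lines: Iterable[str]) -> List[Sentence]:
    """Parse rows of "word label"; a blank line ends a sentence (assumed)."""
    sentences: List[Sentence] = []
    words: List[str] = []
    labels: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            if words:
                sentences.append((words, labels))
                words, labels = [], []
            continue
        parts = line.split()
        words.append(parts[0])
        # Rows without a label (e.g. unlabeled test data) fall back to "O".
        labels.append(parts[-1] if len(parts) > 1 else "O")
    if words:
        sentences.append((words, labels))
    return sentences


def read_cluener_line(line: str) -> Sentence:
    """Convert one cluener-style JSON line into characters and BIO tags."""
    record = json.loads(line)
    chars = list(record["text"])
    tags = ["O"] * len(chars)
    # Assumed layout: {"label": {entity_type: {mention: [[start, end], ...]}}}
    # with character offsets where `end` is inclusive.
    for entity_type, mentions in record.get("label", {}).items():
        for spans in mentions.values():
            for start, end in spans:
                tags[start] = "B-" + entity_type
                for i in range(start + 1, end + 1):
                    tags[i] = "I-" + entity_type
    return chars, tags


# Small in-memory examples instead of real files.
conll_rows = ["John B-PER", "lives O", "", "Paris B-LOC"]
conll_sentences = read_two_column(conll_rows)

cluener_row = json.dumps({
    "text": "Tom in Rome",
    "label": {"name": {"Tom": [[0, 2]]}, "address": {"Rome": [[7, 10]]}},
})
chars, tags = read_cluener_line(cluener_row)
```

Both helpers return parallel lists of tokens and labels, so the rest of a pipeline can treat the two formats the same way once loading is done.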
- run_ner.py
  - The main Python script for running the NER model.
  - Important parameters:
    - data_dir: the directory that contains the training, dev, and test data
    - labels: the path to the file that contains all labels
    - model_name_or_path: backbone model to use (bert-base-cased or bert-base-chinese)
    - output_dir: the directory where training and test results are written
    - sample_type: which data loading function to use ('conll' or 'cluener')
    - max_seq_length: maximum sequence length
    - num_train_epochs: number of training epochs
    - per_gpu_train_batch_size: training batch size per GPU
    - seed: global random seed
    - do_train: whether to train
    - do_eval: whether to evaluate on the dev set
    - do_predict: whether to predict on the test set
    - overwrite_output_dir: whether to overwrite an existing output_dir
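The parameters above could be declared with `argparse` roughly as follows. This is only a sketch: the option names come from the list above, but the types, default values, the use of boolean flags, and the example paths are assumptions, and `run_ner.py` itself may define them differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical declaration of the options listed above (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Sketch of run_ner.py options")
    parser.add_argument("--data_dir", required=True, help="directory with train/dev/test data")
    parser.add_argument("--labels", default="", help="path to the file listing all labels")
    parser.add_argument("--model_name_or_path", default="bert-base-cased")
    parser.add_argument("--output_dir", required=True, help="where results are written")
    parser.add_argument("--sample_type", choices=["conll", "cluener"], default="conll")
    parser.add_argument("--max_seq_length", type=int, default=128)
    parser.add_argument("--num_train_epochs", type=float, default=3.0)
    parser.add_argument("--per_gpu_train_batch_size", type=int, default=8)
    parser.add_argument("--seed", type=int, default=42)
    # Boolean switches: present on the command line means True.
    parser.add_argument("--do_train", action="store_true")
    parser.add_argument("--do_eval", action="store_true")
    parser.add_argument("--do_predict", action="store_true")
    parser.add_argument("--overwrite_output_dir", action="store_true")
    return parser


# Example invocation with made-up paths, parsed from a list instead of sys.argv.
args = build_parser().parse_args([
    "--data_dir", "data/cluener",
    "--labels", "data/cluener/labels.txt",
    "--model_name_or_path", "bert-base-chinese",
    "--output_dir", "output",
    "--sample_type", "cluener",
    "--do_train",
    "--do_eval",
])
```

Restricting `sample_type` with `choices` makes an unsupported loader name fail at parse time instead of later during data loading.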
- Run `pip install -r requirements` to install the dependencies.
- Refer to `run.sh` for the training script. It trains on the cluener dataset.
- Make sure you have files named `train.txt`, `dev.txt`, and `test.txt` in your `data_dir` when you use the CONLL data loading function.
- Make sure you have files named `train.json`, `dev.json`, and `test.json` in your `data_dir` when you use the cluener data loading function.
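The file-naming requirement above can be checked before starting a run. The helper below is a hypothetical convenience, not part of this repository: `missing_data_files` and `SUFFIX_BY_SAMPLE_TYPE` are illustrative names, and the only facts taken from the README are the expected file names for each loader.

```python
import tempfile
from pathlib import Path
from typing import List

# File extension expected by each loader, taken from the file names listed above.
SUFFIX_BY_SAMPLE_TYPE = {"conll": ".txt", "cluener": ".json"}


def missing_data_files(data_dir: str, sample_type: str) -> List[str]:
    """Return the expected train/dev/test file names that are absent from data_dir."""
    suffix = SUFFIX_BY_SAMPLE_TYPE[sample_type]
    expected = [split + suffix for split in ("train", "dev", "test")]
    return [name for name in expected if not (Path(data_dir) / name).is_file()]


# Demonstration in a temporary directory that holds only two of the JSON files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train.json", "dev.json"):
        (Path(tmp) / name).write_text("", encoding="utf-8")
    missing_cluener = missing_data_files(tmp, "cluener")
    missing_conll = missing_data_files(tmp, "conll")
```

An empty list means the directory has everything the chosen loader expects; otherwise the returned names show which files still need to be added.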