Fine-tuning bert-base-multilingual-cased on the WikiANN dataset for named-entity recognition (NER) in Bengali.
The model is available on the Hugging Face Model Hub at https://huggingface.co/Suchandra/bengali_language_NER
- The model checkpoint was initially bert-base-multilingual-uncased (mBERT uncased).
- With mBERT uncased, the text passed to `encode` does not round-trip through `decode` unchanged: the spelling of Bengali words changes.
- This is caused by normalization issues in mBERT uncased (its preprocessing strips accents, which deletes Bengali combining marks), so mBERT cased is the correct choice.
- See https://github.com/google-research/bert/blob/master/multilingual.md for details.
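The spelling change can be reproduced with the standard library alone. The sketch below mimics the accent-stripping step of BERT's uncased preprocessing (NFD normalization followed by dropping nonspacing combining marks); the function name is illustrative, not the tokenizer's actual API. Bengali vowel signs such as u-kar (ু) and the hasanta/virama (্) are nonspacing marks, so the uncased pipeline deletes them and corrupts the word:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # Mimic BERT-uncased preprocessing: NFD-normalize, then drop
    # nonspacing combining marks (Unicode category "Mn").
    nfd = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in nfd if unicodedata.category(ch) != "Mn")

word = "মুক্তি"  # "mukti" (freedom)
print(strip_accents(word))  # the u-kar and hasanta are deleted, changing the spelling
```

The cased model skips this accent-stripping step, which is why it round-trips Bengali text correctly.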
| Split | Overall F1 | LOC F1 | ORG F1 | PER F1 |
|---|---|---|---|---|
| Train set | 0.997927 | 0.998246 | 0.996613 | 0.998769 |
| Validation set | 0.970187 | 0.969212 | 0.956831 | 0.982079 |
| Test set | 0.967301 | 0.967120 | 0.963614 | 0.970938 |
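The scores above are entity-level F1: a predicted entity counts as correct only when both its type and its exact span boundaries match the gold annotation (the convention used by seqeval). A minimal self-contained sketch of this metric over BIO-tagged sequences, assuming the WikiANN tag set (B-/I- prefixed PER, ORG, LOC):

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from one BIO tag sequence."""
    entities, etype, start = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" flushes the last span
        boundary = tag == "O" or tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != etype)
        if boundary and etype is not None:
            entities.append((etype, start, i))
            etype = None
        if tag.startswith("B-"):
            etype, start = tag[2:], i
    return entities

def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over parallel lists of tag sequences."""
    tp = fp = fn = 0
    for g, p in zip(gold_seqs, pred_seqs):
        gold, pred = set(extract_entities(g)), set(extract_entities(p))
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

gold = [["B-PER", "I-PER", "O", "B-LOC"]]
pred = [["B-PER", "I-PER", "O", "B-ORG"]]  # one span mislabeled LOC -> ORG
print(entity_f1(gold, pred))  # -> 0.5
```

In practice the reported numbers would come from seqeval (or `evaluate`'s seqeval wrapper) over the WikiANN splits; this sketch only shows what the metric measures.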