ai-forever/ner-bert

How to add a new feature?

EricAugust opened this issue · 4 comments

I want to add a feature to the data, e.g. is_in_some_vocab,
to train a more generalized model.
How can I do this?

We have now added from_config to data and models (but without examples for now). What do you mean when you say "more generalized model"?

As for vocabs :( we will release saving of label vocabs next month. The text vocab is the BERT vocab_file.

You know, in some situations a word may be predicted as a location when it is actually a person name, or vice versa. So maybe I can keep a vocabulary of location, person, and organization names.
I could also use other features, e.g. POS tags or other manually defined features.
With these, there may be a lower chance of wrong predictions at test time.
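
The vocabulary idea above can be sketched as a per-token lookup feature. This is only a minimal illustration in plain Python: the `GAZETTEERS` dictionary, its entries, and the `gazetteer_features` helper are hypothetical names, not part of ner-bert, and how the resulting vectors would be fed to the model is left open.

```python
from typing import Dict, List, Set

# Hypothetical name lists; in practice these would be loaded from files.
GAZETTEERS: Dict[str, Set[str]] = {
    "PER": {"john", "mary"},
    "LOC": {"paris", "london"},
    "ORG": {"google"},
}


def gazetteer_features(
    tokens: List[str],
    gazetteers: Dict[str, Set[str]] = GAZETTEERS,
) -> List[List[int]]:
    """Return one binary vector per token, one slot per gazetteer.

    Slots follow the sorted gazetteer names (here: LOC, ORG, PER).
    A slot is 1 if the lowercased token appears in that gazetteer.
    """
    names = sorted(gazetteers)
    return [
        [int(token.lower() in gazetteers[name]) for name in names]
        for token in tokens
    ]


if __name__ == "__main__":
    for token, feats in zip(
        ["John", "lives", "in", "Paris"],
        gazetteer_features(["John", "lives", "in", "Paris"]),
    ):
        print(token, feats)
```

A lookup like this only matches single tokens; multi-word names would need a span-matching step, which is not shown here.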

I have another question. To predict, I need to create the data, model, and learner, and then load the pre-trained model.
But after I have trained the model, I don't want to create the data, model, etc. again. I only want to load the model, process the data, and then predict the sequence labels.

You are right. Until now we have used this code only for experiments. We will add these functions next month. As for meta (additional) information about words or sentences, you can add your own vector with such info:
data = NerData.create(train_path, valid_path, vocab_file, is_cls=False, is_meta=True)
model = BertBiLSTMAttnCRF.create(len(data.label2idx), bert_config_file, init_checkpoint_pt, meta_dim=30)
meta_dim is the dimension of your additional information. You can encode POS tags with one-hot vectors (but we know that this is bad). We will add an embedder for meta soon.
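
For illustration, a one-hot POS encoding padded to a fixed meta_dim could look roughly like the sketch below. The tag list `POS_TAGS` and the helper `one_hot_pos` are hypothetical, and the way these vectors are attached to the data when is_meta=True is not shown in this thread, so that part is deliberately left out.

```python
from typing import List

# Hypothetical tag inventory; replace with the tag set of your POS tagger.
POS_TAGS = ["NOUN", "VERB", "ADJ", "ADV", "PROPN",
            "ADP", "DET", "PRON", "NUM", "PUNCT"]
POS2IDX = {tag: idx for idx, tag in enumerate(POS_TAGS)}


def one_hot_pos(tags: List[str], meta_dim: int = 30) -> List[List[float]]:
    """Encode each POS tag as a one-hot vector of length meta_dim.

    Unknown tags share one extra slot right after the known tags.
    Any remaining positions stay zero, so the vector length matches
    the meta_dim passed to the model.
    """
    unknown_idx = len(POS_TAGS)
    if meta_dim <= unknown_idx:
        raise ValueError("meta_dim must exceed the number of known tags")
    vectors = []
    for tag in tags:
        vec = [0.0] * meta_dim
        vec[POS2IDX.get(tag, unknown_idx)] = 1.0
        vectors.append(vec)
    return vectors


if __name__ == "__main__":
    meta = one_hot_pos(["PROPN", "VERB", "ADP", "PROPN"])
    print(len(meta), len(meta[0]))
```

One-hot vectors are sparse and treat all tags as equally distant from each other, which is probably why a learned embedder for meta information is the planned replacement.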

We will do a release next month with new features (meta info, different schemas (BIO, IOX - as in BERT)), easy predict, and so on.