lang-uk/ner-uk

Train selected models on our data.

dchaplinsky opened this issue · 1 comments

While some of the models listed in #11 are already trained on the ner-uk data, there is still room for improvement:

  • Our corpus was recently updated to fix some issues with tagging and the train/test split
  • As far as I understand, some models rely on vectors and use basic ones trained on Common Crawl, Wikipedia, etc. We could use better vectors built on top of our own corpora; please contact me if you need them
  • I'd like to have a separate discussion on whether we should train on the full data (train+test) rather than on the train set only, for the purpose of the ensemble model. @mariana-scorp might have additional comments on that
gawy commented

Training scripts for mitie and stanza are in MR #16