Provide vocabulary file
naiaden opened this issue · 1 comment
naiaden commented
I would like a feature that lets me restrict the classes to a given vocabulary. When you want to reproduce experiments by others, you are often given their vocabulary as well. Right now there is no straightforward way to limit the words to a given vocabulary without sacrificing efficiency in the encoding.
What I want is to pass a vocabulary as a parameter and have the class file limited to the words found in that vocabulary. All other words are mapped to OOV.
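A minimal sketch of the requested behaviour, in plain Python rather than any existing tool's API. It assumes that class ids are handed out by descending corpus frequency (so frequent words get the smallest ids and the shortest encodings), and that id 0 is reserved for OOV. The names `build_classes`, `encode` and `OOV` are hypothetical.

```python
from collections import Counter

OOV = 0  # hypothetical reserved class id for out-of-vocabulary words


def build_classes(corpus_tokens, vocabulary):
    """Assign class ids only to words in `vocabulary`, most frequent first."""
    # Words outside the vocabulary are never counted, so they never get a class.
    counts = Counter(t for t in corpus_tokens if t in vocabulary)
    # Descending frequency; ties broken alphabetically for a deterministic order.
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    # Ids start at 1 because 0 is reserved for OOV.
    return {w: i for i, w in enumerate(ranked, start=1)}


def encode(tokens, classes):
    """Map each token to its class id, or to OOV if it has no class."""
    return [classes.get(t, OOV) for t in tokens]


if __name__ == "__main__":
    corpus = "the cat sat on the mat the end".split()
    classes = build_classes(corpus, {"the", "cat", "mat"})
    print(classes)                   # {'the': 1, 'cat': 2, 'mat': 3}
    print(encode(corpus, classes))   # [1, 2, 0, 0, 1, 3, 1, 0]
```

Because only vocabulary words receive ids, the id space stays as small as the vocabulary allows, which is what keeps the encoding compact.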
proycon commented
- implement top-x classes as well, pruning the tail of the class encoding
- ensure pattern model training properly ignores patterns containing OOV
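The two to-do items could look roughly like this. This is an illustrative sketch only, not the project's actual implementation. It assumes class ids are ranked by frequency, so keeping the top x classes simply means keeping ids `1..x`. It also assumes that "ignoring patterns with OOV" means skipping any n-gram that contains the OOV id while counting. The names `prune_top_k`, `count_ngrams` and `OOV` are hypothetical.

```python
from collections import Counter

OOV = 0  # hypothetical reserved class id for out-of-vocabulary words


def prune_top_k(classes, k):
    """Keep only the k most frequent classes (ids 1..k); the tail becomes OOV."""
    # With frequency-ranked ids the survivors stay contiguous, so no renumbering.
    return {w: c for w, c in classes.items() if c <= k}


def count_ngrams(encoded, n):
    """Count n-grams over class ids, skipping every n-gram that contains OOV."""
    counts = Counter()
    for i in range(len(encoded) - n + 1):
        gram = tuple(encoded[i:i + n])
        if OOV not in gram:
            counts[gram] += 1
    return counts


if __name__ == "__main__":
    top = prune_top_k({"the": 1, "cat": 2, "mat": 3}, 2)
    tokens = "the cat sat on the mat the end".split()
    encoded = [top.get(t, OOV) for t in tokens]   # [1, 2, 0, 0, 1, 0, 1, 0]
    print(count_ngrams(encoded, 2))               # Counter({(1, 2): 1})
```

Skipping OOV-containing n-grams at counting time means pruned or out-of-vocabulary words can never surface as parts of patterns in the trained model.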