Extending the dictionary
alepacheco opened this issue · 6 comments
Hi, can you give me some ideas on how to extend the dictionary so it contains more cities?
Thanks!
You can't add cities to the dictionary for free. Well, you can, but if they don't appear in the training data, they mean nothing to the model. Some options:
1- Swap cities that are repeated in the dataset for new ones (not a very good idea)
2- Try a different model, for example one that includes character embeddings (this solved it for me)
3- Another rudimentary idea is to replace some cities in the dataset with the UNK token, so that at prediction time, when an unknown city appears, the model can guess that it might be a city. The downside is that the model may also label other unknown words as cities, whether they are or not.
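To make option 3 concrete, here is a minimal sketch. It assumes the training data is a list of sentences made of (token, label) pairs and that cities carry a label such as "B-city"; both the data layout and the helper name `mask_cities` are hypothetical and not taken from this repository.

```python
import random

UNK = "UNK"  # assumed spelling of the unknown-word token


def mask_cities(sentences, city_labels, rate=0.2, seed=0):
    """Replace a fraction of city tokens with UNK, leaving labels unchanged."""
    rng = random.Random(seed)  # fixed seed so the masking is reproducible
    masked = []
    for sentence in sentences:
        new_sentence = []
        for token, label in sentence:
            # Only tokens labelled as cities are candidates for masking.
            if label in city_labels and rng.random() < rate:
                token = UNK
            new_sentence.append((token, label))
        masked.append(new_sentence)
    return masked


# Toy data in the assumed format.
train = [
    [("flights", "O"), ("to", "O"), ("boston", "B-city")],
    [("from", "O"), ("denver", "B-city")],
]
masked_all = mask_cities(train, {"B-city"}, rate=1.0)
masked_none = mask_cities(train, {"B-city"}, rate=0.0)
```

A low `rate` keeps most real city names in the data while still teaching the model that UNK can show up in a city slot. The trade-off described in option 3 still applies: any unknown word may then be tagged as a city.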
Thanks for the info. Do you have any resources on how to implement character embeddings?
One more question: could I use a pre-trained word2vec model that covers more words with this model?
Yes, of course, I forgot about that! If you train it with pre-trained word vectors, the problem is actually solved. I had to find other ways to solve it because that approach needs considerably more memory, but if that isn't a problem for you, it's perfect. For character embeddings I followed: https://github.com/karpathy/char-rnn
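As a rough illustration of the pre-trained vector route, the sketch below builds a vocabulary from the union of the training words and the pre-trained words, so cities that never appear in the training data still get a meaningful vector. The `pretrained` dict is toy data standing in for real word2vec vectors, and `build_embedding_matrix` is a hypothetical helper, not part of this repository.

```python
import random


def build_embedding_matrix(train_vocab, pretrained, dim, seed=0):
    """Return (word_to_index, matrix) covering training and pre-trained words."""
    rng = random.Random(seed)
    # Index 0 is reserved for UNK; the remaining words are sorted for stability.
    words = ["UNK"] + sorted((set(train_vocab) | set(pretrained)) - {"UNK"})
    word_to_index = {word: i for i, word in enumerate(words)}
    matrix = []
    for word in words:
        if word in pretrained:
            # Copy the pre-trained vector as-is.
            matrix.append(list(pretrained[word]))
        else:
            # Words without a pre-trained vector get a small random one.
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return word_to_index, matrix


# Toy stand-in for vectors loaded from a word2vec model.
pretrained = {"boston": [0.1, 0.2], "madrid": [0.3, 0.4]}
train_vocab = ["flights", "boston"]
word_to_index, matrix = build_embedding_matrix(train_vocab, pretrained, dim=2)
```

Here "madrid" is absent from the training vocabulary but still ends up with its pre-trained vector. The matrix grows with the size of the pre-trained vocabulary, which is where the extra memory goes.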
@alepacheco there is a discussion about pre-trained word2vec models in #6
@alepacheco Can I ask why you prefer character-level embeddings over word embeddings?