NNLP-IL/Hebrew-Resources

What is the currently most accurate Hebrew word embedding (word2vec) and/or language model?


There are several resources, but I couldn't find any benchmark, so I'm not sure which is best to use.

Indeed, there's no benchmark, but even if there were one, it couldn't always tell you which resource is best for your use case.

In the case of embeddings, the vocabulary size and the size of the training set are very significant, but in my experience the closeness between the training-set domain and the domain of your use case is usually more important. One rough proxy for that closeness is how well an embedding's vocabulary covers your own corpus, as sketched below.
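Here is a minimal sketch of that coverage check, assuming gensim is installed; the vectors path and the sample tokens are placeholders for your own files and text:

```python
from collections import Counter

from gensim.models import KeyedVectors

def vocabulary_coverage(vectors_path, corpus_tokens):
    """Return (type coverage, token coverage) of a corpus against an embedding vocabulary."""
    # Placeholder path: any pretrained vectors in word2vec text format.
    vectors = KeyedVectors.load_word2vec_format(vectors_path)
    counts = Counter(corpus_tokens)
    covered_types = sum(1 for w in counts if w in vectors)
    covered_tokens = sum(c for w, c in counts.items() if w in vectors)
    return covered_types / len(counts), covered_tokens / sum(counts.values())

# Usage: tokenize a sample of your own domain text first.
tokens = "שלום עולם זה משפט לדוגמה".split()
type_cov, token_cov = vocabulary_coverage("hebrew_vectors.vec", tokens)
print(f"type coverage: {type_cov:.1%}, token coverage: {token_cov:.1%}")
```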

Regarding a language model, you can think of BERT as learning, or being, a language model itself, and not just as learned embeddings, so [multilingual BERT](https://github.com/google-research/bert/blob/master/multilingual.md), trained on Hebrew among many other languages, is one place to look.
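If you want to try it quickly, here is a minimal sketch that probes multilingual BERT as a masked language model for Hebrew, assuming the Hugging Face transformers library and its bert-base-multilingual-cased checkpoint (the linked repo itself distributes the original TensorFlow weights):

```python
from transformers import pipeline

# Load multilingual BERT as a fill-mask (masked language model) pipeline.
fill_mask = pipeline("fill-mask", model="bert-base-multilingual-cased")

# Ask the model to fill in a masked Hebrew word; plausible completions
# suggest the model carries a usable language model for Hebrew.
for prediction in fill_mask("אני קורא [MASK] בעברית."):
    print(prediction["token_str"], round(prediction["score"], 3))
```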