bmschmidt/wordVectors

Does it accept Arabic (or any non-ASCII text) in general?

I'm having difficulty vectorizing an Arabic text; I can't seem to get anything useful out of it.

The word2vec function extracts only odd characters (emojis and the like) from a text file of about 200k Arabic words. It also seems to convert these characters to their codepoint values.

I would like to get a normal-looking word2vec model for my Arabic text.

Any comments or workarounds?