Primary language: Scala

spark word2vec

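Train word2vec on Spark and save the result as a text file in the Google word2vec format. A model saved by Spark can only be used from Spark, so this project converts the word vectors trained on Spark into the Google word2vec text form (each line holds a word followed by its vector).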
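As a rough illustration of the target text layout, here is a minimal Python sketch. It assumes the layout produced by the original Google word2vec tool in text mode: a header line with the vocabulary size and the vector dimension, then one line per word with space-separated values. Whether this project writes the header line is not stated here, and the function names `write_word2vec_text` and `read_word2vec_text` are hypothetical, not part of this repository.

```python
import io


def write_word2vec_text(vectors, out):
    """Write a {word: [float, ...]} mapping in word2vec text layout.

    Assumed layout: "<vocab_size> <dim>" header, then "<word> <v1> ... <vdim>".
    """
    dim = len(next(iter(vectors.values()))) if vectors else 0
    out.write(f"{len(vectors)} {dim}\n")
    for word, vec in vectors.items():
        if len(vec) != dim:
            raise ValueError(f"vector for {word!r} has length {len(vec)}, expected {dim}")
        out.write(word + " " + " ".join(f"{x:.6f}" for x in vec) + "\n")


def read_word2vec_text(src):
    """Parse the same layout back into a {word: [float, ...]} dict."""
    header = src.readline().split()
    vocab_size, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in src:
        parts = line.split()  # also tolerates a trailing space before the newline
        if not parts:
            continue  # skip blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"malformed line for word {parts[0]!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != vocab_size:
        raise ValueError(f"header says {vocab_size} words, found {len(vectors)}")
    return vectors
```

Both functions take any text file object, so they can be tried against an `io.StringIO` buffer before pointing them at a real model file.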

train

   spark-submit.sh -input hdfs_corpus -output hdfs_word2vec_model

display

   python w2v_visualizer.py word2vec.model ./log_result/
   tensorboard --logdir ./log_result/
   

result

[Four result images, not reproduced here.]