ELMo-chinese: A Python repository from earlzz

Deep contextualized word representations (ELMo) for Chinese

This only outputs context-independent word embeddings.

python3

tensorflow >= 1.10

jieba

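1. Prepare the data, using the data and vocab folders as a reference. vocab.py under pre_data can be used to build the vocabulary. (Keep each data file reasonably small, otherwise you will run out of memory.)

2. Train the model: train_elmo.py

3. Dump the model weights: dump_weights.py

4. In options.json, change 261 to 262.

5. Write the word embeddings to an HDF5 file: usage_token.py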
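The manual edit in step 4 can also be scripted. The sketch below assumes the 261 in question is the `n_characters` value under `char_cnn` in options.json, which is where bilm-tf style options files keep it (bilm-tf trains with 261 character IDs and expects 262 at inference time because of an extra padding ID). The helper name `set_n_characters` is hypothetical and not part of this repository; check your own options.json before relying on it.

```python
import json
import os
import tempfile


def set_n_characters(options_path, new_value=262):
    """Set char_cnn.n_characters in an options file and return the old value."""
    with open(options_path, encoding="utf-8") as f:
        options = json.load(f)

    # Assumption: the 261 from step 4 lives under char_cnn.n_characters,
    # as in bilm-tf style options files.
    char_cnn = options.get("char_cnn")
    if char_cnn is None or "n_characters" not in char_cnn:
        raise KeyError("char_cnn.n_characters not found in " + options_path)

    old_value = char_cnn["n_characters"]
    char_cnn["n_characters"] = new_value

    # Rewrite the file in place; all other options are left unchanged.
    with open(options_path, "w", encoding="utf-8") as f:
        json.dump(options, f, indent=2)
    return old_value


if __name__ == "__main__":
    # Demo on a throwaway file so no real options.json is overwritten.
    demo_path = os.path.join(tempfile.mkdtemp(), "options.json")
    with open(demo_path, "w", encoding="utf-8") as f:
        json.dump({"char_cnn": {"n_characters": 261}}, f)
    print(set_n_characters(demo_path))
```

To apply it to a trained model, call `set_n_characters` with the path to the options.json saved next to the checkpoint, after training and before running usage_token.py.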

Inspected with a visualization tool, the resulting embeddings look reasonable.
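Besides a visualization tool, a nearest-neighbor lookup is a quick way to sanity-check the embeddings. The sketch below is not part of this repository. It assumes the vectors have already been loaded into a plain dict mapping each word to a list of floats; how to read them out of the HDF5 file (for example with h5py) depends on the dataset names that usage_token.py writes, so that part is left out. The function names and the toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(query, embeddings, k=3):
    """Return the k words most similar to `query`, most similar first."""
    target = embeddings[query]
    scored = [
        (cosine(target, vec), word)
        for word, vec in embeddings.items()
        if word != query
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored[:k]]


if __name__ == "__main__":
    # Toy vectors, made up for illustration only.
    toy = {
        "北京": [0.9, 0.1, 0.0],
        "上海": [0.8, 0.2, 0.1],
        "苹果": [0.0, 0.9, 0.3],
        "香蕉": [0.1, 0.8, 0.4],
    }
    print(nearest_neighbors("北京", toy, k=1))
```

If related words (cities with cities, fruit with fruit) come back as each other's neighbors on the real vectors, that is a rough sign the embeddings are usable. This pure-Python version is only meant for spot checks on a handful of words.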

On a text-matching task, using these embeddings improved AUC by 1-2 points.

MIT.
