Khmer Language Model using ULMFiT

Primary language: Jupyter Notebook

khmer-language-model-ulmfit

This repository contains a Python notebook for a Khmer language model built with ULMFiT. We use Khmer Wikipedia as our training data, along with 1,000 Khmer news articles from the segmentation-crf-khmer repository.
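Khmer is written without spaces between words, so text has to be segmented before a word-level language model can be trained on it. The sketch below uses only the standard library to illustrate how already-segmented text might be turned into a vocabulary and integer IDs, and how next-token training pairs are formed. It is not the notebook's code: the function names, the `min_freq` parameter, the `xxunk` placeholder, and the two sample sentences are assumptions made for illustration.

```python
from collections import Counter

UNK = "xxunk"  # assumed placeholder for out-of-vocabulary tokens


def build_vocab(docs, min_freq=1):
    """Map each token seen at least min_freq times to an integer ID.

    docs are strings whose words are already separated by spaces,
    i.e. the output of a word segmenter.
    """
    counts = Counter(tok for doc in docs for tok in doc.split())
    itos = [UNK] + [tok for tok, n in counts.most_common() if n >= min_freq]
    return {tok: i for i, tok in enumerate(itos)}


def numericalize(doc, stoi):
    """Replace each token with its ID, falling back to the UNK ID."""
    return [stoi.get(tok, stoi[UNK]) for tok in doc.split()]


def next_token_pairs(ids):
    """Pair every token ID with the ID that follows it (the LM target)."""
    return list(zip(ids[:-1], ids[1:]))


# Two hypothetical, already-segmented sentences.
docs = ["ខ្ញុំ ស្រឡាញ់ ភាសា ខ្មែរ", "ខ្ញុំ រៀន ភាសា ខ្មែរ"]
stoi = build_vocab(docs)
ids = numericalize(docs[0], stoi)
pairs = next_token_pairs(ids)
```

Each pair holds a token and the token that follows it, which is the quantity a language model is trained to predict.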

We save the Wiki data files and the segmented output file. The pre-trained model is not included in this repository because of its size, but the notebook contains code to download it from Google Drive.

See the detailed write-up here:

https://medium.com/@phylypo/khmer-language-model-using-ulmfit-b0f8ca4e15be

We created a web interface where you can test the model's next-word prediction. See:

http://ml.tovnah.com/khmer-ulmfit/
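
Next-word prediction typically comes down to ranking vocabulary entries by the probability the model assigns to each one as the continuation of the input. The following is a minimal sketch of that ranking step only, assuming a model has already produced one raw score per vocabulary entry. The `top_k_next_words` helper, the scores, and the tiny vocabulary are made up for illustration and are not taken from the demo or the notebook.

```python
import math


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_next_words(scores, itos, k=3):
    """Return the k most probable (word, probability) pairs, best first."""
    probs = softmax(scores)
    ranked = sorted(zip(itos, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical vocabulary and scores for a single prediction step.
itos = ["ភាសា", "ខ្មែរ", "រៀន", "ស្រឡាញ់"]
scores = [0.5, 2.0, 1.0, -1.0]
suggestions = top_k_next_words(scores, itos, k=2)
```

Here `suggestions` holds the two highest-scoring words with their probabilities; a real model would supply one score per entry of its full vocabulary.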