This repository contains a Python notebook for a Khmer language model built with ULMFiT. We use Khmer Wikipedia as our training data, along with 1000 Khmer news articles from the segmentation-crf-khmer repository.
The Wiki data files and the segmented output file are included in this repository. The pre-trained model is not included because of its size, but the notebook contains code to download it from Google Drive.
See the detailed write-up here:
https://medium.com/@phylypo/khmer-language-model-using-ulmfit-b0f8ca4e15be
We also created a web interface where you can try out the model's next-word prediction. See: