airiuz/next-word-prediction

Jupyter Notebook

Description

This project performs next-word prediction for the Uzbek language by preprocessing text data and training a multilayer recurrent neural network.

Steps

The current data is not yet of good quality; the team is still working on cleaning the Uzbek dataset.
The data is then preprocessed in data_preprocess.ipynb, and the resulting data and label vectors are saved.
Next, train_model.ipynb loads the saved data and trains a multilayer recurrent neural network built from LSTM layers. After 10 epochs, accuracy on the training data reaches 0.923. Accuracy and loss over the course of training are plotted:

[Plots: accuracy and loss during training]
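The cleaning and preprocessing steps above can be sketched as follows. This is a minimal illustration, not the code in data_preprocess.ipynb: the apostrophe normalization rule, the fixed window length (seq_len=2 here), and the JSON output format are assumptions, and all function names are hypothetical. The idea is that every run of seq_len consecutive words becomes one input vector and the word that follows it becomes the label.

```python
import json
import re
import unicodedata
from collections import Counter

# Hypothetical helpers, not taken from data_preprocess.ipynb.
# Assumption: characters that scraped Uzbek text often uses in place of
# the oʻ/gʻ mark (U+02BB) or the tutuq belgisi (U+02BC).
APOSTROPHE_VARIANTS = "\u02bb\u02bc\u2018\u2019`\u00b4"

def clean_uzbek_text(text):
    """Lowercase, unify apostrophe variants, and drop digits and punctuation."""
    text = unicodedata.normalize("NFC", text)
    for ch in APOSTROPHE_VARIANTS:
        text = text.replace(ch, "'")  # "oʻzbek", "o‘zbek" -> "o'zbek"
    text = text.lower()
    # Anything that is not a letter, whitespace or apostrophe (plus digits
    # and underscores, which \w would otherwise keep) becomes a space.
    text = re.sub(r"[^\w\s']|[\d_]", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def build_vocab(tokens):
    """Map words to ids 1..V by descending frequency; id 0 means unknown."""
    counts = Counter(tokens)
    words = sorted(counts, key=lambda w: (-counts[w], w))  # ties: alphabetical
    return {w: i + 1 for i, w in enumerate(words)}

def make_windows(ids, seq_len):
    """Each window of seq_len ids is an input; the id right after it is the label."""
    X = [ids[s:s + seq_len] for s in range(len(ids) - seq_len)]
    y = [ids[s + seq_len] for s in range(len(ids) - seq_len)]
    return X, y

def save_dataset(path, X, y, vocab):
    """Store inputs, labels and vocabulary together in one JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"X": X, "y": y, "vocab": vocab}, f, ensure_ascii=False)

text = "Men maktabga boraman. Men uyga boraman!"
tokens = clean_uzbek_text(text).split()
vocab = build_vocab(tokens)
ids = [vocab.get(w, 0) for w in tokens]
X, y = make_windows(ids, seq_len=2)
```

A sliding window keeps every input the same length, which is convenient for batching; a common alternative is to use every prefix of a sentence as an input and pad the shorter ones.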

Finally, test_model.ipynb loads the saved model and uses it to predict the next word.
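To make the training and testing steps more concrete, the sketch below shows in plain Python (no deep-learning library) what a forward pass through a stacked, multilayer LSTM next-word model computes, and how its output distribution is turned into predicted words. It is not the project's code: the weights are random and untrained, there is no training loop, and the sizes (embed_dim=8, hidden_size=16, num_layers=2) as well as all class and function names are hypothetical.

```python
import math
import random

# Hypothetical, untrained illustration; not the model from train_model.ipynb.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class LSTMLayer:
    """A single LSTM layer (forward pass only, random weights)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input, forget, output, candidate.
        self.params = {
            gate: (rand_matrix(hidden_size, input_size, rng),
                   rand_matrix(hidden_size, hidden_size, rng),
                   [0.0] * hidden_size)
            for gate in ("i", "f", "o", "g")
        }

    def _pre(self, gate, x, h):
        W, U, b = self.params[gate]
        return [a + c + d for a, c, d in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, inputs):
        """Return the hidden state after each element of the input sequence."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in inputs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]
            f = [sigmoid(z) for z in self._pre("f", x, h)]
            o = [sigmoid(z) for z in self._pre("o", x, h)]
            g = [math.tanh(z) for z in self._pre("g", x, h)]
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class StackedLSTMModel:
    """Embedding -> num_layers LSTM layers -> softmax over the vocabulary."""

    def __init__(self, vocab_size, embed_dim=8, hidden_size=16, num_layers=2, seed=0):
        rng = random.Random(seed)
        self.embedding = rand_matrix(vocab_size, embed_dim, rng)
        sizes = [embed_dim] + [hidden_size] * num_layers
        self.layers = [LSTMLayer(sizes[k], sizes[k + 1], rng) for k in range(num_layers)]
        self.W_out = rand_matrix(vocab_size, hidden_size, rng)

    def predict_proba(self, word_ids):
        seq = [self.embedding[w] for w in word_ids]
        for layer in self.layers:
            seq = layer.run(seq)  # each layer consumes the previous layer's full sequence
        return softmax(matvec(self.W_out, seq[-1]))  # only the last hidden state is used

def top_k_next_words(probs, id_to_word, k=3):
    """Return the k most probable next words as (word, probability) pairs."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id_to_word.get(i, "<unk>"), probs[i]) for i in ranked[:k]]

id_to_word = {0: "<unk>", 1: "men", 2: "maktabga", 3: "boraman"}
model = StackedLSTMModel(vocab_size=len(id_to_word))
probs = model.predict_proba([1, 2])
print(top_k_next_words(probs, id_to_word, k=2))
```

Because the weights are random, the printed words are meaningless; a deep-learning framework would normally supply both these computations and the backpropagation that training requires.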

How to Improve

The biggest limitation of this project is the lack of a large, suitable primary dataset. Accuracy can improve dramatically with more appropriate primary data.
Accuracy can also be improved by experimenting with different hyperparameters and layer configurations for the neural network.
You can also contribute to data cleaning through the following Google Doc.
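Trying different hyperparameters and layer configurations can be organized as a simple grid search. In this minimal sketch, train_and_evaluate is a hypothetical callable that would train a model with the given settings and return a score; the parameter names and values in the grid are illustrative assumptions, and a fake scorer stands in for real training so the example runs.

```python
import itertools

def grid_search(train_and_evaluate, grid):
    """Evaluate every combination in `grid` and return the best one.

    `train_and_evaluate` is a hypothetical callable that trains a model with
    the given keyword arguments and returns a score (higher is better).
    """
    keys = sorted(grid)
    results = []
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        results.append((train_and_evaluate(**params), params))
    best_score, best_params = max(results, key=lambda r: r[0])
    return best_score, best_params, results

# Hypothetical search space; names and values are illustrative only.
grid = {"num_layers": [1, 2, 3], "hidden_size": [64, 128], "dropout": [0.0, 0.2]}

def fake_train_and_evaluate(num_layers, hidden_size, dropout):
    # Stand-in scorer so the sketch runs; replace with real training code.
    return 0.9 - 0.05 * abs(num_layers - 2) - 0.0001 * abs(hidden_size - 128) - 0.1 * dropout

best_score, best_params, results = grid_search(fake_train_and_evaluate, grid)
print(best_params)
```

Since the accuracy reported above is measured on the training data, ranking configurations by a score on held-out validation data is likely to be more informative, because a larger network can fit the training set better without predicting unseen text any better.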