Description

This project is a next-word-prediction task for the Uzbek language: text data is preprocessed and a multilayer recurrent neural network is trained on it.

Steps

  • The current dataset is still low quality; the team is working on the cleaning phase of the Uzbek dataset.
  • The data is then preprocessed in data_preprocess.ipynb, and the resulting input vectors and labels are saved (see the preprocessing and training sketch after this list).
  • In train_model.ipynb the saved data is loaded and a multilayer recurrent neural network with LSTM layers is trained. After 10 epochs the training accuracy is 0.923, and the accuracy and loss curves are plotted:
(Plots of training accuracy and loss over epochs.)
  • In test_model.ipynb the saved model is loaded and tested on predicting the next word (a prediction sketch follows below).
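
A minimal sketch of the preprocessing and training steps described above, assuming a Keras/TensorFlow setup. The corpus file name, tokenizer settings, layer sizes, and other parameters here are illustrative assumptions, not the project's actual code:

```python
import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense

# --- preprocessing (roughly what data_preprocess.ipynb does) ---
with open("uzbek_corpus.txt", encoding="utf-8") as f:  # assumed file name
    lines = [line.strip() for line in f if line.strip()]

tokenizer = Tokenizer()
tokenizer.fit_on_texts(lines)
vocab_size = len(tokenizer.word_index) + 1

# Build n-gram sequences: every prefix of a sentence predicts its next word.
sequences = []
for line in lines:
    tokens = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(tokens)):
        sequences.append(tokens[: i + 1])

max_len = max(len(s) for s in sequences)
sequences = pad_sequences(sequences, maxlen=max_len, padding="pre")

# Inputs are all tokens except the last; the label is the last token, one-hot encoded.
X, y = sequences[:, :-1], to_categorical(sequences[:, -1], num_classes=vocab_size)
np.save("X.npy", X)
np.save("y.npy", y)

# --- training (roughly what train_model.ipynb does) ---
model = Sequential([
    Embedding(vocab_size, 100),         # embedding size is an assumption
    LSTM(150, return_sequences=True),   # first recurrent layer
    LSTM(100),                          # second recurrent layer
    Dense(vocab_size, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
history = model.fit(X, y, epochs=10)
model.save("next_word_model.h5")        # assumed model file name
```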

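A similarly hedged sketch of the testing step: loading the saved model and predicting the next word for a seed phrase. It reuses the tokenizer and max_len from the sketch above; in the actual notebooks these would be saved alongside the data and reloaded. The model file name and the sample seed phrase are assumptions.

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing.sequence import pad_sequences

model = load_model("next_word_model.h5")  # assumed model file name

def predict_next_word(seed_text):
    # Convert the seed phrase to token ids and pad it to the training length.
    tokens = tokenizer.texts_to_sequences([seed_text])[0]
    tokens = pad_sequences([tokens], maxlen=max_len - 1, padding="pre")
    probs = model.predict(tokens, verbose=0)[0]
    predicted_id = int(np.argmax(probs))
    return tokenizer.index_word.get(predicted_id)

print(predict_next_word("men maktabga"))  # example Uzbek seed phrase (assumed)
```
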
How to Improve

  • The biggest limitation of this project is the lack of a large, appropriate primary dataset; accuracy should improve dramatically with better primary data.
  • Different hyperparameters and layer configurations can be tried for the neural network model to improve accuracy.
  • You can also contribute to data cleaning through the following link: Google Doc