
Russian-Belarusian neural translator



The data is part of my bachelor thesis on neural machine translation for the Russian-Belarusian language pair.


The repo contains:

  • 429k aligned sentence pairs (under Data/AlignedData), split into 10 batches

  • chunks to align (under Data/ChunksToAlign)

  • Data/TabbedCorpusMiddleSent.txt, a sample of 65966 sentences of at most 80 characters each, handy for training a model on a subset of the data

  • neural network code.
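
To illustrate how the sample file might be loaded for training, here is a minimal Python sketch. It rests on assumptions not confirmed by this README: that each line holds one sentence pair as "source<TAB>target" (suggested only by the file name), and that the file is UTF-8 encoded. The function names read_pairs and short_pairs are hypothetical and are not part of the repo's code.

```python
from typing import Iterable, Iterator, List, Tuple

MAX_CHARS = 80  # length cap stated for the sample file


def read_pairs(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) pairs from tab-separated lines.

    Assumption: one pair per line, laid out as 'source<TAB>target'.
    The real column order and layout are not documented in the README.
    """
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        src, tgt = parts[0].strip(), parts[1].strip()
        if src and tgt:
            yield src, tgt


def short_pairs(
    pairs: Iterable[Tuple[str, str]], max_chars: int = MAX_CHARS
) -> List[Tuple[str, str]]:
    """Keep only pairs where both sides fit within max_chars characters."""
    return [
        (s, t) for s, t in pairs
        if len(s) <= max_chars and len(t) <= max_chars
    ]


# Possible usage (the UTF-8 encoding is an assumption):
#
#     with open("Data/TabbedCorpusMiddleSent.txt", encoding="utf-8") as f:
#         pairs = short_pairs(read_pairs(f))
```

Accepting any iterable of lines instead of a file path keeps the parser easy to try on a few in-memory lines before running it on the full corpus.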

Data source

? The main source of the data (web-pages,..)


? How the data was collected

This is an open-source project, and the data can be used freely. Any reviews are more than welcome.

Author: Tsimafei Prakapenka