
Telegram bot interface for a translation engine based on the Transformer architecture

Primary language: Jupyter Notebook

tg-bot-translation

The bot translates sentences from Russian to English using a Transformer model. The training dataset consists of 400k sentence pairs. 50k of them were taken from the YSDA NLP course data and mostly describe apartments; the remaining 350k come from the Yandex parallel corpus and are general-purpose sentences. As a result, the bot translates sentences about apartments and hotels best.
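A minimal sketch of how an in-domain corpus and a general-purpose corpus could be merged into one training set, assuming each corpus is already loaded as a list of (Russian, English) pairs. The function name `merge_corpora`, the empty-pair filter and the fixed shuffle seed are hypothetical; this is not the repository's actual preprocessing code.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (Russian sentence, English sentence)


def merge_corpora(domain: List[Pair], general: List[Pair], seed: int = 42) -> List[Pair]:
    """Concatenate in-domain and general pairs, drop empty pairs, and shuffle.

    Hypothetical helper: shuffling mixes apartment descriptions with general
    sentences so that training batches are not dominated by a single source.
    """
    merged = list(domain) + list(general)
    # Assumed cleaning step: skip pairs where either side is empty or whitespace.
    merged = [(ru, en) for ru, en in merged if ru.strip() and en.strip()]
    # A seeded Random instance makes the shuffle reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged


if __name__ == "__main__":
    domain = [("Квартира находится в центре.", "The apartment is in the center.")]
    general = [("Привет, мир!", "Hello, world!"), ("", "empty source")]
    print(len(merge_corpora(domain, general)))  # 2: the pair with an empty source is dropped
```

With the sizes given above, in-domain pairs make up 50k / 400k = 12.5% of the training data, yet they still dominate the bot's strongest domain because their vocabulary is narrow and repetitive.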

The Transformer implementation from a DLS seminar was used, with four layers in both the encoder and the decoder. Word embeddings were first trained on the dataset with Gensim Word2Vec. The model was trained for 47 epochs and reached a BLEU score of 19.66 on the test set. Over the last 15 epochs, BLEU improved by only 0.3 points, which suggests that further training is unlikely to improve translation quality. The notebook actually used for model training can be found here.
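BLEU is a corpus-level metric built from clipped n-gram precisions and a brevity penalty. The sketch below computes standard corpus BLEU (up to 4-grams, one reference per sentence, no smoothing) on whitespace-tokenized text. It is an illustration only, not the evaluation code from the training notebook, whose tokenization and smoothing may differ, so its numbers are not directly comparable to the 19.66 reported above.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of length n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[List[str]], references: List[List[str]], max_n: int = 4) -> float:
    """Corpus-level BLEU with one reference per hypothesis, scaled to 0..100."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram counts at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyp = "the cat sat on the mat".split()
    ref = "the cat sat on a mat".split()
    print(corpus_bleu([ref], [ref]))  # 100.0 for an exact match
    print(round(corpus_bleu([hyp], [ref]), 2))
```

Tracked per epoch on held-out data, a score like this makes the plateau argument concrete: a gain of 0.3 BLEU over 15 epochs averages 0.02 points per epoch.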

The Telegram bot is written with the asynchronous aiogram framework and is located in the bot folder. To run it on your system, change the paths in the config.py file and add your Telegram bot token there as well. The main entry point for bot initialization is main.py. The bot's username is @translate_tr_bot.
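A hypothetical sketch of what a config.py for this kind of bot might contain. Every variable name, file name and environment variable here (`BASE_DIR`, `MODEL_PATH`, `SRC_VOCAB_PATH`, `TRG_VOCAB_PATH`, `API_TOKEN`, `TELEGRAM_BOT_TOKEN`, `TRANSLATOR_BASE_DIR`) is an assumption for illustration; check the real config.py in the bot folder for the names main.py actually imports.

```python
"""Hypothetical bot/config.py sketch; the repository's real names may differ."""
import os
from pathlib import Path
from typing import List

# Assumed base directory for model artifacts; change it to match your system.
BASE_DIR = Path(os.environ.get("TRANSLATOR_BASE_DIR", Path.home() / "tg-bot-translation"))

# Assumed locations of the trained Transformer weights and vocabularies.
MODEL_PATH = BASE_DIR / "model.pt"
SRC_VOCAB_PATH = BASE_DIR / "src_vocab.pkl"
TRG_VOCAB_PATH = BASE_DIR / "trg_vocab.pkl"

# Reading the token from an environment variable keeps it out of version control.
API_TOKEN = os.environ.get("TELEGRAM_BOT_TOKEN", "")


def missing_settings() -> List[str]:
    """Return human-readable problems that would prevent the bot from starting."""
    problems = []
    if not API_TOKEN:
        problems.append("TELEGRAM_BOT_TOKEN is not set")
    for path in (MODEL_PATH, SRC_VOCAB_PATH, TRG_VOCAB_PATH):
        if not path.exists():
            problems.append(f"missing file: {path}")
    return problems
```

An entry point could call `missing_settings()` before starting polling and exit with a readable message instead of failing deep inside model loading.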