intellicall: A C++ repository from nxtreaming

RTranslator is an (almost) open-source, free, and offline real-time translation app for Android.

Connect to someone who has the app, connect Bluetooth headphones, put the phone in your pocket and you can have a conversation as if the other person spoke your language.

Conversation mode

The Conversation mode is the main feature of RTranslator. In this mode, you can connect with another phone that uses this app. If the user accepts your connection request:

When you talk, your phone (or the Bluetooth headset, if connected) will capture the audio.
The audio captured will be converted into text and sent to the interlocutor's phone.
The interlocutors' phone will translate the text received into his language.
The interlocutors' phone will convert the translated text into audio and will reproduce it from its speaker (or by the Bluetooth headset of the interlocutor if connected to his phone).

All this in both directions.

Each user can have more than one connected phone so that you can translate conversations between more than two people and in any combination.

WalkieTalkie mode

If conversation mode is useful for having a long conversation with someone, this mode instead is designed for quick conversations, such as asking for information on the street or talking to a shop assistant.

This mode only translates conversations between two people, it doesn't work with Bluetooth headsets, and you have to talk in turns. It's not a real simultaneous translation, but it can work with only one phone.

In this mode, the smartphone microphone will listen in two languages (selectable in the same screen of the walkie talkie mode) simultaneously.
The app will detect in which language the interlocutor is speaking, translate the audio into the other language, convert the text into audio, and then reproduce it from the phone speaker. When the TTS has finished, it will automatically resume listening.

Text translation mode

This mode is just a classic text translator, but always useful.

General

RTranslator uses Meta's NLLB for translation and OpenAi's Whisper for speech recognition, both are (almost) open-source and state of the art AIs, have excellent quality and run directly on the phone, ensuring absolute privacy and the possibility of using RTranslator even offline without loss of quality.

Also, RTranslator works even in the background, with the phone on standby or when using other apps (only when you use Conversation or WalkieTalkie modes). However, some phones limit the power in the background so in that case it is better to avoid it and keep the app open with the screen on.

What's new in version 2.1

New GUI! Designed by Chiara Chindamo.
Added speak and copy buttons to the text translation mode.
Added the option to manually control the mics in WalkieTalkie mode.
Added the option to use low-quality languages.
Fixed some bugs.

For the full list of changes see here.

Performance requirements

I have optimized the AI models a lot to minimize RAM consumption and execution time, despite this however to be able to use the app without the risk of crashing you need a phone with at least 6GB of RAM, and to have a good enough execution time you need a phone with a fast enough CPU.

If you have a pretty crappy phone (or if you want maximum speed) you can always use version 1.0 of RTranslator (but since it uses Google APIs it's not free and needs some initial setup).

Download

To install the app, download the latest version of the app apk file from https://github.com/niedev/RTranslator/releases/ and install it (ignore the other files, those will be downloaded automatically by the app on the first start).

On the first launch, RTranslator will automatically download the models for translation and speech recognition (1.2GB) and once done you can start translating.

The initial download will get the models from GitHub, however in some regions GitHub is very slow, those who have problems of this kind can download the models separately from a computer (or in general in whatever way they prefer) and insert them manually into the app following this guide.

If you have a GitHub account and want to be notified when a new release comes out you can do so by clicking, at the top of the page, on "Watch" -> "Custom" -> "Releases" -> "Apply".

Supported languages

The languages supported are as follows:

Arabic, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Finnish, French, Galician, German, Greek, Italian, Japanese, Korean, Macedonian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese.

If your language is not on the list, from version 2.1 of RTranslator you can go into the settings and enable "Support low quality languages" to add the following languages (which have lower quality for translation and speech recognition):

Afrikaans, Akan (only text), Amharic, Assamese, Bambara (only text), Bangla, Bashkir, Basque, Belarusian, Bosnian, Dzongkha (only text), Esperanto (only text), Estonian, Ewe (only text), Faroese, Fijian (only text), Georgian, Guarani (only text), Gujarati, Hausa, Hebrew, Hindi, Hungarian, Javanese (only text), Kannada, Kashmiri (only text), Kazakh, Kikuyu (only text), Kinyarwanda (only text), Korean, Kyrgyz (only text), Lao, Limburghish (only text), Lingala, Lithuanian, Luxembourghish, Macedonian, Tagalog (only text), Tibetan.

Text To Speech

To speak, RTranslator uses the system TTS of your phone, so the quality of the latter and the supported languages depend on the system TTS of your phone.

The supported languages seen above are all compatible with Google TTS, which is the recommended TTS (although you can use the TTS you want).

To change the system TTS (and therefore the TTS used by RTranslator), download the TTS you want to use from the Play Store, or from the source you prefer, and open RTranslator, then open its settings (top right) and, in the "Output" section, click on "Text to Speech", at this point the system settings will open in the section where you can select the preferred system TTS engine (among those installed), at this point, if you have changed the preferred engine, restart RTranslator to apply the changes.

Privacy

Privacy is a fundamental right. That's why RTranslator does not collect any personal data (I don't even have a server). For more information, read the privacy policy (for now is the same privacy policy of RTranslator 1.0, but I will update it in the future).

Libraries and models

RTranslator code is completely open-source, but some of the external libraries it uses have less permissive licenses, these are all the external libraries used by the app (with the indication of their license):

BluetoothCommunicator (open-source): Used for Bluetooth LE communication between devices.

GalleryImageSelector (open-source): Used for selecting and cropping the profile image from the gallery.

OnnxRuntime (open-source): Used as an accelerator engine for the AI models.

SentencePiece (open-source): Used for tokenization of the input text for NLLB.

Ml Kit (closed-source): Used for the identification of the language in the WalkieTalkie mode.

And the following AI models:

NLLB (open-source, but only for non-commercial use): The model used is NLLB-Distilled-600M with KV cache.

Whisper (open-source): The model used is Whisper-Small-244M with KV cache.

Performance of the models

I converted both NLLB and Whisper to onnx format and quantized them in int8 (excluding some weights to ensure almost zero quality loss), I also separated some parts of the models to reduce RAM consumption (without this separation some weights were duplicated at runtime consuming more RAM than expected) and done other optimizations to reduce execution time.

Here are the results of my optimizations:

	normal NLLB onnx model (full int8, no kv-cache)	RTranslator NLLB onnx model (partial int8, with kv-cache, separated parts)
RAM Consumption	2.5 GB	1.3GB (1.9x improvement)
Execution time for 75 tokens	8s	2s (4x improvement)

	Whisper onnx model optimized with Olive (full int8, with kv-cache)	RTranslator Whisper onnx model (partial int8, with kv-cache, separated parts)
RAM Consumption	1.4 GB	0.9 GB (1.5x improvement)
Execution time for 11s audio	1.9s	1.6s (1.2x improvement)

N.B. RTranslator Whisper model can also consume 0.5 GB of RAM but with an execution time of 2.1s (this mode is used for phones with less than 8 GB of RAM)

Donations

This is an open-source and completely ad-free app, I don't make any money from it.

So, if you like the app and want to say thank you and support the project, you can make a donation via PayPal by clicking on the button below (any amount is well accepted).

In case you will donate, or just live a star, thank you ❤️

Bugs and problems

I remind you that the app is still in beta. The bugs found are the following:

Sometimes the Bluetooth connection drops.

If you have found any bug please report it by opening an issue, or by writing an email to contact.niedev@gmail.com

Enjoy your simultaneous translator.

nxtreaming/intellicall