Lingtrain Studio

💡 Intro

Lingtrain Studio is an ML-based app for accurate alignment of texts in different languages.

- Extracts parallel corpora from two texts.
- Builds a formatted parallel book from them, with sentence highlighting.

⚡ Articles

🧬 Models

The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors used to calculate the distance (similarity) between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.
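To make the idea of a distance between sentences concrete, here is a minimal sketch that compares sentence embeddings using cosine similarity and, for each source sentence, picks the most similar target sentence. It assumes cosine similarity as the measure and uses hypothetical toy 3-dimensional vectors in place of real model output (real embeddings have hundreds of dimensions). The greedy best-match step is a simplification for illustration, not Lingtrain's actual alignment algorithm.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated
    return dot / (norm_a * norm_b)


def similarity_matrix(src: List[Vector], tgt: List[Vector]) -> List[List[float]]:
    """Row i, column j holds the similarity of source sentence i and target sentence j."""
    return [[cosine_similarity(s, t) for t in tgt] for s in src]


def best_matches(matrix: List[List[float]]) -> List[int]:
    """Naive greedy step (illustration only): for each source row,
    the index of the most similar target sentence."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]


# Hypothetical toy embeddings; a real model would compute these from sentences.
src_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.1]]
tgt_vecs = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

m = similarity_matrix(src_vecs, tgt_vecs)
print(best_matches(m))  # [1, 0]
```

A real aligner typically also needs to respect sentence order and handle one-to-many matches, which a per-row maximum ignores.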

distiluse-base-multilingual-cased-v2
- more reliable and faster
- moderate weight size: 500 MB
- supports 50+ languages
- full list of supported languages can be found in this paper
LaBSE (Language-agnostic BERT Sentence Embedding)
- can be used for rare languages
- heavy weights: 1.8 GB
- supports 100+ languages
- full list of supported languages can be found here
SONAR (Sentence-level multimOdal and laNguage-Agnostic Representations)
- supports about 200 languages (approximately these)
- large model: 3 GB of weights
- ideally, the source language should be specified explicitly
- originally released at facebookresearch/SONAR (built on fairseq2); here a HuggingFace port is used
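The models above act as interchangeable embedding backends. The actual plug-in interface is described in the models directory and is not reproduced here; the sketch below only illustrates the general idea under assumed names (SentenceEncoder, embed and HashingEncoder are hypothetical, not Lingtrain's API): any object that turns a list of sentences into a list of equal-length vectors can serve as the backend.

```python
import hashlib
from typing import List, Protocol


class SentenceEncoder(Protocol):
    """Hypothetical backend contract: one fixed-size vector per input sentence."""

    def embed(self, sentences: List[str]) -> List[List[float]]:
        ...


class HashingEncoder:
    """Hypothetical toy stand-in for a real model such as LaBSE.
    It counts character trigrams into a fixed number of buckets, so it captures
    no meaning and does not work across languages; it only shows the interface."""

    def __init__(self, dim: int = 16) -> None:
        self.dim = dim

    def embed(self, sentences: List[str]) -> List[List[float]]:
        vectors = []
        for sentence in sentences:
            vec = [0.0] * self.dim
            text = sentence.lower()
            for i in range(len(text) - 2):
                # md5 instead of hash(): str hashes are randomized per process
                digest = hashlib.md5(text[i:i + 3].encode("utf-8")).digest()
                vec[digest[0] % self.dim] += 1.0
            vectors.append(vec)
        return vectors


def encode_all(encoder: SentenceEncoder, sentences: List[str]) -> List[List[float]]:
    """Code written against the protocol accepts any conforming backend."""
    return encoder.embed(sentences)


vectors = encode_all(HashingEncoder(dim=8), ["Hello world.", "Привет, мир."])
print(len(vectors), len(vectors[0]))  # 2 8
```

Typing against a structural protocol rather than a concrete class means a new model only has to provide the same method shape; it does not need to inherit from anything.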

💻 Running on local machine

You can run the application on your computer using Docker. To check that Docker is installed, run the docker version command in your console.

docker-compose

docker-compose build
docker-compose up

Docker Hub

Images configured to run locally are available on Docker Hub.
Run the following commands in your console:
- docker pull lingtrain/studio:v7.2
- docker run -v C:\app\data:/app/data -v C:\app\img:/app/static/img -p 80:80 lingtrain/studio:v7.2
The app will then be available in your browser at http://localhost.
If you need to run the container on another port (e.g. localhost:8081):
- Change the API_URL parameter in config.js
- Rebuild the Docker image
- Start it with changed -p parameter (e.g. -p 8081:80)
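The port change comes down to the -p HOST:CONTAINER mapping: the container side stays 80 and only the host side changes. The sketch below builds the same docker run command as above for an arbitrary host port; the helper name docker_run_args and its defaults are hypothetical, and it does not edit config.js or rebuild the image, which still have to be done as described above.

```python
import subprocess
from typing import List

IMAGE = "lingtrain/studio:v7.2"  # image tag from the Docker Hub instructions above


def docker_run_args(host_port: int,
                    data_dir: str = r"C:\app\data",
                    img_dir: str = r"C:\app\img",
                    image: str = IMAGE) -> List[str]:
    """Hypothetical helper: build the docker run command for a given host port."""
    if not 1 <= host_port <= 65535:
        raise ValueError(f"invalid port: {host_port}")
    return [
        "docker", "run",
        "-v", f"{data_dir}:/app/data",       # host folder mounted at /app/data
        "-v", f"{img_dir}:/app/static/img",  # host folder mounted at /app/static/img
        "-p", f"{host_port}:80",             # host port -> container port 80
        image,
    ]


def start(host_port: int) -> None:
    """Actually launch the container (requires Docker to be installed)."""
    subprocess.run(docker_run_args(host_port), check=True)


print(" ".join(docker_run_args(8081)))
```

Calling start(8081) would then run the container with the app reachable at localhost:8081, provided API_URL in config.js was changed to match.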

🔨 Running in development mode

Clone this repository to your machine.

Backend

A Flask/uWSGI REST API service that contains all the alignment logic.

Go to the backend directory
- cd backend
Install the requirements
- pip install -r requirements.txt
Run the backend application
- python main.py

Frontend

A single-page application built with Vue, Vuex and Vuetify. It provides a UI for managing the alignment process through the backend, as well as a tool for translators to edit the documents being processed.

Go to the frontend directory
- cd frontend
Install the requirements
- npm install -f
Compile and run with hot reloading for development
- npm run serve

The application will be available at localhost:8080.

✉️ Feedback

You can create an issue or send me a message on Telegram: @averkij

🔑 License

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. See LICENSE.

averkij/a-studio