
Lingtrain Alignment Studio is an ML-based app for aligning texts in different languages. It can produce parallel corpora and parallel books.


Lingtrain Studio


💡 Intro

Lingtrain Studio is an ML-based app for accurate alignment of texts in different languages.

  • Extracts a parallel corpus from two texts.
  • Builds a formatted parallel book from it, with sentence highlighting.

⚡ Articles

🧬 Models

The automated alignment process relies on sentence embedding models. Embeddings are multidimensional vectors that are used to calculate the distance between sentences. You can also plug in your own model using the interface described in the models directory. The list of supported languages depends on the selected backend model.

  • distiluse-base-multilingual-cased-v2
    • more reliable and fast
    • moderate weights size: 500 MB
    • supports 50+ languages
    • full list of supported languages can be found in this paper
  • LaBSE (Language-agnostic BERT Sentence Embedding)
    • can be used for rare languages
    • pretty heavy weights: 1.8 GB
    • supports 100+ languages
    • full list of supported languages can be found here
  • SONAR (Sentence-level multimOdal and laNguage-Agnostic Representations)
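
To illustrate how embeddings can be used to compare sentences, here is a minimal sketch that measures cosine similarity between vectors and picks the closest target sentence for each source sentence. It is not the app's actual alignment algorithm: the function names `cosine_similarity` and `best_matches` are hypothetical, and the tiny 3-dimensional vectors stand in for the much larger embeddings a real model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_matches(source_vecs, target_vecs):
    """For each source vector, return the index of the most similar target vector."""
    return [
        max(range(len(target_vecs)),
            key=lambda j: cosine_similarity(src, target_vecs[j]))
        for src in source_vecs
    ]


# Toy "embeddings" (assumption: made-up values, not output of any real model).
source = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0]]

matches = best_matches(source, target)
print(matches)
```

In this toy data the first source vector is closest to the second target vector and vice versa, so the sentences would be paired crosswise. A real aligner also has to handle sentences that are split, merged, or missing in one of the texts, which this sketch ignores.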

💻 Running on local machine

You can run the application on your computer using Docker. To check that Docker is installed, run the docker version command in your console.

docker-compose

  1. docker-compose build

  2. docker-compose up

Docker Hub

  1. Images configured to run locally are available on Docker Hub.

  2. Run the following commands in your console:

    • docker pull lingtrain/studio:v7.2
    • docker run -v C:\app\data:/app/data -v C:\app\img:/app/static/img -p 80:80 lingtrain/studio:v7.2
  3. The app will be available in your browser at the localhost address.

  4. If you need to run the container on another port (e.g. localhost:8081):

    • Change the API_URL parameter in config.js
    • Rebuild the Docker image
    • Start the container with a changed -p parameter (e.g. -p 8081:80)

🔨 Running in development mode

Clone this repo to your machine.

Backend

A Flask/uWSGI backend that exposes a REST API. It contains all the alignment logic.

  • Go to the backend directory

    • cd backend
  • Install the requirements

    • pip install -r requirements.txt
  • Run the backend application

    • python main.py

Frontend

A single-page application built with Vue, Vuex and Vuetify. It provides a UI for managing the alignment process through the backend, and a tool for translators to edit the documents being processed.

  • Go to the frontend directory

    • cd frontend
  • Install the requirements

    • npm install -f
  • Compile and run with hot-reloads for development

    • npm run serve

The application will be available at localhost:8080.

✉️ Feedback

You can create an issue or send me a message on Telegram: @averkij

🔑 License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license. See LICENSE.
