
Python script that uses the DeepL API to create a fully-aligned bilingual document (tmx or docx file).


DeepL Align

A command line script that translates a source document (docx) using the DeepL API and produces a bilingual file.
The translation is output as a tmx file, or as a table in a docx file, with the source and target text segments fully aligned.
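
For readers unfamiliar with tmx, the following is a minimal sketch of what "aligned segments" look like in that format. It is not taken from translate.py: the function name `build_tmx`, the header attribute values, and the sample sentence pair are all assumptions made for illustration, and only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

# ElementTree serializes this namespaced key as the attribute xml:lang.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs, src_lang, tgt_lang):
    """Return a TMX 1.4 string with one translation unit per (source, target) pair.

    Hypothetical helper for illustration; header values are placeholders.
    """
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # placeholder value
        "creationtoolversion": "0.1",       # placeholder value
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        # Each <tu> holds one aligned pair: a source <tuv> and a target <tuv>.
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Sample data only; real segments would come from the source docx and DeepL.
print(build_tmx([("Hello", "こんにちは")], "en", "ja"))
```

Each source segment sits next to its translation inside a single `tu` element, which is what allows a translation memory tool to import the pairs directly.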

Built using:

  • Python 3.10
  • python-docx 0.8.11
  • deepl 1.9.0
  • environs 9.5.0
  • pytest 7.1.2

Tmx output example:

tmx-recording.mov

Docx output example:

docx-recording.mov

To download and run:

  1. Go to the DeepL website and sign up to use the DeepL API. At the time of writing, it is possible to sign up and use the API to translate up to 500,000 characters a month for free. You will receive an authentication key. Keep this safe as you will need it later.

Then, using the terminal:

  1. Clone this repo.
    git clone https://github.com/4ka0/deepl_align.git

  2. Move into the project folder.
    cd deepl_align

  3. Create and activate a virtual environment.
    (Example using venv:)
    python3 -m venv venv
    source venv/bin/activate

  4. Update pip (package manager).
    python -m pip install --upgrade pip

  5. Install the dependencies.
    python -m pip install -r requirements.txt

  6. In the root directory of the project, create a file called .env.
    touch .env
    In the .env file, write the following line.
    export AUTH_KEY=(your DeepL authentication key)
    Note that there should be no space after the equals sign.
    Replace "(your DeepL authentication key)" with your actual key.

  7. Run the script.
    To output a docx file:
    python translate.py docx source-text.docx
    To output a tmx file:
    python translate.py tmx source-text.docx
    To provide DeepL with a glossary of preferred terms to be used in the translation:
    python translate.py tmx source-text.docx glossary.txt
    Note that the glossary should be a tab-delimited text file with the following format on each line:
    source-term<tab>target-term
    (Replace <tab> with an actual tab character.)
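
To check a glossary file before running the script, something like the following sketch may help. It is an assumption about how such a file could be validated, not the parsing code used by translate.py. The function name `parse_glossary` and the sample terms are hypothetical.

```python
def parse_glossary(lines):
    """Return {source_term: target_term} from tab-delimited lines.

    Hypothetical helper: raises ValueError on any line that does not
    contain exactly one tab separating two non-empty terms.
    """
    entries = {}
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        parts = line.split("\t")
        if len(parts) != 2 or not all(part.strip() for part in parts):
            raise ValueError(
                f"line {number}: expected source-term<tab>target-term"
            )
        source, target = (part.strip() for part in parts)
        entries[source] = target
    return entries


# In-memory sample; an open file object can be passed the same way.
sample = ["translation memory\t翻訳メモリ\n", "glossary\t用語集\n"]
print(parse_glossary(sample))
```

To validate a real file, pass an open file object, for example `parse_glossary(open("glossary.txt", encoding="utf-8"))`. The resulting dict of source terms to target terms is also the shape that the deepl package's `create_glossary` method takes for its `entries` argument, although whether translate.py builds its glossary this way is not stated here.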