Repository housing Latin-English sentence pairs to aid machine translation.

Primary language: Python

lt-en-sentences

This repository houses a dataset of Latin-English sentence pairs for use in training a machine translation model.

dataset.py

Running this script downloads the training data (XML from the Perseus Digital Library GitHub repository at https://github.com/PerseusDL/dynamic-lexicon) and extracts it into two text files, one Latin and one English, in which corresponding lines hold parallel sentences. Each step can also be run on its own by passing arguments:

  • Download data
  • Extract sentences
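
The two steps and the parallel output format could be sketched roughly as follows. This is an illustration only, not the actual contents of dataset.py: the `--download` and `--extract` flags, the function names, and the output file names are all assumptions made for the example.

```python
import argparse


def write_parallel(pairs, latin_path, english_path):
    """Write (latin, english) pairs so that line i of each file is the same sentence."""
    with open(latin_path, "w", encoding="utf-8") as la, \
         open(english_path, "w", encoding="utf-8") as en:
        for latin, english in pairs:
            # Collapse internal whitespace so every sentence stays on one line.
            la.write(" ".join(latin.split()) + "\n")
            en.write(" ".join(english.split()) + "\n")


def build_parser():
    # Hypothetical flags; the real script may name its arguments differently.
    parser = argparse.ArgumentParser(description="Build Latin-English sentence files.")
    parser.add_argument("--download", action="store_true", help="only download the XML data")
    parser.add_argument("--extract", action="store_true", help="only extract sentence pairs")
    return parser


def selected_steps(args):
    """Return the steps to run; with no flags given, run both in order."""
    if not (args.download or args.extract):
        return ["download", "extract"]
    return [step for step in ("download", "extract") if getattr(args, step)]


if __name__ == "__main__":
    steps = selected_steps(build_parser().parse_args())
    print("Steps to run:", ", ".join(steps))
```

Keeping one sentence per line in both files means a training pipeline can pair sentences simply by reading the two files in lockstep.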