This repository houses a dataset of Latin-English sentence pairs for training a language translation model.
Running the script downloads the training data (XML from the PerseusDL GitHub repository at https://github.com/PerseusDL/dynamic-lexicon) and extracts it into two text files, one Latin and one English, in which matching lines hold parallel sentences. Each step can also be run on its own by passing the corresponding argument:
- Download data
- Extract sentences
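
As a rough illustration of how the two steps could be selected from the command line, here is a minimal sketch using Python's `argparse`. The flag names `--download` and `--extract`, and the helpers `build_parser` and `select_steps`, are assumptions made for this example and may not match the script's actual interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one switch per step (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Prepare Latin-English parallel sentence data."
    )
    parser.add_argument(
        "--download", action="store_true", help="only download the source XML"
    )
    parser.add_argument(
        "--extract", action="store_true", help="only extract the sentence pairs"
    )
    return parser


def select_steps(args: argparse.Namespace) -> list[str]:
    """Return the requested steps in order; with no flags, run both."""
    steps = [name for name in ("download", "extract") if getattr(args, name)]
    return steps or ["download", "extract"]


# Example: request only the extraction step.
steps = select_steps(build_parser().parse_args(["--extract"]))
print(steps)
```

In this sketch, passing no flags runs both steps, which mirrors the default behaviour described above, while passing a single flag limits the run to that step.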