Sentence extractor with vocabulary filter

Primary language: Python

Sentencer

A program to extract translatable sentences from a corpus, based on a known vocabulary.

The vocabulary is stored in a CSV file.
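
As a rough illustration of what filtering a corpus by a known vocabulary can look like, here is a minimal sketch. It is not the code in this repository: the function names, the CSV layout (one word in the first column), and the regex-based sentence splitter are all assumptions made for the example. The setup steps below download NLTK data, so the project presumably uses NLTK for splitting and tokenizing, which would be more robust than the regexes used here.

```python
import csv
import io
import re


def load_vocabulary(csv_file, column=0):
    """Read one word per row from the given column, lower-cased for matching.

    Assumption: the word sits in the first column; the real CSV may differ.
    """
    return {
        row[column].strip().lower()
        for row in csv.reader(csv_file)
        if len(row) > column and row[column].strip()
    }


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_known_sentences(text, vocabulary):
    """Keep only sentences in which every word is in the vocabulary."""
    kept = []
    for sentence in split_sentences(text):
        words = re.findall(r"[a-z']+", sentence.lower())
        if words and all(word in vocabulary for word in words):
            kept.append(sentence)
    return kept


# Hypothetical vocabulary file contents and corpus, for illustration only.
vocab_csv = io.StringIO(
    "i,pronoun\nwake,verb\nup,adverb\nearly,adverb\neat,verb\nbreakfast,noun\n"
)
vocabulary = load_vocabulary(vocab_csv)
corpus = "I wake up early. I eat breakfast. Then I commute to work."

print(extract_known_sentences(corpus, vocabulary))
# → ['I wake up early.', 'I eat breakfast.']
```

The third sentence is dropped because "then", "commute", "to" and "work" are not in the sample vocabulary.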

Corpus resources

Getting started

(NB: You might want to set up a virtualenv first.)

pip install -r requirements.txt
pip install -r requirements_dev.txt
python scripts/nltk_download.py

Run the tests:

flake8
pytest

Run the program on a sample corpus:

python sentencer/main.py my-day