
Primary language: Python. License: MIT.


dict curation

A package for curating doc file collections. Prominent features:

  • Scrape texts from various sites, such as Wikisource. An example is linked from the project's GitHub page. (PS: Consider contributing to the raw_etexts repo.)
  • OCR PDFs with Google Drive. The PDF is automatically split into 25-page pieces, each of which is OCRed individually. A usage example and the underlying function are linked from the project's GitHub page.
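The 25-page splitting step described above can be sketched as follows. This is a minimal illustration, not dict_curation's actual code. The function name `chunk_page_ranges` and its signature are hypothetical. It only computes the 1-indexed, inclusive page ranges that 25-page chunks would cover. Reading the PDF and calling Google Drive for OCR are left out.

```python
from typing import List, Tuple

# Assumption: 25 pages per chunk, as described in the feature list.
# The real dict_curation function may be organised differently.
PAGES_PER_CHUNK = 25


def chunk_page_ranges(num_pages: int, chunk_size: int = PAGES_PER_CHUNK) -> List[Tuple[int, int]]:
    """Hypothetical helper: return 1-indexed, inclusive (start, end) page
    ranges that together cover a document of num_pages pages."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    # Step through the document one chunk at a time; the last chunk may be shorter.
    for start in range(1, num_pages + 1, chunk_size):
        end = min(start + chunk_size - 1, num_pages)
        ranges.append((start, end))
    return ranges


if __name__ == "__main__":
    # A 60-page PDF would be OCRed as three separate pieces.
    print(chunk_page_ranges(60))  # [(1, 25), (26, 50), (51, 60)]
```

Each range could then be extracted into its own smaller PDF and OCRed independently. Keeping the chunk size as a parameter makes it easy to adjust if the OCR backend's limits change.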

For users

Installation or upgrade:

  • Latest release from PyPI: sudo pip install dict_curation -U
  • Latest development version from GitHub: sudo pip install git+https://github.com/sanskrit-coders/dict_curation/@master -U
  • Web.
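After installing or upgrading, you can confirm which version is active with the standard library's `importlib.metadata` (Python 3.8+). This is a minimal sketch. The helper name `installed_version` is hypothetical, and the code assumes the package is registered under the distribution name `dict_curation`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "dict_curation") -> Optional[str]:
    """Hypothetical helper: return the installed version string of
    dist_name, or None if that distribution is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(v if v is not None else "dict_curation is not installed")
```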

For contributors

Contact

Have a problem or question? Please head to GitHub.

Packaging

  • ~/.pypirc should contain your PyPI login credentials.
  • Build the wheel and upload it:

    python setup.py bdist_wheel
    twine upload dist/* --skip-existing
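Before running twine, a quick check of ~/.pypirc can prevent a failed upload. This is a minimal sketch using the standard library's `configparser`. The helper `has_pypi_credentials` is hypothetical. It only looks for a non-empty `username` and `password` in a `[pypi]` section. twine can also obtain credentials in other ways, such as environment variables or an interactive prompt, so a False result does not necessarily mean the upload will fail.

```python
import configparser
from pathlib import Path
from typing import Optional, Union


def has_pypi_credentials(path: Optional[Union[str, Path]] = None) -> bool:
    """Hypothetical helper: return True if the .pypirc file at path
    (default: ~/.pypirc) has a [pypi] section with a username and password."""
    rc = Path(path) if path is not None else Path.home() / ".pypirc"
    if not rc.is_file():
        return False
    # Disable interpolation so '%' characters in passwords are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(rc)
    if not parser.has_section("pypi"):
        return False
    section = parser["pypi"]
    return bool(section.get("username")) and bool(section.get("password"))


if __name__ == "__main__":
    import tempfile

    # Example layout of a .pypirc; the token below is a placeholder, not a real one.
    sample = (
        "[distutils]\n"
        "index-servers =\n"
        "    pypi\n"
        "\n"
        "[pypi]\n"
        "username = __token__\n"
        "password = pypi-PLACEHOLDER-TOKEN\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        rc_path = Path(tmp) / ".pypirc"
        rc_path.write_text(sample)
        print(has_pypi_credentials(rc_path))  # True
```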

Build documentation

  • Sphinx HTML docs can be generated with: cd docs; make html

Testing

Run pytest in the root directory.

Auxiliary tools