Finding documents that use proprietary formats on UK government websites

Primary language: Python. License: MIT.

RoboDoc

This is the software for the RoboDoc project. It crawls UK government websites and builds a list of the documents it finds that are published in proprietary file formats.
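
As a rough illustration of the idea, the sketch below pulls the links out of one page of HTML and keeps those whose file extension is on a watch list. It is not RoboDoc's actual code: it uses only the standard library rather than Scrapy, and the names `PROPRIETARY_EXTENSIONS`, `is_proprietary`, `LinkCollector` and `find_proprietary_documents`, as well as the extension list itself, are assumptions made for this example.

```python
from html.parser import HTMLParser
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlparse

# Assumption: an illustrative watch list, not the list RoboDoc really uses.
PROPRIETARY_EXTENSIONS = {".doc", ".xls", ".ppt"}


def is_proprietary(url):
    """Return True if the URL path ends in one of the listed extensions."""
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    return suffix in PROPRIETARY_EXTENSIONS


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page URL.
                    self.links.append(urljoin(self.base_url, value))


def find_proprietary_documents(html, base_url):
    """Return the links in `html` that point at watch-listed file types."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if is_proprietary(link)]
```

Calling `find_proprietary_documents('<a href="/files/report.DOC">Report</a>', "https://www.example.gov.uk/")` would return a one-element list containing `https://www.example.gov.uk/files/report.DOC`. The extension is compared in lower case, and the query string is ignored because only the URL path is inspected.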

Suggestions, contributions or bug reports are very welcome. Please open a new issue on GitHub or send a pull request.

Installing and running the RoboDoc software

  • Make sure Python 3 is installed.
  • Make sure Scrapy is installed.
  • Create a virtual environment: virtualenv --python=python3 venv
  • Activate the virtual environment: source venv/bin/activate
  • Install via git: pip install -e git+https://github.com/tlocke/robodoc.git
  • Run with: ./run.sh

Running Tests

  • Install pytest: pip install pytest
  • Run the tests: pytest test_robodoc.py
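
The contents of test_robodoc.py are not shown here, so the following is only a sketch of the shape a pytest test file can take: plain functions whose names start with `test_` and which use bare `assert` statements. The helper `path_extension` is hypothetical and is defined inline so that the example is self-contained.

```python
from urllib.parse import urlparse


# Hypothetical helper, defined here only to give the tests something to check.
def path_extension(url):
    """Return the lower-cased extension of the URL path, or '' if none."""
    filename = urlparse(url).path.rsplit("/", 1)[-1]
    if "." not in filename:
        return ""
    return "." + filename.rsplit(".", 1)[-1].lower()


def test_extension_is_lowercased():
    assert path_extension("https://example.gov.uk/a/Report.DOC") == ".doc"


def test_query_string_is_ignored():
    assert path_extension("https://example.gov.uk/a/data.xls?download=1") == ".xls"


def test_no_extension():
    assert path_extension("https://example.gov.uk/guidance/") == ""
```

pytest collects functions like these automatically when given the file name, which is why no test runner code appears in the file itself.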