

Primary language: Python. License: BSD 3-Clause "New" or "Revised" License (BSD-3-Clause).

Mt. Data


A tool to help anyone build a mountain of public data.

Mt. Data includes a simple Python script that can be configured to extract data from various sources. The script runs periodically through GitHub Actions, and the updated data are committed back to the repository.

Datasets

The README in the data/ directory lists the data that the bundled datasets have added to the repo.

Development

This project uses Pipenv. To install dependencies for development, use pipenv sync --dev.

Once everything is installed, the easiest way to work with the repo is to use pipenv shell to drop into the virtual environment.

To run the test suite, just run python -m pytest.

To run a specific scraper, run python -m mtdata -d dataset_name -s store-name, replacing dataset_name and store-name with the dataset and store you want to use.
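
When several scrapers need to be run from a script, the command above can be assembled programmatically. The following is a minimal sketch, not part of Mt. Data itself: the helper names scraper_command and run_scraper are hypothetical, and only the -d and -s flags shown above are assumed to exist.

```python
import subprocess
import sys
from typing import List


def scraper_command(dataset: str, store: str) -> List[str]:
    """Build the argv for `python -m mtdata -d <dataset> -s <store>`."""
    # sys.executable keeps the call inside the currently active interpreter,
    # e.g. the one provided by `pipenv shell`.
    return [sys.executable, "-m", "mtdata", "-d", dataset, "-s", store]


def run_scraper(dataset: str, store: str) -> int:
    """Run one scraper as a child process and return its exit code.

    Hypothetical helper: it only wraps the documented command line.
    """
    return subprocess.run(scraper_command(dataset, store)).returncode
```

Calling run_scraper("dataset_name", "store-name") from inside the virtual environment should behave like typing the command by hand; the two arguments are the same placeholders used above.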

Dependencies

Add or update dependencies in Pipfile, then run ./tool/update-deps.sh. This will update the dependencies, create a new lock file, and then update the requirements.txt file used to build the Docker image.

Checks

Prior to making a pull request, run the various check scripts and fix any problems they identify.

  • ./tool/check-tests.sh
  • ./tool/check-types.sh
  • ./tool/check-format.sh
  • ./tool/check-lints.sh

If the code needs reformatting, run the formatter with ./tool/run-format.sh.
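
To run all four checks in one pass and see which ones failed, a small wrapper can help. This is a sketch under two assumptions that the README does not state: that each check script exits with a non-zero status when it finds a problem, and that the wrapper is started from the repository root. The names CHECK_SCRIPTS and failed_checks are hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

# Script paths as listed above; they are resolved relative to the repo root.
CHECK_SCRIPTS = (
    "./tool/check-tests.sh",
    "./tool/check-types.sh",
    "./tool/check-format.sh",
    "./tool/check-lints.sh",
)


def _exit_code(script: str) -> int:
    """Run one script and return its exit code."""
    return subprocess.run([script]).returncode


def failed_checks(
    scripts: Sequence[str] = CHECK_SCRIPTS,
    run: Callable[[str], int] = _exit_code,
) -> List[str]:
    """Run every script, without stopping early, and return the failures.

    Assumes a non-zero exit code means the check failed.
    """
    return [script for script in scripts if run(script) != 0]
```

An empty list from failed_checks() suggests the branch is ready for a pull request. The run parameter exists so the function can be exercised without executing the real scripts.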

Documentation

When the API changes, the auto-generated documentation needs to be updated. To do this, run ./tool/update-docs.sh, then commit any changes that result.

Docker Image

The Docker image specified in Dockerfile bundles the code to make it easy to run Mt. Data in other environments. To build the container image based on the current version of the code, run ./tool/build-container. To push the container to Docker Hub, run ./tool/push-container.sh.