Web scraping utilities for DWR, USACE, USGS, CNRFC, CVO and SacALERT data repositories.
Create a virtual environment with Python 3's built-in venv library:
$ python -m venv ~/.virtualenvs/myenv
Activate it with
$ %USERPROFILE%\.virtualenvs\myenv\Scripts\activate.bat
(Windows) or
$ source ~/.virtualenvs/myenv/bin/activate
(macOS/Linux).
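After activating, you can confirm that Python is running from the new environment. Inside an environment created by venv, sys.prefix points at the environment directory while sys.base_prefix still points at the base installation, so the two differ. The helper name in_virtualenv below is illustrative and is not part of collect:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    In a venv, sys.prefix is the environment directory, while
    sys.base_prefix is the Python installation the venv was created from.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    status = "inside" if in_virtualenv() else "outside"
    print(f"Running {status} a virtual environment: {sys.prefix}")
```

Run it with the environment's python; after a successful activation it should report "inside".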
Alternatively, tools such as these can help manage Python versions and virtual environments:
- pyenv
- virtualenvwrapper
Clone the repository:
$ git clone https://github.com/MBKEngineers/collect.git
Install the package with the "editable" flag (-e) so that changes in your local clone are picked up by any code that runs in your virtual environment:
$ cd collect
$ python -m pip install -e .
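To check that the editable install worked, you can ask Python which file it would load for collect; with -e, that path should sit inside your git clone rather than in site-packages. This is a sketch using importlib.util.find_spec from the standard library; the helper name module_location is hypothetical:

```python
import importlib.util
from typing import Optional


def module_location(name: str) -> Optional[str]:
    """Return the file that `import name` would load, or None if not importable."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # For a regular package, origin is the path to its __init__.py.
    # Namespace packages may have no single origin, so this can be None.
    return spec.origin


if __name__ == "__main__":
    location = module_location("collect")
    if location is None:
        print("collect is not importable from this environment")
    else:
        # With an editable install this should point inside your clone.
        print(f"collect is imported from: {location}")
```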
If you plan to use the collect.cvo module, which depends on tabula-py, you will need to install Java. Follow the instructions at: https://tabula-py.readthedocs.io/en/latest/getting_started.html
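tabula-py runs the Java-based tabula library under the hood, so a working java command generally needs to be on your PATH. A minimal sketch for checking this from Python (the helper name java_version is hypothetical, not part of collect or tabula-py):

```python
import shutil
import subprocess
from typing import Optional


def java_version() -> Optional[str]:
    """Return the first line of `java -version`, or None if Java is unusable."""
    java = shutil.which("java")
    if java is None:
        return None
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    # Some systems ship a `java` stub that exits non-zero when no runtime is installed.
    if result.returncode != 0:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    lines = (result.stderr or result.stdout).strip().splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    version = java_version()
    print(version if version is not None else "Java not found; see the tabula-py instructions")
```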
Add username and password credentials to a .env file to enable downloading data from password-protected sources.
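A .env file holds one KEY=value pair per line. The README does not list the variable names collect expects, so the names used here (EXAMPLE_USERNAME, EXAMPLE_PASSWORD) are hypothetical, and this standard-library parser is only a sketch of the format, not how collect itself loads credentials. Because the file contains passwords, it should generally be kept out of version control.

```python
from pathlib import Path
from typing import Dict


def read_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines from a .env file into a dict.

    Blank lines, '#' comments and lines without '=' are skipped, a leading
    'export ' is dropped, and matching quotes around values are stripped.
    Minimal sketch only: no variable expansion or multi-line values.
    """
    values: Dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key] = value
    return values
```

For example, a file containing the line EXAMPLE_USERNAME=jdoe would be read as {"EXAMPLE_USERNAME": "jdoe"}.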
The collect package uses Sphinx to generate documentation from docstrings in the project. To build and update the documentation, make sure that Sphinx is installed by installing the docs extra:
$ python -m pip install -e ".[docs]"
Note that there is another Python package on PyPI named collect. However, it is unmaintained and dates from 2011, so the MBK codebase is not expected to use it.
collect now includes a command-line tool, collect-start, for starting a new module. Initialize a new module from a template with:
$ collect-start modulename