Luigi-based pipeline to sweep and select machine learning models which are used for the greenhouse gas emissions layer at https://global-plastics-tool.org/.
This pipeline executes pre-processing and a model sweep / training before producing the projections required by the GHG layer at https://global-plastics-tool.org/. Note that, unlike the larger pipeline repository, it sweeps only ML methods, validates that error remains stable, and reports on the sweep for monitoring purposes.
Most users can simply reference the output from the latest execution. That output is written to https://global-plastics-tool.org/ghgpipeline.zip and is publicly available under the CC-BY-NC License. That said, users may also leverage a local environment if desired. For common developer operations including adding regions or updating data, see the cookbook.
A containerized Docker environment is available for execution. This will prepare outputs required for the front-end tool. See cookbook for more details.
In addition to the Docker container, a manual environment can be established by running pip install -r requirements.txt. This assumes that sqlite3 is installed. Afterwards, simply run bash build.sh.
The configuration for the Luigi pipeline can be modified by providing a custom JSON file; see task/job.json for an example. Note that, although a full sweep is conducted, the pipeline defaults to random forest because that approach tends to resist overfitting.
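As a minimal sketch of working with a custom configuration, the snippet below writes one to disk so it can be handed to the pipeline. The keys shown (`regressor`, `regions`) are illustrative assumptions, not the repository's actual schema; consult task/job.json for the real structure.

```python
import json
import tempfile

# Hypothetical configuration; the keys below are illustrative assumptions and
# do not reflect the actual schema defined by task/job.json.
custom_config = {
    'regressor': 'random_forest',  # hypothetical key: model used after sweep
    'regions': ['china', 'eu30', 'nafta', 'row']  # hypothetical key
}

# Write the configuration so a path to it can be provided to the pipeline in
# place of task/job.json.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump(custom_config, f)
    config_path = f.name
```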
Note that an interactive tool for this model is also available at https://github.com/SchmidtDSE/plastics-prototype.
This pipeline can be deployed by merging to the deploy branch of the repository, which fires GitHub Actions. This causes the pipeline output files to be written to https://global-plastics-tool.org/ghgpipeline.zip.
Set up the local environment with pip install -r requirements.txt.
Some unit tests and other automated checks are available. The following is recommended:
$ pip install pycodestyle pyflakes nose2
$ pyflakes *.py
$ pycodestyle *.py
$ nose2
Note that unit tests and code quality checks are run in CI / CD.
CI / CD should be passing before merges to main, which is used to stage pipeline deployments, and to deploy. Where possible, please follow the Google Python Style Guide. Please note that tests run as part of the pipeline itself and separate test files are not included. That said, developers should document which tasks are tests and expand these tests like typical unit tests as needed in the future. We allow lines to go to 100 characters. Please include docstrings where possible (optional for private members and tests; docstrings can be assumed to be inherited).
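Because tests live inside the pipeline rather than in separate test files, a new test typically wraps a plain assertion that a task can run. A minimal sketch of the kind of check a test task might perform (the function name, arguments, and tolerance below are hypothetical, not taken from this repository's task definitions):

```python
# Illustrative sketch of an in-pipeline check. In this repository such logic
# runs inside Luigi tasks; the function name and tolerance are hypothetical.

def check_error_stable(observed_mae, prior_mae, tolerance=0.05):
    """Return True if validation error has not drifted beyond tolerance.

    Args:
        observed_mae: Mean absolute error from the current sweep.
        prior_mae: Mean absolute error from the previous accepted run.
        tolerance: Maximum allowed increase in error (hypothetical value).
    """
    return observed_mae <= prior_mae + tolerance


# A test task's run() method could then fail the pipeline on drift:
# assert check_error_stable(current, previous), 'Error no longer stable.'
```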
See also the source code for the web-based tool running at global-plastics-tool.org and the source code for the "main" pipeline.
This repository uses data from J. Zheng, S. Suh, Strategies to reduce the global carbon footprint of plastics. Nat. Clim. Chang. 9, 374–378 (2019). Our thanks for their contribution.
This project is released as open source (BSD and CC-BY-NC). See LICENSE.md for further details. In addition, please note that this project uses the following open source libraries:
- Luigi under the Apache v2 License.
- onnx under the Apache v2 License.
- scikit-learn under the BSD License.
- sklearn-onnx under the Apache v2 License.
The following are also potentially used as executables, such as from the command line, but are not statically linked to the code:
- Docker under the Apache v2 License.
- Python 3.8 under the PSF License.
This project uses derivative data from the base pipeline, whose data licensing documentation includes information about the third-party sources used.