plastics-ghg-pipeline

Pipeline supporting the global plastics tool's GHG functionality.


GHG Prep Pipeline

Luigi-based pipeline to sweep and select machine learning models which are used for the greenhouse gas emissions layer at https://global-plastics-tool.org/.


Purpose

This pipeline executes pre-processing and a model sweep / training before producing the projections required by the GHG layer in https://global-plastics-tool.org/. Note that, unlike the larger pipeline repository, this only sweeps using ML methods, then validates that error remains stable and reports on the sweep for monitoring purposes.
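As a rough illustration of what "validating that error remains stable" could look like, the sketch below compares the error from a new sweep against the error from a prior run. The function name, the use of an absolute tolerance, and the default value of 0.05 are all assumptions made for this example and are not taken from the pipeline's code.

```python
def error_is_stable(previous_error, current_error, tolerance=0.05):
    """Return True if error has not grown by more than the tolerance.

    Hypothetical check: the tolerance is an absolute increase in error and
    its default is an illustrative value, not one used by the pipeline.
    """
    if previous_error < 0 or current_error < 0 or tolerance < 0:
        raise ValueError("Errors and tolerance must be non-negative.")

    # An improvement (lower error) always counts as stable.
    return (current_error - previous_error) <= tolerance
```

A monitoring step could call this after each sweep and fail the run when it returns False, so that regressions are caught before outputs are published.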


Usage

Most users can simply reference the output from the latest execution. That output is written to https://global-plastics-tool.org/ghgpipeline.zip and is publicly available under the CC-BY-NC License. That said, users may also leverage a local environment if desired. For common developer operations including adding regions or updating data, see the cookbook.
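For those who only need the published output, a minimal sketch of fetching and inspecting the archive with the Python standard library follows. The URL comes from this README. The helper names are made up for this example, and the files inside the archive are not documented here, so the sketch only lists whatever names it finds.

```python
import io
import urllib.request
import zipfile

# Location of the latest pipeline output, as given in this README.
OUTPUT_URL = "https://global-plastics-tool.org/ghgpipeline.zip"


def download_output(url=OUTPUT_URL, timeout=60):
    """Download the published pipeline output and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def list_archive_members(archive_bytes):
    """Return the file names stored in a zip archive given as bytes."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()
```

With network access, list_archive_members(download_output()) returns the names of the files in the latest published archive.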

Container Environment

A containerized Docker environment is available for execution. This will prepare outputs required for the front-end tool. See cookbook for more details.

Manual Environment

In addition to the Docker container, a manual environment can be established simply by running pip install -r requirements.txt. This assumes that sqlite3 is installed. Afterwards, simply run bash build.sh.

Configuration

The configuration for the Luigi pipeline can be modified by providing a custom JSON file. See task/job.json for an example. Note that, even though a full sweep is conducted, the pipeline uses random forest by default because that approach tends to avoid overfitting better.
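The general idea of overriding defaults with a custom JSON file can be sketched as below. This does not show how the pipeline itself reads its configuration: the function name and the keys in DEFAULT_CONFIG are hypothetical placeholders, and task/job.json remains the reference for the real keys.

```python
import json

# Hypothetical defaults for illustration only; see task/job.json for the
# keys that the pipeline actually reads.
DEFAULT_CONFIG = {"model": "random forest", "sweep": True}


def load_job_config(path=None, defaults=None):
    """Return the defaults overlaid with values from a custom JSON file."""
    config = dict(DEFAULT_CONFIG if defaults is None else defaults)

    if path is not None:
        with open(path, encoding="utf-8") as source:
            overrides = json.load(source)
        if not isinstance(overrides, dict):
            raise ValueError("Job configuration must be a JSON object.")
        # Values from the custom file take precedence over the defaults.
        config.update(overrides)

    return config
```

Calling load_job_config() with no path returns a copy of the defaults, so a custom file only needs to list the values it changes.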


Tool

Note that an interactive tool for this model is also available at https://github.com/SchmidtDSE/plastics-prototype.


Deployment

This pipeline can be deployed by merging to the deploy branch of the repository, which triggers GitHub Actions. This will cause the pipeline output files to be written to https://global-plastics-tool.org/ghgpipeline.zip.


Local Environment

Set up the local environment with pip install -r requirements.txt.


Testing

Some unit tests and other automated checks are available. The following is recommended:

$ pip install pycodestyle pyflakes nose2
$ pyflakes *.py
$ pycodestyle *.py
$ nose2

Note that unit tests and code quality checks are run in CI / CD.


Development Standards

CI / CD should be passing before merges to main, which is used to stage pipeline deployments and deploy. Where possible, please follow the Google Python Style Guide. Please note that tests run as part of the pipeline itself and separate test files are not included. That said, developers should document which tasks are tests and expand these tests like typical unit tests as needed in the future. We allow lines to go to 100 characters. Please include docstrings where possible (these are optional for private members and tests, and inherited docstrings can be assumed).


Related Repositories

See also the source code for the web-based tool running at global-plastics-tool.org and the source code for the "main" pipeline.


Data and Citation

This repository uses data from J. Zheng, S. Suh, Strategies to reduce the global carbon footprint of plastics. Nat. Clim. Chang. 9, 374–378 (2019). Our thanks for their contribution.


Open Source

This project is released as open source (BSD and CC-BY-NC). See LICENSE.md for further details. In addition to this, please note that this project uses the following open source:

The following are also potentially used as executables like from the command line but are not statically linked to code:

This project also uses derivative data from the base pipeline, which includes data licensing information covering the third-party sources used.