
Clean Jupyter notebooks of output for versioning


nb-clean

nb-clean cleans Jupyter notebooks of cell execution counts, metadata, outputs, and (optionally) empty cells, preparing them for committing to version control. It provides a Git filter to automatically clean notebooks before they are staged, and can also be used as a standalone tool outside Git or with other version control systems. It can also check whether a notebook is clean, which is useful as a check in continuous integration pipelines.

Installation

To install the latest release from PyPI, use pip:

python3 -m pip install nb-clean

Alternatively, in Python projects that use Poetry or Pipenv for dependency management, add nb-clean as a development dependency with poetry add --dev nb-clean or pipenv install --dev nb-clean. nb-clean requires Python 3.6 or later.

Usage

Cleaning

To install a filter in an existing Git repository to automatically clean notebooks before they are staged, run the following from the working tree:

nb-clean configure-git

This will configure a filter to remove cell execution counts, metadata, and outputs. To also remove empty cells, use:

nb-clean configure-git --remove-empty

To preserve cell metadata, for example the metadata required by tools like papermill, use:

nb-clean configure-git --preserve-metadata

nb-clean will configure a filter in the Git repository in which it is run, and will not mutate your global or system Git configuration. To remove the filter, run:

nb-clean unconfigure-git

Outside of a Git filter, you can also clean a Jupyter notebook manually with:

nb-clean clean -i original.ipynb -o cleaned.ipynb

or by passing the notebook contents on stdin:

nb-clean clean < original.ipynb > cleaned.ipynb

To also remove empty cells, add the --remove-empty flag. To preserve cell metadata, add the --preserve-metadata flag.
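
To make concrete what cleaning involves, here is a minimal Python sketch that operates on a notebook already loaded as JSON. It is not nb-clean's actual implementation. The function name clean_notebook and its parameters are hypothetical, chosen to mirror the --remove-empty and --preserve-metadata flags. The sketch assumes the nbformat 4 layout (a top-level "cells" list), treats "metadata" as per-cell metadata only, and assumes an empty cell is one whose source contains no text.

```python
import copy


def clean_notebook(notebook, remove_empty=False, preserve_metadata=False):
    """Return a cleaned copy of a notebook given as a parsed JSON dict.

    Hypothetical helper for illustration; not part of nb-clean.
    """
    cleaned = copy.deepcopy(notebook)  # leave the caller's notebook untouched

    if remove_empty:
        # "source" may be a string or a list of strings; join handles both.
        cleaned["cells"] = [
            cell for cell in cleaned["cells"] if "".join(cell["source"])
        ]

    for cell in cleaned["cells"]:
        if not preserve_metadata:
            cell["metadata"] = {}
        if cell["cell_type"] == "code":
            # Only code cells carry execution counts and outputs.
            cell["execution_count"] = None
            cell["outputs"] = []

    return cleaned
```

With json.load and json.dump around it, a function like this would reproduce the general shape of the clean command: read a notebook, strip the volatile fields, and write the result.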

Checking

You can check if a notebook is clean with:

nb-clean check -i notebook.ipynb

or by passing the notebook contents on stdin:

nb-clean check < notebook.ipynb

To also check for empty cells, add the --remove-empty flag. To ignore cell metadata, add the --preserve-metadata flag.

nb-clean will exit with status code 0 if the notebook is clean, and status code 1 if it is not. nb-clean will also print details of cell execution counts, metadata, outputs, and empty cells it finds.
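
Because the exit status carries the result, the check is straightforward to wrap in a CI script. The following is a sketch only: find_dirty_notebooks and main are hypothetical helpers, not part of nb-clean, and the sketch assumes nb-clean is on the PATH and accepts the -i flag as shown above. The command parameter exists only so the check command can be swapped out, for example in tests.

```python
import subprocess
from pathlib import Path


def find_dirty_notebooks(root, command=("nb-clean", "check")):
    """Return notebooks under `root` for which the check command exits non-zero."""
    dirty = []
    for path in sorted(Path(root).rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in path.parts:
            continue
        # Output is not captured, so any details the command prints stay visible.
        result = subprocess.run([*command, "-i", str(path)])
        if result.returncode != 0:
            dirty.append(path)
    return dirty


def main(root="."):
    """Return 1 if any notebook under `root` fails the check, otherwise 0."""
    dirty = find_dirty_notebooks(root)
    for path in dirty:
        print(f"not clean: {path}")
    return 1 if dirty else 0
```

In a CI job, a small script that calls sys.exit(main()) would fail the build whenever a notebook is not clean.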

Copyright

Copyright © 2017-2020 Scott Stevenson.

nb-clean is distributed under the terms of the ISC licence.