Create Jupyter Git

A CLI command that generates a fresh Git repository with files and configs to optimize version control for Jupyter Notebooks.

Description

A common use of Jupyter Notebooks is for learning, taking notes, and keeping code examples that you can modify and run later. Other common uses include proving out data analysis or machine learning, which generates a lot of output in the form of images and data. In both of these scenarios, the output changes frequently and is not as important as the notebook configuration. For many use cases, the output can easily be regenerated.

The output generated by notebooks is a great candidate to be ignored in a Git repository, so that commits are minimal and point to meaningful code rather than data derived from that code.

There are a number of methods you can use to version control your Jupyter Notebooks and ignore the output.

One of the best (documented here) is to use a Git filter that targets *.ipynb files and strips the output fields from the notebook JSON before the file is staged.
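To make the filter idea concrete, here is a minimal sketch of what such a clean filter could look like. It is not the script this project ships; the function names `strip_outputs` and `clean` are hypothetical, and it assumes the standard notebook JSON layout in which each code cell carries an `outputs` list and an `execution_count`.

```python
import json


def strip_outputs(notebook: dict) -> dict:
    """Clear outputs and execution counts on every code cell (hypothetical helper)."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def clean(stream_in, stream_out) -> None:
    """Read notebook JSON from stream_in and write the stripped JSON to stream_out."""
    notebook = json.load(stream_in)
    json.dump(strip_outputs(notebook), stream_out, indent=1, ensure_ascii=False)
    stream_out.write("\n")
```

A filter script built on this sketch would call `clean(sys.stdin, sys.stdout)`, since Git passes the working-tree file to a clean filter on standard input and stages whatever the filter writes to standard output. The notebook on disk keeps its output; only the staged copy is stripped.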

This approach requires a few steps that you may not want to deal with every time you set up a new repo for Jupyter Notebooks, so this CLI command can be used to create and initialize a Git repository with the configs already in place. Simply start up your Jupyter Notebooks and commit when you hit a meaningful checkpoint.

Installation

Install the CLI

pip install create-jupyter-git

Usage

Run the CLI and specify the path where you want your NEW Git repository created

create-jupyter-git <new repository path>

This repository will have a .gitignore to ensure checkpoints aren't versioned. The command also creates a .gitattributes with a filter configuration and adds .git/config values that point to the Python scripts that handle the filtering via Git's clean filter.
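For readers who want to see how these pieces fit together, the following sketch writes comparable files by hand. It is an illustration under assumptions, not the tool's actual output: the filter name `strip-notebook-output`, the script name `strip_output.py`, and the function `write_repo_configs` are all hypothetical. Only the `.ipynb_checkpoints/` directory name, the `.gitattributes` `filter=` syntax, and the `filter.<name>.clean` config key are standard Jupyter and Git conventions.

```python
from pathlib import Path

FILTER_NAME = "strip-notebook-output"  # hypothetical filter name
FILTER_SCRIPT = "strip_output.py"      # hypothetical path to a clean-filter script


def write_repo_configs(repo: Path) -> list[str]:
    """Write .gitignore and .gitattributes, and return the git config command to run."""
    repo.mkdir(parents=True, exist_ok=True)
    # Jupyter saves checkpoints in .ipynb_checkpoints/ next to each notebook.
    (repo / ".gitignore").write_text(".ipynb_checkpoints/\n")
    # Route every notebook through the named filter when it is staged.
    (repo / ".gitattributes").write_text(f"*.ipynb filter={FILTER_NAME}\n")
    # The clean command is stored in .git/config, so it is not shared on push.
    return [
        "git", "config", "--local",
        f"filter.{FILTER_NAME}.clean", f"python {FILTER_SCRIPT}",
    ]
```

The returned command would be run inside the repository after `git init`, for example with `subprocess.run(cmd, cwd=repo, check=True)`. Because .git/config is local to each clone, anyone who clones the repository later needs the filter configured again on their machine.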

Start Jupyter

cd <new repository path>
jupyter lab notebooks

Start Jupyter with .venv

This setup is great for pulling in dependencies just for your notebooks without cluttering your global or user Python library space.

Set up your .venv and allow your global or user Jupyter install to be used.

cd <new repository path>
python3 -m venv .venv --system-site-packages

Activate the .venv:

source .venv/bin/activate

Add your .venv as a Jupyter kernel

python -m ipykernel install --user --name=.venv

Start Jupyter Lab

jupyter lab notebooks

Commit Your Changes

You can create directories, notebooks, and fill your notebooks with wonderful code and generate beautiful output. When you are at a meaningful spot in your development, simply do a git commit. The Git configurations that are in place will filter out all output within your notebook files as they are staged.

If you push up to a remote repository like GitHub, you will see that the output fields in your notebooks are empty! Great!

You will also notice GitHub does some cool magic to re-generate the output in a preview format for you when you view a *.ipynb file. So you can still see the output in GitHub without storing it in your source. Neat!
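If you would rather confirm the empty output fields locally before pushing, a small check along these lines can help. It is only a sketch: the function name `cells_with_output` is hypothetical, and it assumes the standard notebook JSON layout with an `outputs` list on each code cell.

```python
import json


def cells_with_output(notebook_text: str) -> list[int]:
    """Return the indexes of code cells that still carry output."""
    notebook = json.loads(notebook_text)
    return [
        index
        for index, cell in enumerate(notebook.get("cells", []))
        if cell.get("cell_type") == "code" and cell.get("outputs")
    ]
```

Feeding it the committed version of a notebook, for example the text printed by `git show HEAD:<path to notebook>`, should return an empty list when the filter is working, while the copy in your working directory may still list cells with output.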

Development

Publishing

First bump the version

bumpversion --current-version x.x.x <major | minor | patch> setup.py create_jupyter_git/__init__.py

Next generate the distribution files

python setup.py sdist bdist_wheel

Validate the package

twine check dist/*

Upload the package for publication

twine upload dist/*