A simple Python CLI tool to convert Jupyter notebooks to Databricks notebook format.
Clone locally:

```bash
git clone https://github.com/rafaelvp-db/bryckster.git
```

Create a virtual environment and install the requirements:

```bash
python -m venv .venv && source .venv/bin/activate && \
pip install --upgrade pip && pip install -r requirements.txt
```

Open a terminal and run:

```bash
python bryckster.py run \
  --input_path JUPYTER_NOTEBOOKS_FOLDER \
  --output_path DATABRICKS_NOTEBOOKS_FOLDER
```
This repo has been set up to showcase the following example:

- Jupyter notebooks are stored in the `jupyter` folder.
- A CI/CD pipeline looks into the `jupyter` folder, converts all of the notebooks into Databricks format, and outputs them into the `databricks` folder.
- As a last step, the input `.ipynb` files are removed.
- To give it a try, you can fork this repo and add `.ipynb` files to the `jupyter` folder. Once you commit, GitHub Actions will run the steps above.
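For illustration, the conversion boils down to rewriting each notebook cell into Databricks' plain-text "source" format. The sketch below is not bryckster's actual implementation — just a minimal, self-contained example of that format, assuming Python notebooks with code and markdown cells:

```python
import json

HEADER = "# Databricks notebook source"
SEPARATOR = "# COMMAND ----------"

def to_databricks_source(ipynb_text: str) -> str:
    """Render a Jupyter notebook (JSON string) in Databricks 'source' format."""
    nb = json.loads(ipynb_text)
    rendered = []
    for cell in nb.get("cells", []):
        body = "".join(cell.get("source", []))
        if cell.get("cell_type") == "markdown":
            # Markdown becomes a %md magic cell: one '# MAGIC' prefix per line.
            lines = ["%md"] + body.splitlines()
            body = "\n".join(f"# MAGIC {line}".rstrip() for line in lines)
        rendered.append(body)
    # The first cell sits right under the header; later cells get a separator.
    return HEADER + "\n" + ("\n\n" + SEPARATOR + "\n\n").join(rendered) + "\n"
```

The `# Databricks notebook source` header on the first line is what tells Databricks to render the file as a notebook rather than a plain Python script.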
- It is possible to import Jupyter Notebooks directly into a Databricks Workspace and have them converted automatically.
- However, when using the Repos integration, the conversion does not work out of the box.
- You can add `bryckster` as a step in your CI/CD pipeline, so that whenever someone pushes an `.ipynb` file into your repo, it gets converted to the format expected by Databricks.
- Apart from `bryckster`, it is also possible to import Jupyter notebooks directly into a Databricks workspace by using the Workspace REST API or the `databricks workspace` CLI.
- One common caveat with Jupyter notebooks is that since they store cell outputs, they can also get pretty big. With `bryckster` you can simply strip those outputs (if needed) before pushing to a Git repo, for cleanliness' sake.
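Stripping outputs can be sketched in a few lines of Python. This is a generic illustration, not bryckster's own code: it clears `outputs` and `execution_count` from every code cell of an `.ipynb` file in place:

```python
import json
from pathlib import Path

def strip_outputs(path: str) -> None:
    """Remove cell outputs and execution counts from a .ipynb file in place."""
    nb_path = Path(path)
    nb = json.loads(nb_path.read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    nb_path.write_text(json.dumps(nb, indent=1), encoding="utf-8")
```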
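The direct Workspace REST API import mentioned above can be sketched as follows. The endpoint and fields come from the `/api/2.0/workspace/import` API (with `format=JUPYTER`, Databricks converts the notebook on import); the host, token, and paths here are placeholders:

```python
import base64
import json
import urllib.request

def build_import_request(host: str, token: str,
                         local_path: str, workspace_path: str):
    """Build a POST request that imports a Jupyter notebook into a workspace."""
    with open(local_path, "rb") as f:
        content = base64.b64encode(f.read()).decode("ascii")
    payload = {
        "path": workspace_path,     # e.g. /Users/you@example.com/my_notebook
        "format": "JUPYTER",        # Databricks converts .ipynb on import
        "language": "PYTHON",
        "content": content,         # base64-encoded notebook JSON
        "overwrite": True,
    }
    return urllib.request.Request(
        url=f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires a real workspace and token):
# urllib.request.urlopen(build_import_request(host, token, "nb.ipynb", "/Users/me/nb"))
```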