ur
Import remote Python files and Jupyter notebooks from GitHub Gists, GitHub and GitLab repos, the local filesystem, or arbitrary URLs:
| File Type \ Import From | Gists | URLs | Local files | GitHub | GitLab |
|---|---|---|---|---|---|
| Notebook (.ipynb) | ✅ | ✅ | ✅ | ✅ | ✅ |
| Python (.py) | ✅ | ✅ | ✅ | ✅ | ✅ |
Install:
In a shell:
pip install ur
In a Jupyter notebook:
from sys import executable as python
!{python} -m pip install ur
Usage
Import GitHub Gists
Import several notebooks and Python files from a Gist:
from gist._1288bff2f9e05394a94312010da267bb import *
a_b.a(), a_b.b(), c.c()
Cloning https://gist.github.com/1288bff2f9e05394a94312010da267bb.git into .objs/Gist/1288bff2f9e05394a94312010da267bb/clone
('aaa', 'bbb', 'ccc')
(Note the leading underscore in the import statement: it's necessary when the Gist ID begins with a digit, since Python identifiers can't start with one.)
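Python identifiers can't begin with a digit, so a raw Gist ID like 1288bff2… isn't a legal name in an import statement. A minimal sketch of the underscore rule described above; gist_module_name is a hypothetical helper for illustration, not part of ur's API:

```python
def gist_module_name(gist_id: str) -> str:
    """Turn a Gist ID into a name usable in `from gist.<name> import ...` (hypothetical helper)."""
    # Identifiers can't start with a digit, so prefix "_" when the ID does
    name = f"_{gist_id}" if gist_id[:1].isdigit() else gist_id
    if not name.isidentifier():
        raise ValueError(f"not importable as a module name: {gist_id!r}")
    return name


print(gist_module_name("1288bff2f9e05394a94312010da267bb"))
# → _1288bff2f9e05394a94312010da267bb
```

Gist IDs are hexadecimal, so IDs starting with a letter (a–f) need no prefix under this rule.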
Import from GitHub, GitLab Repos
Import some helper functions (in this case, various wrappers around subprocess functions) directly from a notebook on GitHub:
from github.ryan_williams.jupyter_rc.process import *
lines('echo','yay') # wrapper around subprocess.check_output that returns the lines written to stdout
Cloning https://github.com/ryan-williams/jupyter-rc.git into .objs/Github/ryan-williams/jupyter-rc/clone
Running: echo yay
['yay']
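The logged clone URL suggests how a dotted import path maps onto a repository: github.<user>.<repo>.<path>, with underscores standing in for hyphens (which aren't allowed in Python identifiers). A sketch of that mapping, assuming every _ in the user and repo segments was originally a - (repos whose names genuinely contain underscores would be ambiguous under this rule). parse_github_import and GitHubTarget are hypothetical names, and ur's actual resolution logic may differ; GitLab's nested groups (as in the runsascoded/dotfiles/jupyter example) make the split point less obvious, so this sketch only handles GitHub:

```python
from typing import NamedTuple


class GitHubTarget(NamedTuple):
    clone_url: str
    path: str  # slash-separated module path inside the repo (file extension not resolved)


def parse_github_import(dotted: str) -> GitHubTarget:
    """Map 'github.<user>.<repo>.<module...>' to a clone URL and an in-repo path.

    Assumes every '_' in the user/repo segments stands for '-' (hypothetical rule).
    """
    prefix, user, repo, *rest = dotted.split(".")
    if prefix != "github" or not rest:
        raise ValueError(f"expected github.<user>.<repo>.<module>: {dotted!r}")
    user, repo = user.replace("_", "-"), repo.replace("_", "-")
    return GitHubTarget(f"https://github.com/{user}/{repo}.git", "/".join(rest))


print(parse_github_import("github.ryan_williams.jupyter_rc.process"))
# → GitHubTarget(clone_url='https://github.com/ryan-williams/jupyter-rc.git', path='process')
```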
Importing from GitLab also works; here we import a cd ("change directory") contextmanager:
from gitlab.runsascoded.dotfiles.jupyter.cd import cd
from pathlib import Path
with cd('examples'): print(Path.cwd())
Cloning https://gitlab.com/runsascoded/dotfiles/jupyter.git into .objs/Gitlab/runsascoded/dotfiles/jupyter/clone
/github/workspace/examples
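The imported cd lives in the runsascoded/dotfiles repo and its real implementation may differ; a typical standard-library version of such a contextmanager looks like this (a sketch, not the imported code):

```python
import os
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def cd(path):
    """Temporarily change the working directory, restoring it on exit (even on error)."""
    prev = Path.cwd()
    os.chdir(path)
    try:
        yield Path.cwd()
    finally:
        os.chdir(prev)


with tempfile.TemporaryDirectory() as tmp:
    with cd(tmp) as here:
        print(here)  # the temporary directory
    print(Path.cwd())  # back where we started
```

Python 3.11+ also ships a similar built-in, contextlib.chdir.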
Import arbitrary URLs
The ur module exposes a powerful API for importing code from {local,remote} {.py,.ipynb} files.
Here is an example directly importing one of the files in the example gist used above:
import ur
a_b = ur('https://gist.githubusercontent.com/ryan-williams/1288bff2f9e05394a94312010da267bb/raw/a_b.ipynb')
a_b.a(), a_b.b()
('aaa', 'bbb')
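Under the hood, a .ipynb file is JSON whose "cells" list holds code and markdown cells. A minimal sketch of turning such a notebook into a module by executing its code cells in a fresh namespace, assuming the nbformat 4 layout and an already-downloaded JSON string (fetching the URL, e.g. with urllib.request.urlopen, is left out). module_from_notebook is a hypothetical helper and the inline notebook is a stand-in for a_b.ipynb; ur presumably layers caching and the ur.opts behaviors on top of something like this:

```python
import json
import types


def module_from_notebook(name: str, nb_json: str) -> types.ModuleType:
    """Build a module by running a notebook's code cells, in order, in one namespace."""
    nb = json.loads(nb_json)
    mod = types.ModuleType(name)
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # markdown/raw cells carry no code
        src = cell["source"]
        # nbformat 4 stores a cell's source as either one string or a list of lines
        code = src if isinstance(src, str) else "".join(src)
        # Crudely drop IPython magics and shell escapes, which aren't plain Python
        code = "\n".join(
            line for line in code.splitlines() if not line.lstrip().startswith(("%", "!"))
        )
        exec(compile(code, f"<{name}>", "exec"), mod.__dict__)
    return mod


# Hypothetical stand-in for a downloaded a_b.ipynb
nb_json = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Helpers"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["def a():\n", "    return 'aaa'\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": "def b():\n    return 'bbb'"},
    ],
})
a_b = module_from_notebook("a_b", nb_json)
print(a_b.a(), a_b.b())  # aaa bbb
```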
Import wildcards
In addition to calling the ur module (which returns a module), you can use the * operator:
import ur
ur * 'https://gist.github.com/1288bff2f9e05394a94312010da267bb'
a_b.a(), a_b.b(), c.c()
('aaa', 'bbb', 'ccc')
This is analogous to import * syntax, but works with arbitrary URLs (in this case, ur detects that the URL represents a Gist, and imports the two .ipynb modules found there).
Here is an equivalent import using the ur(…) syntax:
import ur
ur(gist='1288bff2f9e05394a94312010da267bb', all='*')
a_b.a(), a_b.b(), c.c()
('aaa', 'bbb', 'ccc')
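One way an object can support both calling and the ur * 'url' form is to define __mul__ and write the imported names into the caller's global namespace. The sketch below assumes that approach; WildcardImporter and its in-memory sources registry (standing in for fetching and cloning a Gist) are hypothetical, and ur's actual mechanism may differ. It relies on CPython frame introspection via inspect.currentframe():

```python
import inspect
import types


class WildcardImporter:
    """Hypothetical sketch: `importer * key` binds each module found under `key` in the caller."""

    def __init__(self, sources: dict):
        # {key: {module_name: python_source}}; stands in for fetching/cloning a Gist
        self.sources = sources

    def __mul__(self, key: str):
        # CPython-specific: the frame that evaluated `importer * key`
        caller_globals = inspect.currentframe().f_back.f_globals
        for name, src in self.sources[key].items():
            mod = types.ModuleType(name)
            exec(src, mod.__dict__)
            caller_globals[name] = mod
        return None


importer = WildcardImporter({
    "gist:1288bff2": {
        "a_b": "def a(): return 'aaa'\ndef b(): return 'bbb'\n",
        "c": "def c(): return 'ccc'\n",
    },
})
importer * "gist:1288bff2"
print(a_b.a(), a_b.b(), c.c())  # aaa bbb ccc
```

Because the names land in the caller's globals, calling this inside a function would populate module globals rather than the function's locals; that's fine at the top level of a notebook, which is where the examples here use it.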
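Passing all=True when importing a single URL similarly brings that notebook's top-level names (here, the function c) into scope:
import ur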
url = 'https://gist.githubusercontent.com/ryan-williams/1288bff2f9e05394a94312010da267bb/raw/0a2b5966c22c5461734063b78239262e39e4f363/c.ipynb'
ur(url, all=True)
c()
'ccc'
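all=True and all='*' bring the imported module's names into scope much as import * would. For reference, Python's own from m import * binds the names listed in m.__all__ if that's defined, and otherwise every name not starting with an underscore. A sketch of those rules (whether ur applies exactly them is an assumption; star_names and import_star are hypothetical helpers):

```python
import types


def star_names(mod: types.ModuleType) -> list[str]:
    """Names `from mod import *` would bind: mod.__all__ if defined, else all public names."""
    names = getattr(mod, "__all__", None)
    if names is None:
        names = [n for n in vars(mod) if not n.startswith("_")]
    return list(names)


def import_star(mod: types.ModuleType, namespace: dict) -> None:
    """Copy `mod`'s star-importable names into `namespace` (e.g. a notebook's globals())."""
    for name in star_names(mod):
        namespace[name] = getattr(mod, name)


c_mod = types.ModuleType("c")
exec("def c():\n    return 'ccc'\n_helper = 1\n", c_mod.__dict__)
ns = {}
import_star(c_mod, ns)
print(sorted(ns), ns["c"]())  # ['c'] ccc
```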
ur.opts
Configuration: various behaviors can be configured via the ur.opts object:
only_defs
Default: True
Only bring certain top-level entities (functions, modules, and imported symbols; see CellDeleter) into scope from the imported module.
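CellDeleter's exact rules aren't spelled out here; as a rough sketch of the idea, assuming "functions, modules, and imported symbols" means top-level def statements plus import and from … import statements, the standard ast module can drop everything else (demo calls, plots, stray assignments) before execution. only_defs here is a hypothetical function and needs Python 3.9+ for ast.unparse:

```python
import ast

# Assumed reading of "functions, modules, and imported symbols"
KEEP = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Import, ast.ImportFrom)


def only_defs(source: str) -> str:
    """Return `source` with every top-level statement except defs and imports removed."""
    tree = ast.parse(source)
    tree.body = [node for node in tree.body if isinstance(node, KEEP)]
    return ast.unparse(tree)  # comments and original formatting are lost


cell = """
import os
from math import sqrt

def hyp(a, b):
    return sqrt(a * a + b * b)

print(hyp(3, 4))  # demo call, dropped
x = 1             # top-level assignment, dropped
"""
print(only_defs(cell))
```

One trade-off of filtering this aggressively: helpers that read module-level constants would lose them.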
verbose
Default: False
Enable verbose logging during import magic
skip_cache
Default: False
When set, pull latest versions of imported modules (instead of reusing Git clones cached by previous runs).
cache_root
Default: .objs
Remote imported modules are cloned and cached here, namespaced by their type and a primary key.
For example, the Gist referenced above will persist information in a directory called .objs/Gist/1288bff2f9e05394a94312010da267bb.
cache-root-example.ipynb shows how to set a non-default cache_root, and what the cache dir's contents look like.
encoding
Default: utf-8
Encoding used to decode remote notebooks.
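The clone logs in the examples above show the cache layout: <cache_root>/Gist/<id>/clone, <cache_root>/Github/<user>/<repo>/clone, and so on. A sketch of how cache_root and skip_cache might combine, assuming skip_cache means "git pull the existing clone" rather than re-cloning from scratch; clone_dir and ensure_clone are hypothetical helpers, and ensure_clone needs git on the PATH plus network access:

```python
import subprocess
from pathlib import Path


def clone_dir(cache_root, kind: str, *key: str) -> Path:
    """Where a clone is cached, e.g. .objs/Gist/<id>/clone (layout taken from the logs above)."""
    return Path(cache_root, kind, *key, "clone")


def ensure_clone(url: str, dest: Path, skip_cache: bool = False) -> Path:
    """Clone `url` into `dest` once; afterwards reuse it, pulling updates only if skip_cache."""
    if not dest.exists():
        dest.parent.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", url, str(dest)], check=True)
    elif skip_cache:
        subprocess.run(["git", "-C", str(dest), "pull", "--ff-only"], check=True)
    return dest


dest = clone_dir(".objs", "Gist", "1288bff2f9e05394a94312010da267bb")
print(dest)
# ensure_clone("https://gist.github.com/1288bff2f9e05394a94312010da267bb.git", dest)  # needs git + network
```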
Discussion
Jupyter notebooks provide a rich, literate programming experience that is preferable to conventional IDE-based Python environments in many ways.
However, conventional wisdom is that reusing code in notebooks requires porting it to .py files. This is tedious and often requires trashing some of what makes notebooks great in the first place (rich, inline documentation, easy reproducibility, etc.).
Jupyter itself provides sample code for importing code from (local) Jupyter notebooks (originally from 2014?), and several packages, repositories, blog posts, and issues have built on and published similar code.
nbimporter (which this repo is a fork of) is perhaps most notable, allowing seamless reuse of Jupyter-resident utilities within single projects, but its author now recommends factoring code out to .py modules.
The Jupyter ecosystem increasingly shines for the ease with which it allows publishing and sharing notebooks, and stands to gain a lot from easier remixing+reuse of the wealth of code and data being published in Jupyter notebooks every day. I believe there are straightforward answers to the reproducibility and testability concerns raised in nbimporter, and built the ur package to bear that out (and solve immediate productivity and code-reuse needs in my day-to-day life).
Remote importing: package-less publishing
An animating idea of ur is that publishing+reusing a useful helper function should be no harder than
Reuse of code in Jupyter notebooks should be made as easy as possible. Users shouldn't have to mangle their utility code and then publish it to Pip repositories in order to avoid copy/pasting standard helpers in every notebook/project they work in.
Importing code directly from remote notebooks (or .py files) allows frictionless code reuse beyond what Python/Jupyter users are offered today.
GitHub Gists: "anyone with the link can view" git repositories
ur particularly emphasizes using and importing from GitHub Gists. Like git itself, Gists combine a few simple but powerful concepts orthogonally, forming a great platform for sharing and tracking code:
- Gists are the only service I'm aware of that allows "publishing" a Git repository to an opaque URL that can be easily shared, but is otherwise (cryptographically, I think?) private, not search-indexed, etc.
- GitLab snippets are a comparable product, but a request for this feature is open at time of writing
- Each Gist is backed by a full Git repository, under the hood
- Gists can therefore track changes to many files over time
- Users can choose to view Gists' "latest HEAD" content (the default on https://gist.github.com), or specify frozen Git-SHA permalinks for guaranteed reproducibility; both "modes" are supported via web browser, Git CLI, or GitHub API
- Many CLIs and SDKs exist for interacting with Gists from different languages/environments
- I previously wrote a gist-dir helper that uploads an entire directory as a Gist (also working around issues with binary data that the Gist API normally doesn't handle correctly)
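The two "modes" show up directly in Gist raw-file URLs, as in the examples earlier in this README: .../raw/<filename> serves the latest HEAD, while .../raw/<sha>/<filename> is pinned. A small sketch building either form; gist_raw_url is a hypothetical helper, and the URL pattern is taken from the URLs used earlier:

```python
from typing import Optional


def gist_raw_url(user: str, gist_id: str, filename: str, sha: Optional[str] = None) -> str:
    """Raw-file URL for a Gist file: latest HEAD by default, or pinned to `sha`."""
    base = f"https://gist.githubusercontent.com/{user}/{gist_id}/raw"
    return f"{base}/{sha}/{filename}" if sha else f"{base}/{filename}"


gist_id = "1288bff2f9e05394a94312010da267bb"
print(gist_raw_url("ryan-williams", gist_id, "a_b.ipynb"))
print(gist_raw_url("ryan-williams", gist_id, "c.ipynb", sha="0a2b5966c22c5461734063b78239262e39e4f363"))
```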
Use-case: portable, shareable "dotfiles"
Something that ur makes easy is boilerplate-free reuse of common imports and aliases across notebooks/projects/users.
For example, "everyone" imports numpy as np, pandas as pd, plotly as pl, etc. I have a few that I like in addition: from os import environ as env, from sys import executable as python, etc.
ur offers several minimal-boilerplate ways to let you (and anyone you share your notebook with) use all the helpers you like, portably, without having to redeclare them or otherwise interfere with the environment you originally used them in:
import ur
ur(github='ryan-williams/dotfiles', tree='v1.0', file='dotfiles.ipynb', all='*')
Many versions of this can be used, depending on your preferences, e.g.:
from gist.abcdef0123456789abcdef0123456789 import *
Future work
Customize import behavior
- test/handle intra-gist imports
- test/handle pip dependencies in gist imports
- API for tagging/skipping cells in notebooks (visualizations, tests, etc.)
- Context manager for controlling ancestor imports
- Skip importing notebooks with Papermill parameters tags
- work with importlib.reload
- support __init__.ipynb (automatically load notebook when loading Gist), __all__ (configure import * behavior)
- more nuanced TTL / skip_cache behavior (e.g. let cached URLs time out appropriately based on HTTP headers, à la requests-cache)
- setup.py "extras" to allow for pip-installing only specific pieces (e.g. exclude gists/github/gitlab?)
- use bare Git clones
- vet .urignore logic (I don't think it tracks which patterns are ignored for which directories, atm; all patterns end up glommed together and applied everywhere 🙀)
Usability
- pretty-print info about what's imported (in notebook environments)
- proper logging:
- support dict for opts.verbose
- colorized / rich log rendering (incl. HTML in notebook environments)
Speed
- do some benchmarking (WIP: benchmark.ipynb)
- read from __pycache__, when present, instead of compiling
Import Sources
- support github / gitlab imports
- Support nbformat/Jupyter versions >4
Import from specific refs within Gists/repos:
- commit SHAs (e.g. gist._1288bff2f9e05394a94312010da267bb._8d7c134f5ef7bd340fd52840006d37bdd52515a5)
- short commit SHAs (e.g. gist._1288bff2f9e05394a94312010da267bb._8d7c134)
- branch names (e.g. gist._1288bff2f9e05394a94312010da267bb.master)
- tags (e.g. gist._1288bff2f9e05394a94312010da267bb.v1.0)
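Since these ref segments must be valid identifiers, SHAs would need the same leading underscore as Gist IDs. A sketch of how such a segment could be classified, assuming the _<hex> convention in the examples (parse_ref is hypothetical, and the 4-character minimum is borrowed from Git's minimum abbreviation length). Note that a dotted tag like v1.0 can't appear verbatim in an import statement, since .0 isn't a valid identifier, so tags would likely need some escaping too:

```python
import re


def parse_ref(segment: str) -> tuple[str, str]:
    """Classify one dotted-path segment as a full SHA, short SHA, or branch/tag name."""
    # Hex SHAs often start with a digit, so (like Gist IDs) they'd carry a leading "_"
    m = re.fullmatch(r"_([0-9a-f]{4,40})", segment)
    if m:
        sha = m.group(1)
        return ("sha" if len(sha) == 40 else "short_sha", sha)
    # Anything else is a branch or tag; telling those apart needs the repo's refs
    return ("name", segment)


for seg in ["_8d7c134f5ef7bd340fd52840006d37bdd52515a5", "_8d7c134", "master"]:
    print(parse_ref(seg))
```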
Project Management
- Minimize (+freeze!) dependencies
- Self-hosting:
- put code in notebooks
- mirror repository in a Gist
- implement subsequent versions of ur using earlier versions of ur (importing from remote, package-less locations)
- run *-test.ipynb notebooks as tests
- set up CI
- generate README.md from README.ipynb with pre-commit hook
- convert/copy all of these TODOs into GitHub issues!