Add more options to run interactively on the cloud
scottyhq opened this issue · 21 comments
With Google dropping credits for mybinder.org recently, I've noticed that launching sessions is indeed less reliable.
https://blog.jupyter.org/mybinder-org-reducing-capacity-c93ccfc6413f
It would be good to document running this content on other "free" platforms such as:
Another option could potentially be something based on PyScript or Thebe? I don't know what the current state of affairs is there.
I just discovered this the other day, and it might be promising for this: https://jupyterlite.readthedocs.io/en/latest/ It uses Pyodide and can run JupyterLab in the browser. It does use the user's local computing resources like a regular web app, so technically it's not fully "cloud".
The tutorial content mostly uses local datasets downloaded with pooch or synthetic datasets, so that would be totally fine.
Did a quick test with Google Colab (which I admittedly haven't used much). As far as I can tell, it's not really well set up for a directory of notebooks, nor for conda environments! The default runtime has the following versions pre-installed:
INSTALLED VERSIONS
------------------
commit: None
python: 3.10.11 (main, Apr 5 2023, 14:15:10) [GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.15.107+
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.2
libnetcdf: None
xarray: 2022.12.0
pandas: 1.5.3
numpy: 1.22.4
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: 1.1.0
h5py: 3.8.0
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: None
dask: 2022.12.1
distributed: 2022.12.1
matplotlib: 3.7.1
cartopy: None
seaborn: 0.12.2
numbagg: None
fsspec: 2023.4.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 67.7.2
pip: 23.1.2
conda: None
pytest: 7.2.2
mypy: None
IPython: 7.34.0
sphinx: 3.5.4
So many of the notebooks could be executed, but not all. A simple pip install zarr flox would work if you only need a few libraries. Installing a full-fledged conda environment is slow, cumbersome, and has to be done per notebook:
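# Cell 1: install condacolab; install() restarts the Colab kernel when it finishes
!pip install -q condacolab
import condacolab
condacolab.install()

# Cell 2 (after the kernel restart): re-import and confirm conda is available
import condacolab
condacolab.check()

# Cell 3: update the base environment with the tutorial's environment file
# NOTE: this will take a while, be patient!
!mamba env update --quiet --name="base" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"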
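For notebooks that only need a couple of extra libraries, the lighter pip route can be targeted by first checking what the default runtime is missing. A minimal standard-library sketch; the helper names are hypothetical, and it assumes each import name matches its pip package name (true for zarr and flox, but not for every package):

```python
import importlib.util


def missing_packages(import_names):
    """Return the import names that cannot be found in the current environment."""
    # find_spec returns None when a top-level module is not importable
    return [name for name in import_names if importlib.util.find_spec(name) is None]


def pip_install_command(import_names):
    """Build a pip command for only the missing packages (None if nothing is missing)."""
    missing = missing_packages(import_names)
    if not missing:
        return None
    # Assumes each import name is also its pip distribution name
    return "pip install -q " + " ".join(missing)


# Example: the two extras mentioned above, plus xarray, which Colab already ships
print(pip_install_command(["zarr", "flox", "xarray"]))
```

In a Colab cell, the resulting string would be prefixed with ! to run it.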
Adding CI to publish a Docker image to GHCR would be nice, both to make running locally easier for people who like Docker and to support running on GitHub Codespaces.
AWS StudioLab is more straightforward because you have a full-fledged, normal JupyterLab interface (file browser, multiple notebooks, a terminal). You still have to install the locked environment as a manual step, since the standard environment does not come with xarray. A bonus of using StudioLab compared to BinderHub is that content and environments persist across sessions.
Note the link syntax, similar to the BinderHub link above: https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb
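Since each service encodes the repo, branch, and notebook path in the URL, launch links for a given notebook can be generated side by side. A sketch: the StudioLab pattern is copied from the link above, while the Binder (labpath) and Colab patterns are the commonly used public forms and are worth double-checking against each service's documentation. The function name is hypothetical:

```python
from urllib.parse import quote

# Defaults taken from the links in this thread
ORG, REPO, REF = "xarray-contrib", "xarray-tutorial", "main"


def launch_urls(path, org=ORG, repo=REPO, ref=REF):
    """Return launch links for one notebook on Binder, Colab and StudioLab."""
    return {
        # Binder builds the whole repo and opens `path` in JupyterLab via labpath
        "binder": f"https://mybinder.org/v2/gh/{org}/{repo}/{ref}?labpath={quote(path)}",
        # Colab and StudioLab both follow GitHub's /blob/<ref>/<path> layout
        "colab": f"https://colab.research.google.com/github/{org}/{repo}/blob/{ref}/{path}",
        "studiolab": f"https://studiolab.sagemaker.aws/import/github/{org}/{repo}/blob/{ref}/{path}",
    }


for service, url in launch_urls("overview/fundamental-path/index.ipynb").items():
    print(service, url)
```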
mamba env create --name="xarray-tutorial" --file="https://raw.githubusercontent.com/xarray-contrib/xarray-tutorial/main/conda/environment.yml"
Do you know if we need to pay for AWS StudioLab? It's asking me to log in.
Unlike Binder and Colab, you do have to create an account, but it is free and no credit card is required. They impose daily usage limits (I think 12-hour sessions). We'll want to check the resource limits and make sure all the notebooks actually run.
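To check resource limits from inside a session, a quick standard-library sketch like the following could be run as a first cell (the function name is hypothetical). One caveat: inside containers, os.cpu_count() and the sysconf page counts may report the host machine rather than the container's quota, so treat the numbers as an upper bound:

```python
import os


def session_resources():
    """Report CPU count and total RAM (GiB) of the current session, where available."""
    cpus = os.cpu_count()
    try:
        # POSIX sysconf values (available on Linux): page size times number of pages
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        mem_gib = round(total_bytes / 1024**3, 1)
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing on Windows, and some names are missing on other platforms
        mem_gib = None
    return {"cpus": cpus, "mem_gib": mem_gib}


print(session_resources())
```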
Gotcha, sounds good. Posting a link to their FAQ here: https://studiolab.sagemaker.aws/faq. There's a waitlist to make a new account? At least that's what their FAQ says.
Oh, didn't realize that!
Q: Why is there a waiting list to get an account?
We are limiting the number of new account registrations at this time to ensure a high quality of experience for all users.
Q: How long do I have to wait for my account request to get approved?
Account requests are typically approved within 1 to 5 business days.
That's definitely a deal-breaker for large tutorials, where we likely won't be able to get participants to sign up beforehand. It will be good to know whether you get access in 1-2 days, @lsetiawan!
Update: I was approved in 2 minutes, and setting up the account took about 5 minutes. However, right now it's not straightforward to spin up the index notebook with the supplied conda environment... I'll have to investigate that more. I think this is potentially a great way to run the tutorial. If we can reach the people attending the tutorials beforehand, there could be time to notify participants to get an AWS StudioLab account.
Update 2: Looks like it's not very straightforward to open index.ipynb. Going to https://studiolab.sagemaker.aws/import/github/xarray-contrib/xarray-tutorial/blob/main/overview/fundamental-path/index.ipynb doesn't automatically clone the repo and spin up the environment. A lot of steps need to be done manually: cloning the entire repo, creating a custom environment from the conda env YAML file (like the instructions in https://github.com/aws/studio-lab-examples/blob/main/custom-environments/custom_environment.ipynb; it doesn't have mamba, so creating the environment takes forever), and then navigating to index.ipynb and opening it. I feel like this is a lot of steps and I'm spoiled by mybinder, but what do you think, @scottyhq?
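The environment step could at least pick the fastest available tool automatically. A minimal sketch, assuming the repo has already been cloned and the YAML lives at conda/environment.yml (the path from the raw GitHub URL used earlier in this thread); the helper name is hypothetical:

```python
import shutil


def env_create_command(env_file="conda/environment.yml", name="xarray-tutorial", which=shutil.which):
    """Build an env-create command, preferring mamba and falling back to conda."""
    # mamba solves much faster, but StudioLab's default image may not ship it
    solver = "mamba" if which("mamba") else "conda"
    return [solver, "env", "create", f"--name={name}", f"--file={env_file}"]


# Intended to run from the root of the cloned xarray-tutorial repo
print(" ".join(env_create_command()))
```

The command list could then be executed with subprocess.run(env_create_command(), check=True) from a StudioLab terminal or notebook.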
Thanks for looking into it, @lsetiawan! Agreed that StudioLab is a bit tricky. In the end we'll have a couple of options, each with pros and cons, that we can document on one of the website pages. Near-term, I think we should try out JupyterLite and Codespaces too.
Great stuff! Eventually, it would be good to summarize your learnings on the pros and cons of each option here: https://tutorial.xarray.dev/overview/get-started.html
Linking the comment from @dcherian here: #170 (comment).
Quansight is currently offering to host Nebari for the tutorial, and I think we should definitely take them up on that, as Nebari is a really great system for this kind of thing IMO. I'll fill out the form for this. Looks like I need help answering a few questions about specs:
1. I think these 2 options are enough for the tutorial (these are the default machines they're offering):
   - Small (2 CPUs, 8 GB RAM)
   - Medium (4 CPUs, 16 GB RAM)
2. I assume we don't need a GPU instance; it doesn't look like any of the tutorials use one.
@scottyhq Could you confirm the above? Thanks!
Thanks!
I think yes on (1), (2). We could optionally use GPUs but it isn't necessary.
We should manage with "small", but let's go ahead and request "medium", since some of the content will focus on Dask and having a bit more than is typically available on Binder systems would be nice :) No GPUs necessary.
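To see what the extra headroom buys for the Dask content, here is a hypothetical sizing heuristic (not something Nebari or Dask prescribes): one single-threaded worker per core, with 1 GB held back for the notebook kernel and OS, and GB treated as GiB for simplicity:

```python
# (CPUs, GB RAM) for the two Nebari profiles listed above
PROFILES = {"small": (2, 8), "medium": (4, 16)}


def local_cluster_kwargs(cpus, mem_gb, reserve_gb=1):
    """Return keyword arguments one might pass to dask.distributed.LocalCluster."""
    # Hold some memory back for the kernel and OS, then split the rest evenly
    usable_bytes = (mem_gb - reserve_gb) * 1024**3
    return {
        "n_workers": cpus,
        "threads_per_worker": 1,
        # memory_limit applies per worker, in bytes
        "memory_limit": usable_bytes // cpus,
    }


for profile, (cpus, mem_gb) in PROFILES.items():
    kw = local_cluster_kwargs(cpus, mem_gb)
    print(profile, kw["n_workers"], "workers,", round(kw["memory_limit"] / 1024**3, 2), "GiB each")
```

With Dask installed, the dict could be passed as LocalCluster(**local_cluster_kwargs(4, 16)); n_workers, threads_per_worker, and memory_limit are existing LocalCluster parameters, but the one-thread-per-worker split is only one reasonable choice.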
I asked what the dask team was planning to do and got the following responses from Naty Clementi and Jacob Tomlinson:
- Naty: We were planning on running mostly locally, but we talked about possibly using Coiled notebooks + jupyter-repo2docker to get everything on the image. https://blog.coiled.io/blog/coiled-notebooks.html
- Jacob: When I run RAPIDS tutorials I usually stand up my own Binder because I need to add GPUs to the nodes. It's pretty quick and easy to do, especially if you're just running vanilla Binder without the GPU stuff.
Quansight has hosted a Nebari instance for the workshop, which can be found at https://scipy.quansight.dev/