rikturr/getting-up-to-speed-with-dask

Gentle introduction to Dask

Language: Jupyter Notebook. License: MIT.

Getting up to speed with Dask

Dask is a native Python library for parallel computing. This tutorial shows how you can scale data science from a laptop to a cluster using Dask.

Notebooks

Create the conda environment and launch JupyterLab (or Jupyter Notebook).

conda env create -f environment.yml
conda activate dask-speed
jupyter lab

Get data: Pull the data files from S3.
Laptop: Run the analysis and train models with non-parallel Python packages, then try to load a larger dataset and run out of memory.
Dask laptop: Run the same analysis on the larger dataset with Dask, still on the laptop. It is slow, but it completes.
Dask cluster: Run the analysis on a Dask cluster, which is much faster.
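
The progression in the notebooks above relies on Dask splitting work into pieces and running them lazily. The following is a minimal sketch of that idea using `dask.delayed`; it is not taken from the notebooks. The helpers `load_chunk` and `chunk_total` are hypothetical stand-ins for reading and summarizing one file or partition.

```python
from dask import delayed


def load_chunk(start, stop):
    # Hypothetical stand-in for reading one file or partition of a dataset.
    return list(range(start, stop))


def chunk_total(values):
    # Summarize a single chunk independently of the others.
    return sum(values)


# Build a lazy task graph: nothing is loaded or computed yet.
chunks = [delayed(load_chunk)(i, i + 1000) for i in range(0, 10_000, 1000)]
partials = [delayed(chunk_total)(c) for c in chunks]
total = delayed(sum)(partials)

# Run the graph on the local threaded scheduler (the "Dask laptop" case).
result = total.compute(scheduler="threads")
print(result)
```

Because each chunk is handled as a separate task, the same graph could in principle run on a cluster: assuming `dask.distributed` is installed and a scheduler is reachable, creating a `dask.distributed.Client` before calling `compute()` without the `scheduler` argument would send the tasks to the cluster's workers instead of local threads.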

Saturn Cloud

To run in Saturn Cloud, create a new Project with the following settings:

Then launch Jupyter, open a new terminal window and clone the repo:

git clone https://github.com/rikturr/getting-up-to-speed-with-dask

Presentations

April 8, 2021 - ChiPy meetup

Slides

September 11, 2020 - Deep Learning Adventures meetup

Slides

July 15, 2020 - ODSC Applied AI virtual event

Slides
Video