Gentle introduction to Dask

Getting up to speed with Dask

Dask is a native Python library for parallel computing. This tutorial shows how you can scale data science from a laptop to a cluster using Dask.

Notebooks

Create the conda environment and launch JupyterLab (or Jupyter Notebook):

conda env create -f environment.yml
conda activate dask-speed
jupyter lab

Then work through the notebooks in order:

  1. Get data: Pulls the data files from S3.
  2. Laptop: Runs the analysis and trains models using non-parallel Python packages, then tries to load larger data and runs out of memory.
  3. Dask laptop: Runs the same analysis on the larger data using Dask, still on a laptop. Slow, but it executes.
  4. Dask cluster: Runs the analysis on a Dask cluster, which is much faster.

Saturn Cloud

To run in Saturn Cloud, create a new Project with the following settings:

[Screenshot: Saturn Cloud project settings]

Then launch Jupyter, open a new terminal window, and clone the repo:

git clone https://github.com/rikturr/getting-up-to-speed-with-dask

Presentations

April 8, 2021 - ChiPy meetup

September 11, 2020 - Deep learning adventures meetup

July 15, 2020 - ODSC Applied AI virtual event

Video