NCAR Dask tutorial (Spring 2023)

NCAR Dask Tutorial

Welcome to the NCAR Dask Tutorial!

Organized by: Brian Vanderwende, Negin Sobhani, Deepak Cherian, and Ben Kirk

The materials and notebooks in this tutorial are published as a Jupyter Book here.

Here you will find the tutorial materials from the CISL/CSG Dask Tutorial. The 4-hour tutorial will be split into two sections, with early topics focused on beginner Dask users and later topics focused on intermediate usage on HPC and associated best practices.

This tutorial is open to non-UCAR staff. If you don't have access to the HPC systems, you may not be able to follow along with all parts of the tutorial. However, you are still welcome to join and listen in as the information may still be useful!

Video Recording: Will be available after the event

Course Outline

  1. Dask Overview
  2. Dask Data Arrays
  3. Dask DataFrames
  4. Dask + Xarray
  5. Dask Schedulers
  6. Dask on HPC Systems
  7. Dask Best Practices

Prerequisites

Before beginning any of the tutorials, it is highly recommended that you have a basic understanding of Python programming and Python libraries such as NumPy, pandas, and Xarray.

⌨️ Getting set up

Running the notebooks on the NCAR JupyterHub is the preferred way to interact with this tutorial. Users with access to Casper can run the notebooks interactively, save their work, and pull in new updates. To connect to the NCAR JupyterHub, open this link in a web browser: https://jupyterhub.hpc.ucar.edu/

Next, clone the repository to your local directory:

git clone https://github.com/NCAR/dask-tutorial

Finally, open the notebooks and interact with them. Make sure to choose the "NPL 2023a" kernel.

Local installation instructions

Users without access to the NCAR/UCAR Casper cluster can only run through the first few notebooks. To run the notebooks locally:

First clone this repository to your local machine via:

git clone https://github.com/NCAR/dask-tutorial

Next, download conda (if you haven't already)

If you do not already have the conda package manager installed, please follow the instructions here.

Now, create a conda environment:

Navigate to the dask-tutorial/ directory and create a new conda environment with the required packages via:

cd dask-tutorial
conda env update --file environment.yml

This will create a new conda environment named "dask-tutorial".

Next, activate the environment:

conda activate dask-tutorial

Finally, launch JupyterLab with:

jupyter lab
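
Once JupyterLab is running, a quick check in a notebook cell can confirm that the environment resolved as expected. The following is a minimal sketch and not part of the tutorial materials: the helper names `installed_versions` and `report` are hypothetical, and the package list is an assumption based on Dask plus the prerequisites named above. The `environment.yml` file in the repository remains the authoritative list.

```python
from importlib import metadata

# Assumed package list: Dask plus the prerequisites named in this README.
# The authoritative list is environment.yml in the repository.
PACKAGES = ["dask", "numpy", "pandas", "xarray"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(versions):
    """Return one human-readable status line per package."""
    lines = []
    for name, version in versions.items():
        status = version if version is not None else "NOT FOUND"
        lines.append(f"{name}: {status}")
    return lines


# Print the status of each assumed package in the active environment.
for line in report(installed_versions(PACKAGES)):
    print(line)
```

If any package is reported as NOT FOUND, the "dask-tutorial" environment is probably not the active one; re-running the activation step and restarting JupyterLab is a reasonable first thing to try.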

Contributing

We welcome contributions from the community! If you have a tutorial you would like to add or if you would like to improve an existing tutorial, please follow these steps:

Fork the repository.

Clone the repository to your local machine:

git clone https://github.com/your-username/dask-tutorial.git

Create a new branch for your changes:

git checkout -b my-new-tutorial

Make your changes and commit them:

git add .
git commit -m "Add my new tutorial"

Push your changes to your fork:

git push origin my-new-tutorial

Submit a pull request to the original repository.

Support

If you have any questions or need help with the tutorials, please open a GitHub issue in the repository.

👍 Acknowledgments

  • NCAR CISL/CSG Team
  • ESDS Initiative

License

The tutorials in this repository are released under the MIT License.