NVIDIA-contributed CUDA tutorial for Numba

Numba for CUDA Programmers

Author: Graham Markall, NVIDIA (gmarkall@nvidia.com).

What is this course?

This course is adapted from one delivered internally at NVIDIA. Its primary audience is those who are familiar with CUDA C/C++ programming but perhaps less so with Python and its ecosystem. That said, it should also be useful to those who already know Python and the PyData ecosystem.

It focuses on using CUDA concepts from Python rather than covering basic CUDA concepts. Those unfamiliar with CUDA may want to build a base understanding by working through Mark Harris's blog post "An Even Easier Introduction to CUDA" and briefly reading Chapters 1 and 2 of the CUDA Programming Guide (Introduction and Programming Model). Other concepts used in the course, such as shared memory, are covered in later chapters of the guide. For expediency, look these concepts up in the relevant sections as needed rather than reading all the reference material in detail.

What is in this course?

The course is divided into 5 sessions. Each session is designed to be presented first, after which participants work through its examples and exercises before moving on to the next session. It could be presented at a cadence of one session per week, with an hour of presentation time, so that the course fits around other tasks. Alternatively, it could be delivered as a tutorial over 2-3 days.

Session 1: An introduction to Numba and CUDA Python

Session 1 files are in the session-1 folder. Contents:

  • Presentation: The presentation for this session, along with notes.
  • Mandelbrot example: See the README for exercises.
  • CUDA Kernels notebook: In the exercises folder. Open the notebook using Jupyter.
  • UFuncs notebooks: In the exercises folder. Open the notebooks using Jupyter. There are two notebooks on vectorize and guvectorize on the CPU (it is a little easier to experiment with them on the CPU target) and one notebook on CUDA ufuncs and memory management.

Session 2: Typing

Session 2 files are in the session-2 folder. Contents:

  • Presentation: The presentation for this session, along with notes.
  • Exercises: In the exercises folder. Open the notebook using Jupyter.

Session 3: Porting strategies, performance, interoperability, debugging

Session 3 files are in the session-3 folder. Contents:

  • Presentation: The presentation for this session, along with notes.
  • Exercises: In the exercises folder. Open the notebook using Jupyter.
  • Examples: In the examples folder. These are mostly executable versions of the examples given in the slides.

Session 4: Extending Numba

Session 4 files are in the session-4 folder. Contents:

  • Presentation: The presentation for this session, along with notes.
  • Exercises: In the exercises folder. Open the notebook using Jupyter. A solution to the exercise is also provided.
  • Examples: In the examples folder. This contains a notebook working through the Interval example presented in the slides.

Session 5: Memory Management

Session 5 files are in the session-5 folder. Contents:

  • Presentation: The presentation for this session, along with notes.
  • Exercises: In the exercises folder. Open the notebook using Jupyter.
  • Examples: In the examples folder. This contains examples of a simple EMM Plugin wrapping cudaMalloc, and an EMM Plugin for using the CuPy pool allocator with Numba.

Sources

Some of the material in this course is derived from various sources. These sources are:

References

The following references can be useful for studying CUDA programming in general, and the intermediate languages used in the implementation of Numba: