/syclacademy

SYCL Academy, a set of learning materials for SYCL heterogeneous programming

Primary LanguageHTMLCreative Commons Attribution Share Alike 4.0 InternationalCC-BY-SA-4.0

SYCL Academy

SYCL Academy SYCL Academy

This repository provides materials that can be used for teaching SYCL. The materials are provided using the "Creative Commons Attribution Share Alike 4.0 International" license.

What is SYCL?

If you're not familiar with SYCL or would like some further resources for learning about SYCL below are a list of useful resources:

How to use the Materials

To use these materials simply clone this repository including the required submodules.

git clone --recursive https://github.com/codeplaysoftware/syclacademy.git

The lectures are written in reveal.js, and can be found in Lesson_Materials, in the sub-directory for each topic. To view them simply open the index.html file in your browser. Your browser will have a "Full Screen" mode that can be used to run the presentation, use the right and left cursors to move forward and backward in the presentation.

The exercises can be found in Code_Exercises in the sub-directory for each topic. Each exercise has a markdown document instructing what to do in the exercise, a source file to start with and a solution file to provide an example implementation to compare against.

Contributing to SYCL Academy

Contributions to the materials are very gratefully received and this can be done by submitting a Pull Request with any changes. If you can, follow the instructions here to generate a pdf file for any lecture slides you change. Please limit the scope of each Pull Request so that they can be reviewed and merged in a timely manner.

List of Contributors

Codeplay Software Ltd., Heidelberg University, Intel, Xilinx and University of Bristol.

Supporting Organizations

Abertay University, Universidad de Concepcion, TU Dresden, University of Edinburgh, Federal University of Sao Carlos, University of Glasgow, Heriot Watt University, Universitat Innsbruck, Universidad de Málaga, University of Salerno and University of the West of Scotland.

Lesson Curriculum

The SYCL Academy curriculum is divided up into a number of short lessons consisting of slides for presenting the material and a more detailed write-up, each accompanied by a tutorial for getting hands on experience with the subject matter.

Each of the lessons are designed to be self contained modules in order to support both academic and training style teaching environments.

A playlist of video content is also available. Though note that these slides and exercises may have changed since these videos were created so they may not match completely.

Lesson Title Slides Exercise Source Solution DPC++ AdaptiveCpp
01 What is SYCL slides exercise source solution Yes Yes
02 Enqueueing a Kernel slides exercise source solution Yes Yes
03 Managing Data slides exercise source solution Yes Yes
04 Handling Errors slides exercise source solution Yes Yes
05 Device Discovery slides exercise source solution Yes Yes
06 Data Parallelism slides exercise source solution Yes Yes
07 Introduction to USM slides exercise source solution Yes Yes
08 Using USM slides exercise source solution Yes Yes
09 Asynchronous Execution slides exercise source solution Yes Yes
10 Data and Dependencies slides exercise source solution Yes Yes
11 In Order Queue slides exercise source solution Yes Yes
12 Advanced Data Flow slides exercise source solution Yes Yes
13 Multiple Devices slides exercise source solution Yes Yes
14 ND Range Kernels slides exercise source solution Yes Yes
15 Image Convolution slides exercise solution Yes Yes
16 Coalesced Global Memory slides exercise source solution Yes Yes
17 Vectors slides exercise source solution Yes Yes
18 Local Memory Tiling slides exercise source solution Yes Yes
19 Further Optimisations slides exercise source solution Yes Yes
20 Matrix Transpose slides exercise source solution Yes Yes
21 More SYCL Features slides exercise source solution Yes Yes
22 Functors slides exercise source solution Yes Yes

Building the Exercises

The exercises can be built for DPC++ and AdaptiveCpp.

Supported Platforms

Below is a list of the supported platforms and devices for each SYCL implementations, please check this before deciding which SYCL implementation to use. Make sure to also install the specified version to ensure that you can build all of the exercises.

Implementation Supported Platforms Supported Devices Required Version
DPC++ Intel DevCloud
Windows 10 Visual Studio 2019 (64bit)
Red Hat Enterprise Linux 8, CentOS 8
Ubtuntu 18.04 LTS, 20.04 LTS (64bit)
Refer to System Requirements for more details
Intel CPU (OpenCL)
Intel GPU (OpenCL)
Intel FPGA (OpenCL)
Nvidia GPU (CUDA)*
2021.4
AdaptiveCpp Any Linux CPU (OpenMP)
AMD GPU (ROCm)***
NVIDIA GPU (CUDA)
Intel GPU (Level Zero)
Intel CPU, GPU (OpenCL)
23.10.0 from Nov 1, 2023 or newer

* Supported in open source project only

** See here for the official list of GPUs supported by AMD for ROCm. We do not recommend using GPUs earlier than gfx9 (Vega 10 and Vega 20 chips).

Install SYCL implementations

First you'll need to install your chosen SYCL implementation and any dependencies they require.

Installing DPC++

To set up DPC++ follow the getting started instructions.

You can also use a Docker* image.

If you are using the Intel DevCloud then the latest version of DPC++ will already be installed and available in the path.

Installing AdaptiveCpp

You will need a AdaptiveCpp (formerly hipSYCL) build from September 2021 or newer. Refer to the AdaptiveCpp installation instructions for details on how to install AdaptiveCpp.

Pre-requisites

Before building the exercises you'll need:

  • One of the platforms in the support matrix above, depending on which SYCL implementation you are wishing to build for.
  • A C++17 or above tool-chain.
  • An appropriate build system for the platform you are targeting (CMake, Ninja, Make, Visual Studio).

Configuring using CMake

Clone this repository, there are some additional dependencies configured as git sub-modules so make sure to clone those as well. Then simply invoke CMake as follows:

mkdir build

cd build

cmake ../ -G<cmake_generator> -A<cmake_arch> -D<sycl_implementation>=ON

For <cmake_generator> / <cmake_arch> we recommend:

  • Visual Studio 16 2019 / x64 (Windows)
  • Ninja / NA (Windows or Linux)
  • Make / NA (Linux) i.e. "-GUnix Makefiles"

For sycl_implementation this can be one of:

  • SYCL_ACADEMY_USE_ADAPTIVECPP
  • SYCL_ACADEMY_USE_DPCPP

You can also specify the additional optional options:

-DSYCL_ACADEMY_INSTALL_ROOT=<path_to_sycl_impl_install_root>

For <path_to_sycl_impl_install_root> we recommend you specify the path to the root directory of your SYCL implementation installation, though this may not always be required.

-DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON

This will enable building the solutions for each exercise as well as the source files. This is disabled by default.

-DCMAKE_BUILD_TYPE=Release

The build configuration for all exercises defaults to a debug build if this option is not specified.

Additional cmake arguments for DPC++

-DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

This SYCL Academy CMake configuration uses the Intel oneAPI IntelSYCL CMake module package to assist it in its configuration. These command line arguments must be used to initiate this configuration correctly.

-DSYCL_TRIPLE can be used to specify a DPC++ compatible SYCL triple. Possible values include:

  • amdgcn-amd-amdhsa - For AMD devices
  • nvptx64-nvidia-cuda - For CUDA devices
  • spir64_gen - For Intel GPUs
  • native_cpu - For native CPU SYCL device (dependent on DPCPP version)

-DSYCL_ARCH can also be used to specify a device arch. This CMake opt is necessary for AMD. Possible values include:

  • gfx90a - For AMD MI200
  • sm_80 - For NVIDIA A100
  • pvc - For Intel PVC

It may also be necessary to manually specify the install location of the CUDA or ROCM SDK, if this is found in a non-standard location. The flags:

-DROCM_DIR and -DCUDA_DIR can be used to specify the install dir of the ROCM or CUDA SDKs, respectively.

DPC++ for AMD CMake example
  cmake .. -GNinja -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx _DSYCL_TRIPLE=amdgcn-amd-amdhsa -DSYCL_ARCH=gfx90a -DROCM_DIR=/opt/rocm/5.4.3
DPC++ for CUDA CMake example
  cmake .. -GNinja -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx _DSYCL_TRIPLE=nvptx64-nvidia-cuda -DSYCL_ARCH=sm_61 -DCUDA_DIR=/usr/local/cuda/11.2/

Additional cmake arguments for AdaptiveCpp

Sufficiently new (>= 24.02.0), full installations of AdaptiveCpp do not require specifying compilation targets. In this case, targets may still be provided optionally.

For older AdaptiveCpp versions, CMake will require you to specify the compilation targets using -DACPP_TARGETS=<target specification>.

<target specification> is a list of compilation flows to enable and devices to target, for example -DACPP_TARGETS="omp;generic" compiles for CPUs using OpenMP and GPUs using the generic single-pass compiler.

If your AdaptiveCpp installation does not force a compilation target to be provided, but it was built with the generic single-pass compiler disabled (it is enabled by default in all AdaptiveCpp installations built against LLVM >= 14), it is compiling for a default set of targets provided at installation time. If you cannot run the binary on the hardware of your choice, this default set may not be the right one for your hardware and you may have to specify the right targets explicitly.

Available compilation flows are:

  • omp - OpenMP CPU backend
  • generic - Generic single-pass compiler. Generates a binary that runs on host CPU, AMD, NVIDIA and Intel GPUs using runtime compilation
  • cuda - CUDA backend for NVIDIA GPUs. Requires specification of targets of the form sm_XY, e.g. sm_70 for Volta, sm_60 for Pascal. E.g: cuda:sm_70.
  • hip - HIP backend for AMD GPUs. Requires specification of targets of the form gfxXYZ, e.g. gfx906 for Vega 20, gfx900 for Vega 10. E.g.: hip:gfx906.

When in doubt, use -DACPP_TARGETS=generic as it compiles the fastest, usually generates the fastest binaries, and generates portable binaries.

CMake usage example

Invoking CMake from the command line example usage:

  cmake .. "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx

Getting started with compiling DPC++

First you have to ensure that your environment is configured to use DPC++ (note if you are using the Intel DevCloud then you don't need to do this step).

On Linux simply call the setvars.sh which is available in /opt/intel/oneapi for sudo or root users and ~/intel/oneapi/ when installed as a normal user.

For root or sudo installations: source /opt/intel/oneapi/setvars.sh

For normal user installations: source ~/intel/oneapi/setvars.sh

On Windows the script is located in <dpc++_install_root>\setvars.bat

Where <dpc++_install_root> is wherever the oneAPI directory is installed.

Once that's done you can invoke the DPC++ compiler as follows:

icpx -fsycl -o a.out source.cpp

Where <syclacademy_root> is the path to the root directory of where you cloned this repository. Note that on Windows you need to add the option /EHsc to avoid exception handling error.

The CMake configuration can also be used to build the exercises, see the section Configuring using CMake above.

Working on the Exercises

Once you have a working SYCL compiler, you are ready to start writing some SYCL code. To find the first exercise:

cd Code_Exercises/Compiling_with_SYCL/

And read the README.md for further instructions.

Each exercise directory contains:

  • README.md, which contains instructions of how to complete a given exercise, as well as directions for compilation.
  • source.cpp, a placeholder file where your code implementation should be written.
  • solution.cpp, where a solution has been implemented in advance.

Once you have completed any given exercise make sure to compare your implementation against the corresponding solution.cpp.

Online Interactive Tutorial

Hosted by tech.io, this SYCL Introduction tutorial introduces the concepts of SYCL. The website also provides the ability to compile and execute SYCL code from your web browser.

Connecting to DevCloud via SSH

  • Start by creating an Intel DevCloud account account if you do not already have one and login in.
  • Initialize the SSH configuration by clicking on Automated Configuration and follow the instructions to setup the SSH configuration file.
  • SSH into DevCloud (ssh devcloud)

Connect to DevCloud via Jupyter Notebooks

  • Start by creating an Intel DevCloud account account if you do not already have one and login in.
  • Go to training and click on ```Launch JupyterLab´´´
  • In the Jupiter Notebook select File->New->Terminal

You are now ready to start with the first lesson. Enjoy !

Building the Exercises for DPC++

  • Execute the following command to download SYCLAcademy:
git clone --recursive https://github.com/codeplaysoftware/syclacademy.git
  • If you are using DevCloud via ssh, run:
module load cmake
  • To create the code_exercises directory structure with the Makefiles:
cd syclacademy
mkdir build
cd build
cmake ../ "-GUnix Makefiles" -DSYCL_ACADEMY_USE_DPCPP=ON -DSYCL_ACADEMY_ENABLE_SOLUTIONS=ON -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx