
chipStar is a tool for compiling and running HIP/CUDA on SPIR-V via OpenCL or Level Zero APIs.


chipStar


chipStar enables compiling and running HIP and CUDA applications on platforms which support SPIR-V as the device intermediate representation. It supports OpenCL and Level Zero as the low-level runtime alternatives.

chipStar was initially built by combining the prototyping work done in the (now obsolete) HIPCL and HIPLZ projects.

If you wish to cite chipStar in academic publications, please refer to the HIPCL poster abstract when discussing the OpenCL backend and/or the HIPLZ conference paper when mentioning the Level Zero backend. The core developers of chipStar are working on a proper article about the integrated chipStar project, but it is still in progress.

The name chipStar comes from cuda and hip and the word Star which means asterisk, a typical shell wildcard, denoting the intention to make "CUDA and HIP applications run everywhere". The project was previously called CHIP-SPV.

Library Support

The following libraries have been ported to work on Intel GPUs via MKL:

  • hipBLAS (Can be built as a part of chipStar by adding -DCHIP_BUILD_HIPBLAS=ON)
  • hipFFT (Can be built as a part of chipStar by adding -DCHIP_BUILD_HIPFTT=ON)
  • hipSOLVER
  • hipCUB

The following libraries have been ported and should work on any platform:

If there is a library that you need that is not yet supported, please open an issue stating which libraries you require and what application you are trying to build.

Applications

chipStar has so far been tested using the following applications:

  • libCEED: our fork includes some workarounds.
  • GAMESS: source code is not public.
  • HeCBench: CUDA benchmarks.

Getting Started

The quickest way to get started is to use a prebuilt Docker container; please refer to the Docker README. If you want to build everything yourself, you can follow the detailed Getting Started guide.

Development Status and Maturity

While chipStar 1.1 can already be used to run various large HPC applications successfully, it is still under heavy development, with plenty of known issues and unimplemented features. There are also known performance optimizations that are still to be done. However, we consider chipStar ready for wider testing and welcome community contributions in the form of reproducible bug reports and good-quality pull requests.

Release notes for 1.1, 1.0 and 0.9.

Prerequisites

  • CMake >= 3.20.0
  • Clang and LLVM 17 (Clang/LLVM 15 and 16 might also work)
    • Can be installed, for example, by adding the LLVM's Debian/Ubuntu repository and installing packages 'clang-17 llvm-17 clang-tools-17'.
    • For the best results, install Clang/LLVM from a chipStar LLVM/Clang branch which has fixes that are not yet in the LLVM upstream project. See below for a scripted way to build and install the patched versions.
  • SPIRV-LLVM-Translator (llvm-spirv) from a branch matching the LLVM major version (e.g. llvm_release_170 for LLVM 17).
    • Make sure the built llvm-spirv binary is installed into the same path as the clang binary; otherwise clang might find and use a different llvm-spirv, leading to errors.
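
The prerequisites above can be sanity-checked before configuring. The following is a minimal sketch and not part of chipStar: version_ge and tools_share_dir are hypothetical helper names, sort -V assumes GNU coreutils, and command -v reports the path found in PATH without resolving symlinks.

```shell
# version_ge A B: succeed if version A >= version B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# tools_share_dir P1 P2: succeed if both paths live in the same directory.
tools_share_dir() {
  [ "$(dirname "$1")" = "$(dirname "$2")" ]
}

# CMake >= 3.20.0
if command -v cmake >/dev/null 2>&1; then
  cmake_ver=$(cmake --version | head -n 1 | awk '{ print $3 }')
  version_ge "$cmake_ver" 3.20.0 || echo "CMake $cmake_ver is older than 3.20.0"
else
  echo "cmake not found in PATH"
fi

# clang and llvm-spirv should sit side by side
clang_path=$(command -v clang || true)
spirv_path=$(command -v llvm-spirv || true)
if [ -n "$clang_path" ] && [ -n "$spirv_path" ]; then
  tools_share_dir "$clang_path" "$spirv_path" ||
    echo "clang ($clang_path) and llvm-spirv ($spirv_path) are in different directories"
else
  echo "clang or llvm-spirv not found in PATH"
fi
```

If the binaries are installed with a version suffix (for example clang-17), the names passed to command -v need to be adjusted accordingly.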

Compiling Clang, LLVM and SPIRV-LLVM-Translator

It's recommended to use the chipStar fork of LLVM which has a few patches not yet upstreamed. For this you can use a script included in the chipStar repository:

./scripts/configure_llvm.sh
Usage: ./configure_llvm.sh --version <version> --install-dir <dir> --link-type static(default)/dynamic --only-necessary-spirv-exts <on|off> --binutils-header-location <path>
--version: LLVM version 15, 16, 17, 18, 19
--install-dir: installation directory
--link-type: static or dynamic (default: static)
--only-necessary-spirv-exts: on or off (default: off)
--binutils-header-location: path to binutils header (default: empty)

./scripts/configure_llvm.sh --version 17 --install-dir /opt/install/llvm/17.0
cd llvm-project/llvm/build_17
make -j 16
<sudo> make install

Or you can do the steps manually:

git clone --depth 1 https://github.com/CHIP-SPV/llvm-project.git -b chipStar-llvm-17
cd llvm-project/llvm/projects
git clone --depth 1 https://github.com/CHIP-SPV/SPIRV-LLVM-Translator.git -b chipStar-llvm-17
cd ../..

# -DLLVM_ENABLE_PROJECTS="clang;openmp": OpenMP is optional, but many apps use it
# -DLLVM_TARGETS_TO_BUILD: speeds up compilation by building only the necessary CPU host target
# -DCMAKE_INSTALL_PREFIX: where to install LLVM

cmake -S llvm -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_ENABLE_PROJECTS="clang;openmp" \
  -DLLVM_TARGETS_TO_BUILD=X86 \
  -DCMAKE_INSTALL_PREFIX=$HOME/local/llvm-17
make -C build -j8 all install
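
After installation, it can be useful to confirm that llvm-config landed under the chosen prefix, since its path is what LLVM_CONFIG_BIN expects later when configuring chipStar. This is a small sketch under that assumption; find_llvm_config is a hypothetical helper name, and the prefix is the one from the manual steps above.

```shell
# find_llvm_config PREFIX: print the llvm-config path under PREFIX/bin, or fail.
find_llvm_config() {
  candidate="$1/bin/llvm-config"
  if [ -x "$candidate" ]; then
    echo "$candidate"
  else
    return 1
  fi
}

prefix="$HOME/local/llvm-17"   # the CMAKE_INSTALL_PREFIX used in the manual steps
if llvm_config=$(find_llvm_config "$prefix"); then
  echo "Found $llvm_config, LLVM version $("$llvm_config" --version)"
else
  echo "No llvm-config under $prefix/bin; was 'make install' run?"
fi
```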

OpenCL Backend

  • An OpenCL 2.0 or 3.0 driver with at least the following features supported:
    • Coarse-grained buffer shared virtual memory (SVM)
    • SPIR-V input
    • Generic address space
    • Program scope variables
  • Further OpenCL extensions or features might be needed depending on the compiled CUDA/HIP application. For example, to support warp-primitives, the OpenCL driver should support also additional subgroup features such as shuffles, ballots and cl_intel_required_subgroup_size.
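
One way to check the optional subgroup features is to compare a device's extension string against the names an application needs. The sketch below is illustrative only: has_ext and missing_exts are hypothetical helpers, the sample string is made up, and mapping "shuffles" and "ballots" to cl_khr_subgroup_shuffle and cl_khr_subgroup_ballot is an assumption. In practice the extension string would come from the device extension listing printed by a tool such as clinfo.

```shell
# has_ext "EXT LIST" NAME: succeed if NAME appears as a whole word in the list.
has_ext() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# missing_exts "EXT LIST" NAME...: print each NAME absent from the list.
missing_exts() {
  list=$1
  shift
  for ext in "$@"; do
    has_ext "$list" "$ext" || echo "$ext"
  done
}

# Hypothetical extension string for illustration.
sample="cl_khr_fp64 cl_khr_subgroup_shuffle cl_intel_required_subgroup_size"
missing_exts "$sample" \
  cl_khr_subgroup_shuffle cl_khr_subgroup_ballot cl_intel_required_subgroup_size
# prints: cl_khr_subgroup_ballot
```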

Level Zero Backend

Downloading Sources

You can download and unpack the latest released source package or clone the development branch via git. We aim to keep the main development branch stable, but it might have stability issues during the development cycle.

To clone the sources from GitHub:

git clone https://github.com/CHIP-SPV/chipStar.git
cd chipStar
git submodule update --init --recursive
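
If a later build step fails with missing sources, a forgotten submodule is a common cause. As a rough sketch, the output of git submodule status can be scanned for entries prefixed with "-", which git uses to mark submodules that are not initialized. The helper name uninitialized_submodules is hypothetical.

```shell
# uninitialized_submodules: read `git submodule status` text on stdin and print
# the paths of entries whose line starts with "-" (not yet initialized).
uninitialized_submodules() {
  awk '/^-/ { print $2 }'
}

# Only query git when run from inside a work tree.
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  missing=$(git submodule status --recursive | uninitialized_submodules)
  if [ -n "$missing" ]; then
    echo "Uninitialized submodules:"
    echo "$missing"
  fi
fi
```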

Building and Installing

mkdir build && cd build

# LLVM_CONFIG_BIN is optional if LLVM can be found in PATH or if not using a version-suffixed
# binary (for example, llvm-config-17)

cmake .. \
    -DLLVM_CONFIG_BIN=/path/to/llvm-config \
    -DCMAKE_INSTALL_PREFIX=/path/to/install
make all build_tests install -j8

NOTE: You can also compile and install hipBLAS by adding -DCHIP_BUILD_HIPBLAS=ON to the cmake command.

NOTE: If you don't have libOpenCL.so (for example from the ocl-icd-opencl-dev package), but only libOpenCL.so.1 installed, CMake fails to find it and disables the OpenCL backend. This issue describes a workaround.

Building on ARM + Mali

To build chipStar for use with an ARM Mali G52 GPU, use these steps:

  1. Build LLVM and SPIRV-LLVM-Translator as described above.

  2. Build chipStar with the -DCHIP_MALI_GPU_WORKAROUNDS=ON CMake option.

There are some limitations: kernels using the double type will not work, and kernels using subgroups may not work.

Note that chipStar relies on the proprietary OpenCL implementation provided by ARM. We have successfully managed to compile and run chipStar with an Odroid N2 device, using Ubuntu 22.04.2 LTS, with driver version OpenCL 3.0 v1.r40p0-01eac0.

Building on RISC-V + PowerVR

To build chipStar for use with a PowerVR GPU, the default steps can be followed. There is an automatic workaround applied for an issue in PowerVR's OpenCL implementation.

There are some limitations: kernels using the double type will not work, and kernels using subgroups may not work. You may also run into unexpected OpenCL errors such as CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST, as well as other issues.

Note that chipStar relies on the proprietary OpenCL implementation provided by Imagination Technologies. We have successfully managed to compile and run chipStar with a VisionFive2 device, using VisionFive2's pre-built Debian image 202403, driver version 1.19. Other SBCs may require additional workarounds.

Running Unit Tests

There's a script check.py which can be used to run unit tests and which filters out known failing tests for different platforms. Its usage is as follows.

BUILD_DIR={path to build directory. Make sure that build_tests target has been built}

BACKEND={opencl/level0}
^ Which backend/driver/platform you wish to test:
"opencl" = Intel OpenCL runtime, "level0" = Intel LevelZero runtime 

DEVICE={cpu,igpu,dgpu,pocl}         # What kind of device to test.
^ This selects the expected test pass lists.
  'igpu' is an Intel Iris Xe iGPU, 'dgpu' a typical recent Intel dGPU such as a Data Center GPU Max series or an Arc.

export CHIP_PLATFORM=N         # If there are multiple OpenCL platforms present on the system, selects which one to use.

You can always verify which device is being used by chipStar by:
CHIP_LOGLEVEL=info ./build/hipInfo

Then run the tests:
python3 $SOURCE_DIR/scripts/check.py $BUILD_DIR $DEVICE $BACKEND
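
Because check.py takes its arguments positionally, a typo in DEVICE or BACKEND is easy to miss. The sketch below validates the two values against the choices listed above and prints the resulting command line without running it. is_one_of and check_cmd are hypothetical helpers, and the paths are placeholders.

```shell
# is_one_of VALUE CHOICE...: succeed if VALUE equals one of the choices.
is_one_of() {
  value=$1
  shift
  for choice in "$@"; do
    [ "$value" = "$choice" ] && return 0
  done
  return 1
}

# check_cmd SOURCE_DIR BUILD_DIR DEVICE BACKEND: validate the arguments and
# print the check.py command line without running it.
check_cmd() {
  is_one_of "$3" cpu igpu dgpu pocl || { echo "unknown DEVICE: $3" >&2; return 1; }
  is_one_of "$4" opencl level0 || { echo "unknown BACKEND: $4" >&2; return 1; }
  echo "python3 $1/scripts/check.py $2 $3 $4"
}

check_cmd /path/to/chipStar /path/to/chipStar/build dgpu level0
# prints: python3 /path/to/chipStar/scripts/check.py /path/to/chipStar/build dgpu level0
```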

Please refer to the user documentation for instructions on how to use the installed chipStar to build CUDA/HIP programs.

Environment Variables

CHIP_BE=<opencl/level0>                         # Selects the backend to use. If both Level Zero and OpenCL are available, Level Zero is used by default
CHIP_PLATFORM=<N>                               # If there are multiple platforms present on the system, selects which one to use. Defaults to 0
CHIP_DEVICE=<N>                                 # If there are multiple devices present on the system, selects which one to use. Defaults to 0
CHIP_DEVICE_TYPE=<gpu/cpu/accel/fpga> or empty  # Selects which type of device to use. Defaults to empty.
CHIP_LOGLEVEL=<trace/debug/info/warn/err/crit>  # Sets the log level. If compiled in RELEASE, only err/crit are available
CHIP_DUMP_SPIRV=<ON/OFF(default)>               # Dumps the generated SPIR-V code to a file
CHIP_JIT_FLAGS=<flags>                          # Additional JIT flags
CHIP_L0_COLLECT_EVENTS_TIMEOUT=<N(30s default)> # Timeout in seconds for collecting Level Zero events
CHIP_L0_EVENT_TIMEOUT=<N(0 default)>            # Timeout in seconds for how long Level Zero should wait on an event before timing out
CHIP_SKIP_UNINIT=<ON/OFF(default)>              # If enabled, skips the uninitialization of chipStar's backend objects at program termination
CHIP_MODULE_CACHE_DIR=/path/to/desired/dir      # Module/Program cache dir. Defaults to $HOME/.cache/chipStar. If caching is undesired, set to an empty string, i.e. export CHIP_MODULE_CACHE_DIR=

Example:

╭─pvelesko@cupcake ~
╰─$ clinfo -l
Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A380 Graphics
Platform #1: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770

Based on these values, if we want to run on the OpenCL iGPU:

export CHIP_BE=opencl
export CHIP_PLATFORM=1
export CHIP_DEVICE=0

NOTE: Level Zero doesn't have a clinfo equivalent. Normally, if you have more than one Level Zero device, there will only be a single platform, so set CHIP_PLATFORM=0 and then set CHIP_DEVICE to the device you want to use. You can check the name of the device by running a sample which prints the name, such as build/samples/0_MatrixMultiply/MatrixMultiply
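
For the OpenCL case, the platform and device indices can also be pulled out of the clinfo -l listing by device name. The following is a sketch only: find_device is a hypothetical helper, and it assumes the "Platform #N:" and "Device #M:" line layout shown in the example listing.

```shell
# find_device NAME: read `clinfo -l` style text on stdin and print
# "<platform index> <device index>" for the first device whose name contains NAME.
find_device() {
  awk -v name="$1" '
    /^Platform #[0-9]+:/ { p = $2; gsub(/[#:]/, "", p) }
    /Device #[0-9]+:/ && index($0, name) {
      for (i = 1; i <= NF; i++)
        if ($i ~ /^#[0-9]+:$/) { d = $i; gsub(/[#:]/, "", d); break }
      print p, d
      exit
    }'
}

# Listing copied from the example in this section.
listing='Platform #0: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) Arc(TM) A380 Graphics
Platform #1: Intel(R) OpenCL Graphics
 `-- Device #0: Intel(R) UHD Graphics 770'

result=$(printf '%s\n' "$listing" | find_device "UHD Graphics")
platform=${result% *}
device=${result#* }
echo "export CHIP_BE=opencl CHIP_PLATFORM=$platform CHIP_DEVICE=$device"
# prints: export CHIP_BE=opencl CHIP_PLATFORM=1 CHIP_DEVICE=0
```

On a live system the listing would come from piping clinfo -l into find_device directly.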

Troubleshooting

Clang++ Cannot Find libstdc++ When Building chipStar

This often occurs when the latest installed GCC version doesn't include libstdc++: Clang++ chooses the latest GCC it finds by default regardless, and then fails to link C++ programs. The problem is discussed here.

The issue can be resolved by defining a Clang++ configuration file which forces Clang++ to use the desired GCC installation. Example:

echo --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/11 > ~/local/llvm-17/bin/x86_64-unknown-linux-gnu-clang++.cfg
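
To decide which directory to pass to --gcc-install-dir, it may help to list the GCC version directories that actually ship libstdc++. The sketch below assumes the Debian/Ubuntu layout from the example above, where the library files sit directly in the version directory. gcc_dirs_with_libstdcxx is a hypothetical helper name.

```shell
# gcc_dirs_with_libstdcxx BASE: print each version directory under BASE that
# contains libstdc++.so or libstdc++.a.
gcc_dirs_with_libstdcxx() {
  for dir in "$1"/*/; do
    dir=${dir%/}
    if [ -e "$dir/libstdc++.so" ] || [ -e "$dir/libstdc++.a" ]; then
      echo "$dir"
    fi
  done
}

# Base path taken from the example above; adjust the triple for other hosts.
gcc_dirs_with_libstdcxx /usr/lib/gcc/x86_64-linux-gnu
```

Any directory this prints is a candidate for the configuration file; a directory that is missing from the output is likely the one Clang++ picked by mistake.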

Missing Double Precision Support

When running the tests on OpenCL devices which do not support double precision floats, multiple tests will fail with errors.

For Intel iGPUs, it might be possible to enable software emulation of double precision floats by setting the two environment variables below. Kernels using doubles then work, but with the major overhead of software emulation:

export IGC_EnableDPEmulation=1
export OverrideDefaultFP64Settings=1

If your device does not support emulation, you can skip these tests by providing the -DSKIP_TESTS_WITH_DOUBLES=ON option at cmake configure time.
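
Whether to pass that option can be decided from the device's extension string. This is a hedged sketch: doubles_flag is a hypothetical helper, the extension strings are made up, and it assumes the driver advertises double precision through the cl_khr_fp64 extension name.

```shell
# doubles_flag "EXT LIST": print the CMake option that skips double-precision
# tests when cl_khr_fp64 is absent from the list; print nothing otherwise.
doubles_flag() {
  case " $1 " in
    *" cl_khr_fp64 "*) ;;
    *) echo "-DSKIP_TESTS_WITH_DOUBLES=ON" ;;
  esac
}

# Hypothetical extension lists for illustration.
doubles_flag "cl_khr_fp16 cl_khr_subgroups"   # prints -DSKIP_TESTS_WITH_DOUBLES=ON
doubles_flag "cl_khr_fp16 cl_khr_fp64"        # prints nothing
```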