/spmdfy

a transpiler from CUDA to ISPC using libTooling

Primary LanguageC++MIT LicenseMIT

SPMDfy

A transpiler from CUDA to ISPC using clang libTooling. Some of the experiments that I had been doing can be found here. The tools was developed as a part of Google Summer of Code project under CERN with Daniel Hugo Cámpora Perez and Axel Naumann as mentors.

Presentations

The presentations will give you a detailed description of the project, it's motivation and future directions.

  1. 1st Real Time Analysis Workshop
  2. GSoC Wrapup
  3. CERN Lightning Talks

Requirements

  1. CUDA 9.0 - 9.2
  2. ISPC - use built in alloy.py script to install
  3. clang 9 - with libclang and llvm-tools
  4. CMake 3.5.0 or greater

Docker Image

There is a Dockerfile provided with the project to the ease installation process

Build Instructions

mkdir build && cd build
cmake -G Ninja -DLLVM_DIR=path_to_llvm_cmake_dir ..

Examples

// CUDA
__global__ void reduce(int *a, int *partial_sum, int N) {
    size_t tid = threadIdx.x;
    size_t gid = threadIdx.x + blockIdx.x * blockDim.x;
    for (size_t s = N / 2; s > 0; s >>= 1) {
        if (tid < s) {
            a[gid] += a[gid + s];
        }
        __syncthreads();
    }
    if (tid == 0)
        partial_sum[blockIdx.x] = a[blockIdx.x * blockDim.x];
}


// ISPC
#define ISPC_GRID_START                                                        \
    Dim3 blockIdx, threadIdx;                                                  \
    for (blockIdx.z = 0; blockIdx.z < gridDim.z; blockIdx.z++) {               \
        for (blockIdx.y = 0; blockIdx.y < gridDim.y; blockIdx.y++) {           \
            for (blockIdx.x = 0; blockIdx.x < gridDim.x; blockIdx.x++) {

#define ISPC_BLOCK_START                                                       \
    for (threadIdx.z = 0; threadIdx.z < blockDim.z; threadIdx.z++) {           \
        for (threadIdx.y = 0; threadIdx.y < blockDim.y; threadIdx.y++) {       \
            for (threadIdx.x = programIndex; threadIdx.x < blockDim.x;         \
                threadIdx.x += programCount) {

#define ISPC_KERNEL(function, ...)                                             \
    export void function(                                                      \
        const uniform Dim3 &gridDim, const uniform Dim3 &blockDim,             \
        const uniform size_t &shared_memory_size, __VA_ARGS__)
struct Dim3 {
    int x, y, z;
};

ISPC_KERNEL(reduce, uniform int a[], uniform int partial_sum[], uniform int N) {
    ISPC_GRID_START
    ISPC_BLOCK_START
    const unsigned int64 tid = threadIdx.x;
    const unsigned int64 gid = threadIdx.x + blockIdx.x * blockDim.x;
    ISPC_BLOCK_END
    for (size_t s = N / 2; s > 0; s >>= 1) {
        ISPC_BLOCK_START
        const unsigned int64 tid = threadIdx.x;
        const unsigned int64 gid = threadIdx.x + blockIdx.x * blockDim.x;
        if (tid < s) {
            a[gid] += a[gid + s];
        }
        ISPC_BLOCK_END
    }
    ISPC_BLOCK_START
    const unsigned int64 tid = threadIdx.x;
    const unsigned int64 gid = threadIdx.x + blockIdx.x * blockDim.x;
    if (tid == 0) {
        partial_sum[blockIdx.x] = a[blockIdx.x * blockDim.x];
    }
    ISPC_BLOCK_END
    ISPC_GRID_END
}

Documentation of the Tool

Sphnix and doxygen documentation of the tool can be generated by passing docs/all to the build system.

Usage

./spmdfy ../examples/transpose/transpose.cu -o transpose.ispc

Clang uses a compilation database to pass additional command line arguments. You can generate using cmake by passing CMAKE_EXPORT_COMPILE_COMMANDS which will dump compile_commands.json. If your codebase is compile nvcc, you can convert nvcc specific flags to clang's by running the tool here.

Feature List

  • Shared Memory - both dynamic and static
  • Atomic Functions
  • Syncthreads with complex control flow
  • Some CUDA Math libraries
  • Device Functions
  • Python Tool to convert compilation database

Future Work

  1. Dataflow analysis to detect partial nodes
  2. Inline of Device functions
  3. More C++ stuff - Convert C++ to C as ISPC is a C language.

Tests

List of tests that are currently working with the tool.

  • CUDA_Features/Shared_Memory
  • CUDA_Features/Atomic
  • Finite Difference
  • Transpose
  • Saxpy
  • Reduce

Links