
A framework to find good combinations of optimizations for computational kernels on GPUs.


Telamon


Telamon is a framework to find good combinations of optimizations for computational kernels on GPUs. It currently focuses on dense linear algebra. For more information on how it works internally, we encourage you to read our paper.

Getting Started

To compile Telamon, you need a nightly Rust toolchain, version 1.31 or higher. Telamon is compiled with the 2018 edition of Rust. To generate code for GPUs, you also need a CUDA toolchain installed, with CUDA, cuRAND and CUPTI accessible in the include and library paths. You can also view the documentation on GitHub.

Examples of kernels are located in the kernels/ directory. In particular, kernels/src/linalg.rs contains linear algebra kernels. You can compare the code generated by Telamon to state-of-the-art GPU implementations by running

```
cargo +nightly bench --features=cuda --bench=cuda-search
```

in the kernels/ directory. To see the progress of the exploration, prefix the command with RUST_LOG=telamon::explorer=warn.

Telamon now has a nearly functional MPPA backend. Most kernels run correctly, but the backend is still buggy and relies on many workarounds, because the runtime it depends on is not fully satisfactory. For several reasons, Telamon must be compiled and run through scl enable llvm-toolset-7 with MPPACL_LOCAL_SIZE=128K set, that is, with commands of the form scl enable llvm-toolset-7 "MPPACL_LOCAL_SIZE=128K cargo ...". The scl enable llvm-toolset-7 prefix tells cargo not to use the custom Kalray C libraries (although they are still needed to compile kernels for the MPPA). MPPACL_LOCAL_SIZE=128K is mandatory to use multithreaded kernels. For example, the following command, run in the kernels/ directory, runs a dump of a given kernel on the MPPA:

```
scl enable llvm-toolset-7 "MPPACL_LOCAL_SIZE=128K cargo +nightly run --features=mppa --bin exec_dump gesummv gesummv.dump"
```

Feel free to put that command in an alias.

Writing a Kernel

To write a kernel, you must first define the inputs of the kernel and the context we optimize for. Here, we assume we optimize for a CUDA GPU, but the process is the same for other backends.

```rust
use telamon::device::cuda;
use telamon::{helper, ir};

let _ = env_logger::init(); // Enable logging.
let executor = cuda::Executor::init(); // Set up the interface with the device.
// Build the signature and bind the inputs in the context.
let mut context = cuda::Context::new(&executor);
let array_a;
let signature = {
    let mut sig_builder = helper::SignatureBuilder::new("my_kernel", &mut context);
    // Create a signature with two arguments: a scalar `n` and an array `a` of 32-bit
    // integers. Each argument is given the value we want to optimize for.
    sig_builder.scalar("n", 1000i32);
    array_a = sig_builder.array::<i32>("a", 1000); // Creates an array of size 1000.

    sig_builder.get()
};
```

We can now describe the body of the kernel itself. Here we create a kernel that computes a[i] = 2*i for each i in 0..n. For that, we use a builder that creates the loops and the instructions for us. The builder keeps track of the open loops and nests new instructions in them.

```rust
let mut builder = helper::Builder::new(&signature, context.device());

// Open a loop of size n.
let size = builder.param_size("n");
let dim0 = builder.open_dim(size);
// Compute `x = 2*i` where `i` is the index on loop `dim0`.
let x = builder.mul(&dim0, &2i32);
// Store `x` in `a[i]`. For that, we first compute the address of `a[i]` and build an
// access pattern that describes the memory access for the performance model.
let (addr, access_pattern) = builder.tensor_access(&"a", array_a, &ir::Type::I(32), &[&dim0]);
builder.st(&addr, x, access_pattern);

// Close the loop.
builder.close_dim(&dim0);

let search_space = builder.get();
```
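When checking the values produced by the generated code, a plain CPU version of the same computation can serve as a reference. The sketch below is ordinary Rust and does not use Telamon at all; the function reference_kernel is a hypothetical helper that simply mirrors the loop described above, writing 2*i into a[i] for each i in 0..n, with 32-bit integers like x in the builder code.

```rust
/// Hypothetical CPU reference for the example kernel (not part of Telamon):
/// writes `2 * i` into `a[i]` for every `i` in `0..n`.
fn reference_kernel(n: usize, a: &mut [i32]) {
    // The kernel only touches the first `n` elements, so `a` must hold at least `n`.
    assert!(n <= a.len(), "the array must hold at least n elements");
    for (i, elem) in a.iter_mut().take(n).enumerate() {
        *elem = 2 * (i as i32);
    }
}
```

For instance, with the sizes used in the signature (n = 1000 and an array of 1000 elements), calling reference_kernel(1000, &mut a) leaves a[999] equal to 1998, and elements past index n are left untouched.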

We are now ready to start the search space exploration to find the best candidate.

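```rust
use telamon::explorer;

// Explore the search space and keep the best candidate found.
let best = explorer::find_best(explorer::config::read(), &context, search_space, None).unwrap();
// Print the code generated for that candidate.
context.device().gen_code(&best, &mut std::io::stdout());
```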
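To give an intuition of what "finding a good combination of optimizations" means, the sketch below enumerates every combination of two hypothetical decisions (an unroll factor and a tile size) and keeps the cheapest one under a made-up cost model. This is only a toy illustration under stated assumptions: Candidate, cost and find_best_toy are invented names, the cost formula is arbitrary, and none of it reflects the actual search strategy or API of Telamon's explorer.

```rust
/// A hypothetical candidate: one choice per optimization decision.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Candidate {
    unroll: u32,
    tile: u32,
}

/// Made-up cost model standing in for a real performance model or a measurement.
/// Lower is better; the terms are arbitrary and only create a trade-off.
fn cost(c: &Candidate) -> f64 {
    let overhead = 100.0 / f64::from(c.unroll); // shrinks as the unroll factor grows
    let pressure = f64::from(c.unroll * c.tile) / 8.0; // grows with unroll * tile
    let reuse = 64.0 / f64::from(c.tile); // shrinks as the tile grows
    overhead + pressure + reuse
}

/// Exhaustively enumerates all combinations and returns the cheapest one with its cost,
/// or `None` if one of the choice lists is empty.
fn find_best_toy(unrolls: &[u32], tiles: &[u32]) -> Option<(Candidate, f64)> {
    let mut best: Option<(Candidate, f64)> = None;
    for &unroll in unrolls {
        for &tile in tiles {
            let candidate = Candidate { unroll, tile };
            let c = cost(&candidate);
            // Keep the candidate if it is the first one or strictly cheaper.
            if best.map_or(true, |(_, best_cost)| c < best_cost) {
                best = Some((candidate, c));
            }
        }
    }
    best
}
```

With unroll factors [1, 2, 4, 8] and tile sizes [1, 2, 4, 8, 16], this toy model selects unroll = 8 and tile = 8, at a cost of 28.5. Exhaustive enumeration only works for tiny spaces: the number of combinations is the product of the number of choices for each decision, so realistic search spaces call for more selective strategies.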

License

Telamon is released under the Apache License (version 2.0). See LICENSE for more details.