
Explore JIT in TiDB

Primary language: LLVM. License: Apache License 2.0.

TiCompile Project

Author: Zixiong Liu

Introduction

The TiCompile project aims to bring JIT (just-in-time) compilation to TiDB's computation layer. With JIT support, TiDB can compile the execution plans of computation-intensive queries into machine code and cache that code for repeated use.

In the first stage of the project, we plan to support:

  1. JIT compilation of expressions
  2. JIT compilation of aggregations
  3. JIT compilation of hash joins (a stretch goal)

The goal of this project is to explore whether the computations done by TiDB can be accelerated, especially for queries with complex expressions. We hope the results will make TiDB a stronger candidate for computation-intensive workloads.

Design Details

The compilation of an execution plan is done as follows:

  1. Convert the execution plan to LLVM IR (intermediate representation), a low-level language in SSA (static single assignment) form. This step is implemented in pure Go using the llir/llvm library.

  2. The resulting LLVM IR is serialized in bitcode (binary) format and passed to LLVMBridge, a library written in C++ that uses LLVM to emit machine code and place it in an executable region of memory. LLVMBridge returns a function pointer to the Go code. The C++ code is invoked through CGO; the overhead of CGO calls is of little concern here, because the cost of generating machine code from the LLVM IR dominates.

  3. The Go code will cache the function pointer and invoke it whenever it is needed. This invocation bypasses CGO: Go executes a call instruction to the function pointer directly. Since the compiled function is purely computational (i.e., it performs no I/O and does not otherwise interact with the OS), calling it directly from a goroutine should be straightforward.

Why LLVM?

LLVM is a mature compiler infrastructure library with good support for JIT compilation. Using LLVM brings us two advantages:

  1. We can target LLVM IR, which has a human-readable textual form that makes debugging code generation easier, and which is more structured than raw assembly, making it easier to generate from Go. LLVM IR is also largely platform-neutral, which makes it easier to support both x86-64 and Arm, and potentially other architectures.

  2. LLVM lets us apply its ready-to-use optimization passes to the generated code, without having to implement these optimizations ourselves.

In summary, by relying on a mature tool, we avoid having to implement a cross-platform compiler from the ground up, and can focus on making the JIT mechanism fit well with our database system, TiDB.

Why not WASM?

WASM may seem like a good alternative, and it is a popular choice for embedding in large projects. We chose not to compile to WASM because its semantics are very close to those of machine code, so the cost (both to develop and to run) of compiling an execution plan to WASM is similar to that of compiling to machine code. Since machine code is lower-level and would yield better performance (however small the advantage), we may as well compile directly to machine code.

References