This task involves implementing two LLVM passes and a runtime library that your LLVM pass will invoke.
When developing LLVM passes, you can find a lot of valuable methods already implemented in LLVM.
You are allowed (and encouraged!) to use them instead of re-implementing logic yourself.
Before starting your implementation, it might be helpful to look at the documentation of classes
like llvm::Instruction
, llvm::BasicBlock
, llvm::Function
, and others. Also, look at
llvm/Transforms/Utils/Local.h
.
You are not allowed to call/use other LLVM passes (e.g. instcombine
) or depend on external analyses (e.g.
TargetIRAnalysis
). This also means that using the function simplifyCFG
is not allowed.
Your first assignment is to design a pass that eliminates redundant instructions. We define dead instructions as instructions that write to a register not used by subsequent instructions and have no observable side effects (e.g., write to memory, return, perform a jump).
There are three test cases for this task that increase in difficulty.
Initially, your focus should be on eliminating redundant instructions that neither yield any use nor produce observable side effects.
In the following example, the instruction %mul
is dead and can be removed.
Consequently, the instruction %add
becomes dead and can be removed too.
define dso_local i32 @main(i32 %argc, ptr %argv) {
entry:
%add = add nsw i32 %argc, 42
%mul = mul nsw i32 %add, 2
ret i32 0
}
Your pass should transform the above code into the following code:
define dso_local i32 @main(i32 %argc, ptr %argv) {
entry:
ret i32 0
}
In the subsequent test case, we expect your pass to remove and streamline irrelevant basic blocks, which surface as a result of eliminating redundant instructions.
Conditional branches that always branch to the same block:
entry:
; ...
br i1 %cmp, label %block.1, label %block.1
Can be simplified to:
entry:
; ...
br label %block.1
Blocks that just contain a single unconditional branch instruction to the next block:
entry:
; ...
br i1 %cmp, label %block.1, label %block.2
block.1:
br label %block.2
block.2:
; ...
Can be simplified to:
entry:
; ...
br i1 %cmp, label %block.2, label %block.2
; block.1 has been removed
block.2:
; ...
After doing this optimization, you can now transform the conditional branch instruction to an unconditional branch.
The third test case is to remove useless loads/stores from/to stack slots. You can walk over the function and check if a stores are never read again, i.e. there is no load from this stack slot in the same function and the stack slot is not passed to another function. For simplification, you do not need to check if the load/store is volatile.
define dso_local i32 @main(i32 %argc, ptr %argv) {
entry:
%x = alloca i32
store i32 10, i32* %x
ret i32 0
Since the stored value is never read again, we can remove the store instruction. Additionally, we can then remove the alloca instruction, since it is not used anymore.
define dso_local i32 @main(i32 %argc, ptr %argv) {
entry:
ret i32 0
}
For the second task, you construct a basic version of AddressSanitizer. You should implement a pass that runs over the program, and inserts calls to a runtime library that checks if a memory access is valid.
The idea is that the runtime library keeps track of allocated memory and checks if a memory access is within the bounds of the allocated memory. If it is not, the runtime library should print an error message and abort the program. You can check out the AddressSanitizer paper.
We have provided you with a basic skeleton for the runtime library, but you can adapt it to your preference and needs. Allocations should be padded on the left and right with a few bytes to detect out-of-bounds accesses close to the allocated memory. You only need to detect overflows within 16 bytes of the allocated memory (as ASan does).
If an invalid memory access is detected, you should print an error message to stderr
that contains the following
substring (we check for this in the tests):
Illegal memory access
We again have three different test cases:
The first test case checks that out-of-bounds access to the heap is detected. We will only check out-of-bounds access within a few bytes of the allocated memory.
int main() {
int *x = malloc(16 * sizeof(int));
x[15] = 0; // in bounds, should still work
x[16] = 0; // out of bounds, should be detected
x[-1] = 0; // out of bounds, should also be detected
return 0;
}
This test case checks that your pass can detect a use-after-free error.
int main() {
int *x = malloc(16 * sizeof(int));
free(x);
x[0] = 0; // use after free, should be detected
return 0;
}
This test case ensures your pass can detect out-of-bounds access to a stack slot.
int main() {
int x[16];
x[15] = 0; // in bounds, should still work
x[16] = 0; // out of bounds, should be detected
x[-1] = 0; // out of bounds, should also be detected
return 0;
}
You can first start with a naive implementation that doesn't concern itself with performance and memory usage. Once you have a working implementation, you can improve its performance and memory usage. This final test checks that your application can handle a large number and size of allocations with a memory limit.
One approach you could try is implementing a compact shadow memory. See the AddressSanitizer paper for inspiration.
tasks/dead-code-elimination/build/libDeadCodeElimination.so
tasks/memory-safety/build/libMemorySafety.so
tasks/memory-safety/build/libMemorySafetyRuntime.a
The LLVM passes can only be implemented in C++, not C or Rust. You may implement the runtime for task 9.2 C, C++, or Rust. Update the Makefile accordingly.
You must have LLVM 16, ninja, make, and CMake installed to build this on your local system.
The assignments will run make dce
(dead-code-elimination) and make memory-safety
.
Make sure that this produces the correct files.
On Debian-based systems, you can use the automatic LLVM installation script:
pip3 install lit
wget https://apt.llvm.org/llvm.sh && chmod +x llvm.sh && ./llvm.sh 16 clang opt
# since the script installs the binaries with a version suffix, we create symlinks
for i in clang opt llc FileCheck; do ln -s $(which $i-16) /usr/local/bin/$i; done
On macOS, you can use Homebrew:
pip3 install lit
brew install llvm
# link to the correct library paths so cmake can find them
export PATH="/opt/homebrew/opt/llvm/bin:$PATH"
export LDFLAGS="-L/opt/homebrew/opt/llvm/lib"
export CPPFLAGS="-I/opt/homebrew/opt/llvm/include"
Open the project folder in Clion. It should automatically detect the CMakeLists.txt files and configure the project.
Install the extension ms-vscode.cpptools-extension-pack
.
If the LLVM headers can't be opened: Ctrl+Shift+P -> >C/C++: Edit Configurations (JSON)
-> Add LLVM include
directories to includePath
, e.g. when using the VSCode devcontainer:
"includePath": [
"${workspaceFolder}/**",
"/usr/include/llvm-16",
"/usr/include/llvm-c-16"
],
The tests run the subtests located in the tests/
folder.
Each test program is a python3 script and can be run individually, i.e.:
python3 ./tests/memory-safety/test-malloc.py
The dce tests are run with llvm-lit and can be run individually, i.e.:
pip install lit
lit -v ./tests/dce
For convenience, our Makefile also comes with a check
target, which will run all tests in serial:
$ make check
To run a pass manually, you can do so with the following command:
Note: on macOS, replace .so
with .dylib
.
opt -load-pass-plugin ./tasks/dead-code-elimination/build/libDeadCodeElimination.so -passes=dead-code-elimination -S input.ll