/LLVM-Code-Generation

LLVM Code Generation, published by Packt

LLVM Code Generation, First Edition

A deep dive into compiler backend development

Quentin Colombet

This is the code repository for LLVM Code Generation, First Edition, published by Packt.

About the book

The LLVM infrastructure is a popular compiler ecosystem widely used in the tech industry and academia. This technology is crucial for both experienced and aspiring compiler developers looking to make an impact in the field. Written by Quentin Colombet, a veteran LLVM contributor and architect of the GlobalISel framework, this book provides a primer on the main aspects of LLVM, with an emphasis on its backend infrastructure; that is, everything needed to transform the intermediate representation (IR) produced by frontends like Clang into assembly code and object files. You’ll learn how to write an optimizing code generator for a toy backend in LLVM. The chapters will guide you step by step through building this backend while exploring key concepts, such as the ABI, cost model, and register allocation. You’ll also find out how to express these concepts using LLVM's existing infrastructure and how established backends address these challenges. Furthermore, the book features code snippets that demonstrate the actual APIs. By the end of this book, you’ll have gained a deeper understanding of LLVM. The concepts presented are expected to remain stable across different LLVM versions, making this book a reliable quick reference guide for understanding LLVM.

Key Learnings

  • Understand essential compiler concepts, such as SSA, dominance, and ABI
  • Build and extend LLVM backends for creating custom compiler features
  • Optimize code by manipulating LLVM's Intermediate Representation
  • Contribute effectively to LLVM open-source projects and development
  • Develop debugging skills for LLVM optimizations and passes
  • Grasp how encoding and (dis)assembling work in the context of compilers
  • Utilize LLVM's TableGen DSL for creating custom compiler models

Chapters

  1. Building LLVM and Understanding the Directory Structure
  2. Contributing to LLVM
  3. Compiler Basics and How They Map to LLVM APIs
  4. Writing Your First Optimization
  5. Dealing with Pass Managers
  6. TableGen – LLVM Swiss Army Knife for Modeling
  7. Understanding LLVM IR
  8. Survey of the Existing Passes
  9. Introducing Target-Specific Constructs
  10. Hands-On Debugging LLVM IR Passes
  11. Getting Started with the Backend
  12. Getting Started with the Machine Code Layer
  13. The Machine Pass Pipeline
  14. Getting Started with Instruction Selection
  15. Instruction Selection: The IR Building Phase
  16. Instruction Selection: The Legalization Phase
  17. Instruction Selection: The Selection Phase and Beyond
  18. Instruction Scheduling
  19. Register Allocation
  20. Lowering of the Stack Layout
  21. Getting Started with the Assembler

Requirements for this book

To follow the instructions in this book, you need LLVM 20 installed on a system running Windows, macOS, or Linux.

Navigate to the chX directory for each chapter, look at the examples provided, and do the exercises where applicable. Each directory has its own README.md with specific directions.

Note: The exercises have been tested with the open source repository of LLVM at the Git hash 424c2d9b7e4d from February 13th, 2025, which is LLVM 20.1.1.

Some of the exercises interact directly with the LLVM C++ API. This API has no stability guarantee, so newer or older versions of LLVM may not work with these exercises.

For the exercises that require an installed version of LLVM: if you build your own, make sure to use the CMAKE_INSTALL_PREFIX CMake variable to set the install path, then build the install target.

You will then need to provide this install path to CMake when configuring each of those exercises.

Follow the READMEs in the different directories when you get there.
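
The build-and-install workflow described above might look roughly like the following sketch. The clone URL and Git hash come from this README. The directory names (`llvm-project`, `build`, `$HOME/llvm-install`), the Ninja generator, and the Release build type are assumptions chosen for illustration. `LLVM_DIR` is the variable that CMake's `find_package(LLVM)` conventionally consults, but whether a given exercise expects it or a differently named variable is also an assumption; the README in each chX directory is authoritative.

```shell
# Fetch the LLVM sources and check out the commit the exercises were tested with.
git clone https://github.com/llvm/llvm-project.git
cd llvm-project
git checkout 424c2d9b7e4d

# Configure LLVM with an explicit install location (placeholder path).
cmake -G Ninja -S llvm -B build \
  -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_INSTALL_PREFIX="$HOME/llvm-install"

# Build LLVM and copy the result into the install prefix.
cmake --build build --target install

# Assumption: an exercise that locates LLVM through find_package(LLVM) can be
# pointed at the install tree like this. /path/to/chX is a placeholder; check
# the chapter's README for the variable it actually expects.
cmake -G Ninja -S /path/to/chX -B chX-build \
  -DLLVM_DIR="$HOME/llvm-install/lib/cmake/llvm"
```

Building LLVM from source takes a long time and a lot of disk space, so reusing one install tree across all the exercises is usually worthwhile.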

Get to know the author

Quentin Colombet is a veteran LLVM contributor specializing in compiler backends. He is the architect of the new instruction selection framework (GlobalISel) and code owner of the LLVM register allocators. With over two decades of experience, he has worked on compiler backends for a variety of architectures, including GPUs, CPUs, microcontrollers, DSPs, and ASICs. Quentin joined Apple in 2012 and has contributed to the x86, AArch64, and Apple GPU backends. He is passionate about helping newcomers get started with the LLVM infrastructure, having mentored interns and new hires over the years.

Errata

  • Page 6: Under the heading Identifying the right version of the tools, in step 1 the hyperlink on the URL [https://releases.llvm.org/] in the digital formats redirects to [https://www.python.org/downloads/]. Please copy and paste the link [https://releases.llvm.org/] in the browser to navigate to the correct webpage.
  • Page 11: In the command $ git clone https://github.com/llvm/llvm/project.git, the URL should be https://github.com/llvm/llvm-project.git. Therefore, the first line becomes $ git clone https://github.com/llvm/llvm-project.git.
  • Page 71: In the description below Figure 3.4, the sentence "Because of that, inserting a store in A and reloading in B means that the whole dotted region needs to play nicely with this memory location." should be "Because of that, inserting a store in A and reloading in C means that the whole dotted region needs to play nicely with this memory location."
  • Page 106: In Figure 4.6, the block at the center labelled as "excluding" should be "exiting".
  • Page 361: The term MCInstrPrinter should be MCInstPrinter.
  • Page 363: Both instances of the term XXXInstrPrinter should be XXXInstPrinter.
  • Page 455: In Table 17.1, under the Original code column on the left side, the line %vec1 = insertelement <2 x i32> %vec, i32 %a, i32 1 should be %vec1 = insertelement <2 x i32> %vec, i32 %b, i32 1 (i.e. %a should be %b).