How to Write an LLVM Backend #1: Getting Started
I normally set up my environment and experiment with what is already out there before writing code for a new project. So that is exactly what I am going to do here. In this post, I will show how to download and compile LLVM and other tools that will be useful for debugging. We will also see how to compile, assemble, link and run programs using existing LLVM backends and the GNU toolchain.
NOTE: Posts in this series:
- Introduction
- Getting Started
- Setting Up a New Backend
- Configuring the Build System
- Instruction Selection
- Arithmetic Instructions
Environment
I am using Ubuntu, but you should be able to replicate the steps on other systems with (relatively) few changes. You will need the following tools to build the software.
- Make
- C/C++ Compiler – I am using GCC 9.2.1
- autotools
- CMake
- Ninja
- Git
- A lot of patience…
NOTE: I might have forgotten something here, but the build system should kindly tell you via an error ;).
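If you would rather find out what is missing before starting a long build, a quick check along these lines can help. This is only a sketch: the missing_tools helper is hypothetical, and the command names (for example autoconf and automake for autotools) are assumptions about how the tools are usually installed.

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
    for tool in "$@"; do
        # command -v fails quietly when the tool cannot be found
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed command names for the tools listed above.
missing=$(missing_tools make gcc g++ autoconf automake cmake ninja git)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All build tools found."
fi
```

On Ubuntu, the Ninja package is typically called ninja-build even though the command is ninja.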
Compiling LLVM
The LLVM maintainers have set up this convenient repo that contains LLVM along with other parts of the toolchain such as Clang. So go ahead and clone that repo.
git clone https://github.com/llvm/llvm-project
We will be using LLVM 10.0.1 in this series of posts, so I recommend you build that version of the software. Bear in mind that LLVM changes very quickly, so some of the code shown here will not work in older/newer versions. However, the principles should roughly be the same.
LLVM uses CMake to generate the build files for the build system. There are a few possible target build systems: Ninja, Makefiles, Visual Studio and Xcode. I normally use Ninja as I feel that it is faster on my system (although I have no evidence to back up that statement!). You can change the build system by modifying the -G argument to the cmake command below.
The CMake files have a lot of options that I encourage you to explore as some can be very helpful for debugging. You can delight yourself reading about all the build options here. For now, I will use the following:
- -DLLVM_ENABLE_PROJECTS to build Clang alongside the rest of the compiler.
- -DLLVM_TARGETS_TO_BUILD to specify a list of backends to build. Looking at the output of other backends is enlightening and helpful for debugging, but the build will take ages if you add too many.
- -DCMAKE_BUILD_TYPE to ask for a Debug build.
- -DLLVM_ENABLE_ASSERTIONS=On to enable assertions. Again, helpful for debugging.
Anyways, here is how you build LLVM after cloning the repo.
cd llvm-project
git checkout llvmorg-10.0.1
mkdir build
cd build
cmake -G "Ninja" -DLLVM_ENABLE_PROJECTS="clang" -DLLVM_TARGETS_TO_BUILD="ARM;Lanai;RISCV" -DCMAKE_BUILD_TYPE="Debug" -DLLVM_ENABLE_ASSERTIONS=On ../llvm
ninja
NOTE: You can find more information about building LLVM here and here.
NOTE: You can pass the -j <NUM_JOBS> option to Ninja to indicate how many jobs you want to run in parallel. A very high <NUM_JOBS> causes the build to crash on my system with a collect2: ld... error message.
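One way to pick a <NUM_JOBS> value is to derive it from the number of CPU cores. The snippet below is a sketch, not a recommendation from the LLVM project: halving the core count is an arbitrary rule of thumb, and it assumes the crash comes from too many memory-hungry link steps running at once.

```shell
# Derive a conservative job count: half the CPU cores, but never less than 1.
cores=$(nproc 2>/dev/null || echo 2)   # fall back to 2 if nproc is unavailable
num_jobs=$(( cores / 2 ))
if [ "$num_jobs" -lt 1 ]; then
    num_jobs=1
fi

# Print the command instead of running it, so it can be reviewed first.
echo "ninja -j $num_jobs"
```

If only the link steps are the problem, LLVM's CMake option -DLLVM_PARALLEL_LINK_JOBS may be a gentler alternative, since it limits the linker jobs under Ninja without slowing down compilation.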
Compiling the GNU Toolchain for RISC-V
You are probably a bit confused about why I am suggesting that we build GCC for RISC-V. Aren’t we writing our own compiler backend anyways?
We are building GCC because, at least initially, we want to use GCC’s assembler and linker to test the code generated by our LLVM backend. Recall that there are a lot of stages for the compilation process. At the early stages of our development we will have the following structure:
- Clang to compile C code to LLVM IR
- LLVM to optimize the IR
- Our LLVM backend to compile the IR down to assembly
- GCC to assemble and link the executable
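To make these stages concrete, here is a small dry-run sketch that prints one command per step for a given C file without running anything. The print_pipeline helper is hypothetical, and the flags mirror the commands used later in this post (minus the -isystem include path). Note that with -O2 Clang already runs the IR optimizations, so the first two stages collapse into a single command.

```shell
# Print (do not run) the commands that take a C file through the pipeline.
print_pipeline() {
    src=$1
    base=${src%.c}   # strip the .c extension to name the intermediate files
    # Stages 1 and 2: Clang compiles C to optimized LLVM IR (bitcode)
    echo "clang -O2 -emit-llvm -target riscv64 -c $src -o $base.bc"
    # Stage 3: the LLVM backend lowers the IR to assembly
    echo "llc -march=riscv64 -O2 -filetype=asm $base.bc -o $base.S"
    # Stage 4: GCC assembles and then links the executable
    echo "riscv64-unknown-elf-gcc -c $base.S -o $base.o"
    echo "riscv64-unknown-elf-gcc $base.o -o $base"
}

print_pipeline test.c
```

Printing the commands first makes it easy to see which tool owns which intermediate file before anything is executed.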
Use the following commands to download, build and install GCC for RISC-V.
git clone https://github.com/riscv/riscv-gnu-toolchain
cd riscv-gnu-toolchain
mkdir build
cd build
../configure --with-arch=rv32gc --with-abi=ilp32
make
make install
NOTE: Make sure you build the GCC toolchain for the right variant of the instruction set, i.e. RV32, as the build system’s default is RV64!
NOTE: The GNU toolchain supports multiple ABIs for RISC-V, like ilp32, ilp32d and ilp32f, depending on whether you want soft floating-point, hard floating-point, etc.
Compile a simple C program
Everything is now set up to build and run our first program, although not with our own backend (yet!). Let’s start with a simple C program:
#include <stdio.h>

int main(void)
{
    printf("Hello world!\n");
    return 0;
}
First, compile the C code to LLVM IR using Clang. Our program uses the standard library function printf from the stdio.h header, so the compiler will report errors if it cannot find that header file. I will be using the standard C library that is packaged with GCC for RISC-V, so I had to use the -isystem argument. This adds the location of the much needed header files to the list of search paths that Clang’s preprocessor uses.
clang -O2 -emit-llvm -target riscv64 -isystem <PATH_TO_GCC>/riscv64-unknown-elf/include -c test.c -o test.bc
The previous command created a test.bc file with the LLVM IR in bitcode form, but that’s not very human-readable. We can disassemble that file into textual IR (written to test.ll by default) using the following command:
llvm-dis test.bc
Now let’s compile the IR down to assembly using the backend that is packaged with our LLVM download:
llc -march=riscv64 -O2 -filetype=asm test.bc -o test.S
Generating the program’s binary is fairly straightforward with GCC. I split it into two steps, but you can use a single command if you prefer.
riscv64-unknown-elf-gcc -c test.S -o test.o
riscv64-unknown-elf-gcc test.o -o test
Finally, we can run the program using a simulator or real hardware.
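As a rough sketch of the simulator route, the snippet below looks for a simulator on PATH and prints the command you would use. Both tool names are assumptions about what you have installed: qemu-riscv64 is QEMU's user-mode emulator and spike with pk is the RISC-V ISA simulator plus its proxy kernel. Neither is mentioned elsewhere in this post, and the pick_simulator helper is hypothetical.

```shell
# Print a command line for running the given binary on whichever
# simulator is available; return non-zero if none is found.
pick_simulator() {
    binary=$1
    if command -v qemu-riscv64 >/dev/null 2>&1; then
        echo "qemu-riscv64 $binary"
    elif command -v spike >/dev/null 2>&1; then
        echo "spike pk $binary"
    else
        echo "no RISC-V simulator found on PATH"
        return 1
    fi
}

# "|| true" keeps the script going even when no simulator is installed.
pick_simulator ./test || true
```

If everything is in place, running the printed command should show the Hello world! message from the program.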