I had been meaning to attend the LLVM Developers’ Meeting for a couple of years now, mostly because it happens right next door in San Jose. This year the conference went virtual, which finally made it easy for me to attend all 3 days (Oct 6-8). Below are my notes from the talks I attended from the multiple-track agenda. Since this was my first compiler conference, let alone an LLVM one, I focused on gaining basic knowledge of the software architecture of compilers and the usage of common tools and techniques in the field.
Overall review of the conference
This was my first virtual conference and I was highly skeptical that it would work out. But I was pleasantly surprised by how well it went!
- There was a good mix of talks on introductory topics, useful tools and usage tutorials. I had never been to a compiler conference before, and I learnt a lot of practical information about the LLVM and GCC compilers and their associated tools. I will be sure to attend this conference every year.
- The organizers used a service called Whova to organize and host the entire conference. Whova made it easy to look at the agenda, choose talks, attend talks and switch between talks, and it has a good Android app too.
- The talks were pre-recorded, so they all finished on time, moved at a good pace and left 5-10 minutes for Q&A. You have no idea how much pre-recorded on-time talks mean to me. My biggest gripe at conferences is that many speakers spend their entire slot on the 2 starting slides, then race through 30 slides in the last 5 minutes and then eat up the Q&A time too. That sucks. At this conference, I noticed that for a few speakers who may have run long, the organizers had edited their talks down to the designated time.
- The Whova interface allowed attendees to ask questions and vote on them. After each pre-recorded talk was played out, there was a live Q&A session with the speaker and a chair. The chair would ask the speaker the questions we had typed in during the talk. There was also a chat section in each talk for the attendees to chat amongst ourselves.
- After each talk was over, it was uploaded to YouTube. Sadly, only the pre-recorded part of the talk is uploaded, not the live Q&A at the end.
- There were coffee breaks in a virtual room where you could gather around virtual tables and talk with the other attendee avatars. I did not try this since the schedule was so packed that I needed the break time for myself.
Day 1
- Talk by Nick Desaulniers, who works on compiling the Linux kernel with LLVM at Google. (Slides)
- Begins with a brief introduction to the stages of C++ code compilation with Clang and the LLVM source code paths of those modules.
- The Clang frontend converts C++ code to LLVM IR. Its stages are the preprocessor, lexer, parser (generates the AST), Sema (semantic analysis and template instantiation) and LLVM IR emission (CodeGen).
- The abstractions of input code in LLVM are llvm::Module (translation unit), llvm::Function, llvm::BasicBlock and llvm::Instruction. (A minimal traversal of this hierarchy is sketched after these notes.)
- IR has 3 representations: in-memory during runtime, textual for read/write by humans and binary serialized for files.
- CodeGen: General code generation happens in llvm/lib/CodeGen and machine-specific code generation happens in llvm/lib/Target/XYZ.
- There are four register allocators: fast, basic, greedy and PBQP.
- The ASM streamer writes out the different types of object files.
- Building LLVM: cmake to generate build files, ninja to invoke build.
- LLD is faster than BFD, which is the default linker on Linux.
- The llvm-readelf tool can be used to check if a library file was built using LLVM.
- The opt tool can be used to run different optimizations on IR files.
- creduce and cvise can be used for test case reduction.
- LLVM Weekly and the LLVM blog are good resources.
- “The master has failed more times than the disciple has even tried.”
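To make the Module/Function/BasicBlock/Instruction hierarchy concrete, here is a minimal traversal sketch using the LLVM C++ API. This is my own illustration, not code from the talk:

```cpp
// Walk the LLVM IR hierarchy: a Module (translation unit) contains
// Functions, which contain BasicBlocks, which contain Instructions.
#include "llvm/IR/Module.h"
#include "llvm/Support/raw_ostream.h"

void dumpHierarchy(llvm::Module &M) {
  for (llvm::Function &F : M) {        // each function in the module
    llvm::errs() << "function: " << F.getName() << "\n";
    for (llvm::BasicBlock &BB : F)     // straight-line regions of code
      for (llvm::Instruction &I : BB) {
        I.print(llvm::errs());         // textual form of the instruction
        llvm::errs() << "\n";
      }
  }
}
```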
- Talk by Hafiz Abid Qadeer.
- A typical compiled binary or library runs atop the OS with the support of dynamically linked libraries. This talk was about how to use the LLVM toolchain to compile a binary that can run bare-metal on hardware, with no OS in between.
- I dropped out halfway through this talk because the topic was not relevant to me.
- Talk by Alan Phipps.
- LLVM’s coverage tool has support for function, line and region coverage. But it does not have branch coverage, which the speaker added.
- Counters are inserted at functions, statements and conditions, and incremented when they are executed.
- Using simple if conditions, the speaker explained how he added counters to add branch coverage.
- MC/DC (modified condition/decision coverage) is future work once branch coverage is checked into LLVM.
- GCC gcov (console output) and lcov (HTML output) are the competitors which already have branch coverage, with some caveats.
- Talk by Mandeep Singh Grang and Katherine Kjeer.
- Checked C adds 3 new pointer types to C to handle buffer overflow and null pointer dereference.
- New pointer types are _Ptr<T>, _Array_ptr<T> and _Nt_array_ptr<T>.
- There is a tool that can convert C code to Checked C code.
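To give a feel for these types, here is a rough sketch of Checked C declarations, based on my reading of the Checked C documentation (not code from the talk; this is not standard C/C++):

```c
// Checked C extensions; requires the Checked C fork of clang.
int x = 0;
int buf[10] = {0};

_Ptr<int> p = &x;                     // pointer to a single object, never indexed
_Array_ptr<int> a : count(10) = buf;  // array pointer with declared bounds
_Nt_array_ptr<char> s = "hello";      // null-terminated array pointer; bounds
                                      // extend up to the terminating '\0'
```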
Day 2
- Talk by Vince Bridgers.
- Most bugs are introduced early in the development process, but discovered and fixed late. Thus the cost of having a bug in the code is pretty high. Weeding out bugs as early as possible reduces software cost.
- Types of program analysis tools:
- Compiler diagnostics: Warnings from GCC and Clang. Part of compilation.
- Linters and style checkers: Use text/AST matching. An extra step after compilation.
- Static analysis: cppcheck, GCC 10+, clang. These use symbolic execution. An extra step after compilation.
- Dynamic analysis: valgrind, gcc, clang. Injection of runtime checks or library. Long runtimes due to running extra code. An extra step after compilation.
- LLVM compiler flow is: Frontend -> Optimizer -> CodeGen
- clang-tidy: Uses AST matchers to find and replace patterns. It has full access to the AST and the preprocessor. There are 200+ existing checks, and it is extensible with custom checks.
- clang-tidy --list-checks: Lists all currently active checks. This is not the full list of available checks.
- clang-tidy --list-checks -checks=*: Lists all available checks.
- clang-tidy ... -checks=-*,<your specific check>: Picks out a specific check to apply.
- clang-tidy ... --fix: Not just check, but also fix the errors found.
- The full list of available checks is online. Checks with fixes are indicated in the second column.
- clang-query is an interactive tool to play around with the clang C++ API, query the AST and figure out the matchers for a pattern. (A minimal matcher sketch follows these notes.)
- Write the new custom check (and fix) in C++ code and add it to clang-tidy using add_new_check.py <category> <check name>.
- There are clang-tidy tools to apply checks across an entire codebase in parallel without the fixes overwriting each other.
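Here is a minimal AST-matcher sketch of the kind a custom clang-tidy check registers. The matcher API calls are real; the scenario and names are made up for illustration:

```cpp
// Match calls to strcpy(), a pattern a custom clang-tidy check could
// flag (and fix) in its registered callback.
#include "clang/ASTMatchers/ASTMatchers.h"

using namespace clang::ast_matchers;

// The same matcher can be prototyped interactively in clang-query:
//   match callExpr(callee(functionDecl(hasName("strcpy"))))
static const auto UnsafeStrcpyMatcher =
    callExpr(callee(functionDecl(hasName("strcpy")))).bind("strcpy_call");
```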
- Talk by Siva Chandra Reddy.
- Motivations for creating a new cleanroom implementation of libc: be sanitizer friendly, use standard C/C++ with no assembly, and be modular so that users can mix libc pieces from different libc implementations (like GNU libc and musl).
- LLVM libc uses C++ templates internally to reduce the code for different variants of math functions. (A toy sketch of the idea follows these notes.)
- threads.h (mutex and thread functions) and signal.h will contain platform-specific code.
- mmap and munmap are used for the creation and destruction of thread stacks.
- Loader: It starts the user application, calling main, and also cleans up after the application has ended.
- Static-PIE ELF loader is in the works.
- It will be a few years before this has all the libc functions and can fully replace GNU libc on Linux alone. Other platforms are even more years away.
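The templated-internals idea is easy to picture. Below is my own toy sketch (not actual LLVM libc code): one generic C++ core instantiated behind each public C entry point:

```cpp
// One templated core shared by the float/double variants of a math
// function; each C symbol is a thin instantiation wrapper.
namespace internal {
template <typename T>
T generic_fabs(T x) {
  return x < T(0) ? -x : x;
}
} // namespace internal

extern "C" float fabsf(float x) { return internal::generic_fabs(x); }
extern "C" double fabs(double x) { return internal::generic_fabs(x); }
```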
- By Aditya Kumar.
- Explicit template instantiations reduce code size and compilation time. (A sketch of the idiom follows these notes.)
- Move function definitions from headers to source files. Especially with C++, which adds ctors, dtors, operator overloading and explicit template instantiations.
- Use __attribute__((noinline)) and __attribute__((cold)).
- In terms of code size, list and array are cheaper than vector and deque, and map and set are cheaper than unordered_map and unordered_set.
- Avoid sorting if possible. For code size, bubble sort is cheaper than quick sort, and linear search is cheaper than binary search.
- Move rarely used functions to a shared library to reduce program launch time.
- elfcompress can compress sections.
- llvm-strip can strip symbols.
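For the explicit-instantiation tip above, here is a small sketch of the standard C++ idiom (my example, not the speaker's; the file names are hypothetical):

```cpp
// widget.h -- 'extern template' tells every including TU not to
// instantiate std::vector<int>'s methods itself.
#include <vector>
extern template class std::vector<int>;

// widget.cpp -- exactly one TU provides the explicit instantiation,
// so the generated code exists once instead of in every object file.
#include "widget.h"
template class std::vector<int>;
```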
- Talk by Baptiste Saleil and João Carvalho.
- MMA (Matrix-Multiply Assist) instructions have been added to Power ISA 3.1, so they will be available in POWER10 and later CPUs.
- The motivation is to speed up the matrix multiplication and convolution kernels used in deep learning.
- 8 new accumulators which are 512-bit registers. Each accumulator is associated with 4 existing 128-bit VSR registers.
- 3 types of instructions have been added:
- Move data to and from MMA accumulators: To move data to accumulator or set to zero use primed instructions (VSR->acc). To move data from accumulator use unprimed instruction (acc->VSR).
- Integer MMA instructions: Multiply two 4x2 inputs to produce 4x4 output. Inputs are typically lower precision (Ex: INT16) than outputs (Ex: INT32). Available for all INT types like INT16, INT32, INT8, INT4.
- Floating point MMA instructions: Multiply two 4x2 inputs to produce 4x4 output. Same as INT above. Available for all supported float types: FP64, FP32, FP16 and Bfloat16.
- New API has been added based on LLVM compiler intrinsics.
- New optimization pass has been added to LLVM to identify GEMM variants in IR to map to MMA instructions.
- LLVM recently added llvm.matrix.* intrinsics that can be used for this mapping: llvm.matrix.column.major.load() to load a matrix for MMA and llvm.matrix.multiply() to do the MMA.
- Talk by Florian Hahn.
- Support for multiplication of small matrices has now been added. This is for matrices of sizes ranging from 2x2 to 16x16, which are small enough to fit in the vector registers of common CPUs.
- The matrices are column major by default.
- Supporting matrix types in LLVM/Clang guarantees vector code generation for this code and also removes unnecessary memory access.
- For C/C++ code Clang supports:
- New matrix type with number of columns and rows specified at compile time.
- Operators: * is defined for multiplication of matrices, + and - are defined for elementwise addition and subtraction of matrices, and [][] is defined as the element subscript operator.
- New builtins for transpose, store to memory (with user strides) and load from memory (with user strides). (A minimal usage sketch follows these notes.)
- More details at Matrix types.
- In LLVM IR:
- New intrinsics of the form @llvm.matrix.*.
- The matrix is embedded in flat vectors.
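Here is a minimal sketch of these Clang matrix extensions in use, based on the Clang "Matrix Types" documentation. It needs a recent Clang invoked with -fenable-matrix:

```cpp
// Compile with: clang++ -fenable-matrix -O2 -c matmul.cpp
// A 4x4 single-precision matrix value type (column major by default).
typedef float m4x4_t __attribute__((matrix_type(4, 4)));

m4x4_t fused(m4x4_t a, m4x4_t b, m4x4_t c) {
  m4x4_t t = __builtin_matrix_transpose(a);  // builtin transpose
  return t * b + c;  // '*' is matrix multiply; '+' is elementwise
}

float corner(m4x4_t m) {
  return m[0][0];    // [][] element subscript
}
```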
I noted the following from the lightning talks, which were 5 minutes each:
- NVIDIA has contributed Flang, a Fortran compiler frontend, to LLVM.
- One speaker had written a CUDA backend for SYCL.
- The LLDB debugger supports scripting using Python (import lldb) and other languages like Lua using SWIG.
Day 3
- By Mehdi Amini and River Riddle.
- MLIR is a framework for building a compiler IR. We can define type systems and operations in it.
- Uses could be for various code generation strategies and to support accelerators.
- MLIR is being used to process graphs in TensorFlow/XLA/ONNX and for Flang IR.
- The rest of the tutorial shows how to use MLIR for a Toy language. The content of this tutorial is online.
- Operations are the most basic component of MLIR; everything in MLIR is an operation. An MLIR operation is different from an LLVM instruction.
- All operations share the same structure. Operations can have regions, blocks and other operations.
- A dialect defines the rules and semantics of an IR. It has a namespace, operations, types and passes.
- By Prashantha NR, Vinay Madhusudan and Ranjith Kumar.
- Motivation for CIL:
- C/C++ does not have a middle level IR. LLVM IR is too low level for some optimizations. C/C++ STL constructs need a high level IR for high level optimizations.
- Fortran defined its own IR; it could have reused a C++ IR instead.
- Uses:
- LLVM does not remove empty foreach loops in C++.
- To handle multi-dimensional arrays.
- To handle heterogeneous computing.
- CIL is implemented as an MLIR dialect to handle C/C++ and Fortran. After CIL-level optimizations, it can then be lowered to LLVM IR for low-level optimizations.
- CIL-LTO: Combines multiple MLIR modules into a single MLIR module. This is useful for LTO optimizations like function inlining.
- By Alex Denisov.
- Mutation testing shows the semantic gaps between the test suite and the software. It finds incorrect tests, potential vulnerabilities and dead code.
- The idea is to mutate code, compile, link, run tests and repeat. So mutation testing is very costly.
- Example of a mutation: replace * with + and check whether the unit tests fail. (A worked example follows these notes.)
- Mull is a mutation testing tool for C/C++. Mull makes it performant to do mutation testing. It relies on bitcode generated by LLVM and works by forking processes and switching function pointers to different mutants and testing them. This avoids a whole compilation loop for each mutant.
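To make the * to + mutation concrete, here is a toy illustration (my example, not from the talk) of how a weak test lets a mutant survive:

```cpp
#include <cassert>

// Original code; a tool like Mull would produce the mutant where
// '*' is replaced with '+'.
int area(int w, int h) { return w * h; }  // mutant: return w + h;

int main() {
  assert(area(2, 2) == 4);  // survives the mutant: 2 + 2 == 4 (weak test)
  assert(area(2, 3) == 6);  // kills the mutant: 2 + 3 == 5 != 6
  return 0;
}
```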
- By Simeon Ehrig.
- Cling is a JIT compiler for C++ code. It can be used at the terminal like a REPL interpreter or in a Jupyter notebook.
- For each new statement, Cling creates a new transaction object. Executing statement-by-statement creates a DAG of transaction nodes. This is how Cling allows undoing a previous statement and replacing it with a modified statement.
- The speaker added support for compiling and calling CUDA kernels to Cling, using Google’s GPUCC compiler for some of the work.
- By Milena Vujosevic Janicic.
- AUTOSAR has 402 rules: 150 directly from MISRA C++, 200 from other C++ standards and 60 based on papers, etc.
- Speaker focused on required/advisory rules that are implementation based and can be automated.
- Some of the rules can be enforced using Clang warnings.
- For rules that need semantic analysis, AST matchers and AST visitors of Clang can be used.
- AST matchers provide a simple way to describe patterns in AST. Can be specified in clang-tidy checks.
- AST visitors provide the full power of the Clang AST. (A minimal visitor sketch follows these notes.)
- For some rules, static analyzer was used. It is a source code analysis tool that uses both AST and CFG. But it is slower than using AST matchers or visitors.
- 190 AUTOSAR rules have been implemented in the above way into a tool called autocheck.
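As an illustration of the AST-visitor approach, here is a minimal RecursiveASTVisitor sketch (my own toy example, not from autocheck). A rule like AUTOSAR’s “the goto statement shall not be used” maps naturally onto a visitor like this:

```cpp
// Visit every goto statement in a translation unit's AST.
#include "clang/AST/RecursiveASTVisitor.h"

class GotoRuleChecker : public clang::RecursiveASTVisitor<GotoRuleChecker> {
public:
  bool VisitGotoStmt(clang::GotoStmt *S) {
    // A real checker would emit a diagnostic at S->getGotoLoc() here.
    return true;  // returning true continues the traversal
  }
};
```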
- By Vince Bridgers.
- Static analysis: Execute code symbolically through simulation. Errors it finds are memory leaks, buffer overruns and logic errors.
- Dynamic analysis: Instrument code and run the code on real target. Examples: UBSAN, TSAN, ASAN.
- Program analysis can exhaustively explore all execution paths in a program’s state space.
- clang --analyze foobar.cpp runs the analyzer. It can also dump an HTML report with other options. (A toy example follows these notes.)
- Incorporated into Ericsson’s CodeChecker tool.
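As a toy illustration (my example, not the speaker’s) of the kind of defect symbolic execution finds without ever running the code:

```cpp
// clang --analyze leak.cpp
// The analyzer explores both branches symbolically and warns about
// the potential leak of 'p' on the early-return path.
#include <cstdlib>

void process(bool fail_fast) {
  int *p = static_cast<int *>(std::malloc(sizeof(int)));
  if (p == nullptr)
    return;
  if (fail_fast)
    return;        // 'p' leaks on this path
  *p = 42;
  std::free(p);
}
```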