Alternate instruction sequences in LLVM

I’ve been wondering how to express the fact that two instructions (or instruction sequences) are equivalent in LLVM IR.

The reason why this matters is that transforms operating at the IR level are supposed to be architecture-agnostic but this poses some problems: for example, consider a transform that takes a loop and rewrites it for better performance. There are multiple ways of expressing the same loop (think loop unrolling, loop switching, tiling, pipelining, etc.) but without knowledge of the target architecture we can only use some heuristics to choose which transforms to apply (e.g. should we unroll the loop 4 or 8 times? should we switch the inner and outer loop?). To make an optimal decision we would need to know the instruction timings and quite a few details about the cache/memory subsystem, but these are (at best) known only to the code generator in the backend.

To solve this problem a good idea could be having IR transforms generate multiple versions of the same code with different optimization choices applied and letting the backend choose the one that, on the target architecture, should perform better. These “multiple versions of the same code” are the “alternate instruction sequences” mentioned in the title of this post.

Unfortunately, LLVM does not provide an explicit mechanism to signal that a certain IR construct (instruction, basic block, function, etc.) is an alternate version of another construct, so we have to somewhat “bend” the meaning of certain existing instructions to express this concept.

Alternate regions

A region is defined as a group of basic blocks having exactly one incoming edge and at most one outgoing edge. Suppose that one transformation is able to generate N different versions of such region and that we want to insert them all in the IR to let the backend choose the best one. We can write:

switch undef, %alt1, [i32 2, %alt2], ..., [i32 N, %altN], !alt

The meaning of this switch instruction is exactly what we need: basically we’re saying “the program behavior is well-defined regardless of which of the following N basic blocks we jump to”, thereby allowing the backend to choose any one of them (hopefully the best-performing one).

Obviously we also need to insert the necessary phi instructions after the region if there are live-out values that were defined inside the region.

It’s most important to understand that all alternate versions should be semantically equivalent, including in their side-effects: violating this rule will likely lead to undefined program behavior.

There is much more to say about this topic (including an alternate/simplified syntax for linear and side-effect-free instruction sequences) but that will have to wait: I’m currently working on a formal proposal to be sent to the llvmdev mailing list. I’ll post a followup as soon as that has been submitted (and hopefully accepted).

Call-graph flattening for LLVM

I just published the source code of the pilot implementation of the call-graph flattening transform (CGF) for LLVM. This is the same implementation used in my thesis.

What CGF does is basically getting rid, during compilation, of the notion of “functions”. It does this by merging all the code together and by statically managing the stack during compilation. This means, ideally, getting the same benefits of inlining (no call overhead, callsite specialization) but without code duplication. Besides, many other optimizations are possible thanks to CGF, such as inter-procedural code motion. Moreover, since CGF is practically equivalent to continuation-passing style (CPS), there are interesting possibilities for efficient non-linear control flow semantics (exception handling, coroutines, etc.)

Currently the code is not ready for general usage because there are some cases (e.g. indirect recursion) that are not detected. Also, there are some parts of LLVM, such as the register spiller and the register allocator, that choke on CGF-transformed code because of the (potentially) huge number of PHI nodes and branches. For examples and a detailed account of open problems see chapter 4 of the thesis.

I’m planning to keep working on the implementation (and submitting LLVM bug reports) until the transformation is ready to be used on real-world code. If you have any suggestions or questions about this, don’t hesitate to get in touch with me.