I’ve been wondering how to express the fact that two instructions (or instruction sequences) are equivalent in LLVM IR.
The reason why this matters is that transforms operating at the IR level are supposed to be architecture-agnostic but this poses some problems: for example, consider a transform that takes a loop and rewrites it for better performance. There are multiple ways of expressing the same loop (think loop unrolling, loop switching, tiling, pipelining, etc.) but without knowledge of the target architecture we can only use some heuristics to choose which transforms to apply (e.g. should we unroll the loop 4 or 8 times? should we switch the inner and outer loop?). To make an optimal decision we would need to know the instruction timings and quite a few details about the cache/memory subsystem, but these are (at best) known only to the code generator in the backend.
To solve this problem a good idea could be having IR transforms generate multiple versions of the same code with different optimization choices applied and letting the backend choose the one that, on the target architecture, should perform better. These “multiple versions of the same code” are the “alternate instruction sequences” mentioned in the title of this post.
Unfortunately, LLVM does not provide an explicit mechanism to signal that a certain IR construct (instruction, basic block, function, etc.) is an alternate version of another construct, so we have to somewhat “bend” the meaning of certain existing instructions to express this concept.
A region is defined as a group of basic blocks having exactly one incoming edge and at most one outgoing edge. Suppose that one transformation is able to generate N different versions of such region and that we want to insert them all in the IR to let the backend choose the best one. We can write:
switch undef, %alt1, [i32 2, %alt2], ..., [i32 N, %altN], !alt
The meaning of this switch instruction is exactly what we need: basically we’re saying “the program behavior is well-defined regardless of which of the following N basic blocks we jump to”, thereby allowing the backend to choose any one of them (hopefully the best-performing one).
Obviously we also need to insert the necessary phi instructions after the region if there are live-out values that were defined inside the region.
It’s most important to understand that all alternate versions should be semantically equivalent, including in their side-effects: violating this rule will likely lead to undefined program behavior.
There is much more to say about this topic (including an alternate/simplified syntax for linear and side-effect-free instruction sequences) but that will have to wait: I’m currently working on a formal proposal to be sent to the llvmdev mailing list. I’ll post a followup as soon as that has been submitted (and hopefully accepted).