My first Firefox contribution

When I started ImageTweak a few years ago, the decision of building an extensions was mostly due to the fact that the Firefox codebase looked, frankly, quite scaring. Even only attempting to compile Firefox on Windows looked like an impossible endeavor, let alone building a modified version.

Compared to this, writing an extensions looked like a piece of cake (looking back, this was a rather naïve evaluation, but that’s another story…).

Fast forward a few years: ImageTweak has gotten pretty mature, so the question pops up again. Wouldn’t it be better to tackle the source of all problems (pun intended) and bring at least a subset of ImageTweak to all Firefox users? And so, on a fine spring day, while working on my MSc thesis, I finally found the courage to make a real attempt at producing a patch that would at least bring feature parity with ImageTweak 0.9, i.e. centering stand-alone images on a dark background.

After a few rounds of patches, I finally managed to produce a working patch that had a single, HUGE, issue. It produced an awful lot of failed tests in the regression tests. If you add to this that, by the time the working patch was ready, I was on the eve of my dissertation (that would be followed by a fair amount of AFK holidays and by a close-to-immediate hiring for a consultancy firm) the amount of time I could devote to Mozilla-related tasks dropped rather quickly to nil.

Luckily, my (clumsy) attempt at a patch had the effect of setting the proverbial ball rolling: somebody (Jared Wein) over at BMO picked up where I left off and fixed all regression tests. And, just like that, a couple of weeks ago, I received a notification about the fact that my patch had been pushed to mozilla-central and that it will be part of Firefox 11. Proudness ensued.

Obviously, as with any other user-facing change, this is prone to (hopefully constructive) discussion. And that’s good, I guess, because those are yet more proverbially rolling balls.

update: Unlike Daniel Glazman, Jared Wein seems to like it: Blinded by the light! – An improved image viewing experience in Firefox

Alternate instruction sequences in LLVM

I’ve been wondering how to express the fact that two instructions (or instruction sequences) are equivalent in LLVM IR.

The reason why this matters is that transforms operating at the IR level are supposed to be architecture-agnostic but this poses some problems: for example, consider a transform that takes a loop and rewrites it for better performance. There are multiple ways of expressing the same loop (think loop unrolling, loop switching, tiling, pipelining, etc.) but without knowledge of the target architecture we can only use some heuristics to choose which transforms to apply (e.g. should we unroll the loop 4 or 8 times? should we switch the inner and outer loop?). To make an optimal decision we would need to know the instruction timings and quite a few details about the cache/memory subsystem, but these are (at best) known only to the code generator in the backend.

To solve this problem a good idea could be having IR transforms generate multiple versions of the same code with different optimization choices applied and letting the backend choose the one that, on the target architecture, should perform better. These “multiple versions of the same code” are the “alternate instruction sequences” mentioned in the title of this post.

Unfortunately, LLVM does not provide an explicit mechanism to signal that a certain IR construct (instruction, basic block, function, etc.) is an alternate version of another construct, so we have to somewhat “bend” the meaning of certain existing instructions to express this concept.

Alternate regions

A region is defined as a group of basic blocks having exactly one incoming edge and at most one outgoing edge. Suppose that one transformation is able to generate N different versions of such region and that we want to insert them all in the IR to let the backend choose the best one. We can write:

switch undef, %alt1, [i32 2, %alt2], ..., [i32 N, %altN], !alt

The meaning of this switch instruction is exactly what we need: basically we’re saying “the program behavior is well-defined regardless of which of the following N basic blocks we jump to”, thereby allowing the backend to choose any one of them (hopefully the best-performing one).

Obviously we also need to insert the necessary phi instructions after the region if there are live-out values that were defined inside the region.

It’s most important to understand that all alternate versions should be semantically equivalent, including in their side-effects: violating this rule will likely lead to undefined program behavior.

There is much more to say about this topic (including an alternate/simplified syntax for linear and side-effect-free instruction sequences) but that will have to wait: I’m currently working on a formal proposal to be sent to the llvmdev mailing list. I’ll post a followup as soon as that has been submitted (and hopefully accepted).

Call-graph flattening for LLVM

I just published the source code of the pilot implementation of the call-graph flattening transform (CGF) for LLVM. This is the same implementation used in my thesis.

What CGF does is basically getting rid, during compilation, of the notion of “functions”. It does this by merging all the code together and by statically managing the stack during compilation. This means, ideally, getting the same benefits of inlining (no call overhead, callsite specialization) but without code duplication. Besides, many other optimizations are possible thanks to CGF, such as inter-procedural code motion. Moreover, since CGF is practically equivalent to continuation-passing style (CPS), there are interesting possibilities for efficient non-linear control flow semantics (exception handling, coroutines, etc.)

Currently the code is not ready for general usage because there are some cases (e.g. indirect recursion) that are not detected. Also, there are some parts of LLVM, such as the register spiller and the register allocator, that choke on CGF-transformed code because of the (potentially) huge number of PHI nodes and branches. For examples and a detailed account of open problems see chapter 4 of the thesis.

I’m planning to keep working on the implementation (and submitting LLVM bug reports) until the transformation is ready to be used on real-world code. If you have any suggestions or questions about this, don’t hesitate to get in touch with me.

The Sage

My dog Bush (yes, he’s named after the previous US president; no, that’s not a compliment).

thumbnail