Sunday, April 1, 2012

Python bytecode hacks, gotos revisited

Some 8 years ago today, two April fool's jokes appeared on python-list and its mirror newsgroup, comp.lang.python. At the time, I was too young or too much of an idiot to appreciate them.
The first was a pronouncement of Guido's having decided that instead of the tab vs. space wars, that we would use the unicode character u'\x08' and our editors/IDEs could be told how wide to display the character. Even better, almost all editors/IDEs already supported the functionality. Like the wet blanket that I was at the time, I of course pointed out that PEP 8, which defines "the standard" to be 4 space indents, no tabs, hadn't been updated and that no discussion of the kind had occurred in python-dev at any time over the weeks prior. In a nutshell, I was a jerk. Sadly, I can't find the original thread in Google Groups. I still feel bad about being the guy who rained on that parade. So, whomever wrote it, I'm sorry.

The second bit of awesome that I remember was Richie Hindle's Goto in Python. If you've not seen it, or you haven't tried it out, you are missing something. The miracle of it all is that it worked. The details are very cute, it hooks the system trace function, and mangles the next line number to be executed. Now, I don't believe that I said anything negative about it at the time, but I certainly wasn't terribly enthusiastic about it. Having recently been reminded of Richie's goto, it dawned on me how brilliant of an idea the joke was. While I'm sure someone may have thought it was a good idea in the early days of Python, and Guido would have rightfully rejected it. But that Richie came back some 12+ years after Python's creation with a working implementation that didn't alter the underlying syntax? Genius.

Getting to the point

Why am I writing this post? Surely it can't just be to pontificate on the history of Python. Oh no. It's time that I give something back for not being able to appreciate a joke all those years ago, and I'd like to make an effort to commemorate the 8th anniversary of Richie and the unknown 'tab indent' poster's brilliance. Recently, it has come to my attention that Carl Cerecke re-implemented goto in Python by hacking bytecodes (way back in November of 2009, I know, I'm a little late to the party). Why didn't I see it before? Why didn't Richie see it before? It's an evolutionary step in the right direction! It's starting with a non-practical but great joke, and taking it to it's logical conclusion: make it practical. Now, Carl didn't see it as being a joke, and reasonably so. And after seeing that it could be practical, I agree that it's no longer a joke (I will explain this later). But sadly, the recipe has bugs. My contribution is to get rid of the bugs, and add experimental features.

Gotos in Python

First, let's look at the syntax based on one of the examples listed as part of Carl's recipe.
def test1(n):

    s = 0

    label .myLoop

    if n <= 0:
        return s
    s += n
    n -= 1

    goto .myLoop

assert test1(10) == 55
You define a label using 'label.<labelname>', optionally adding a space between 'label' and the '.' to make it easier to read. Similarly, gotos are defined via 'goto.<labelname>'. Perfectly reasonable and valid Python syntax, just an attribute reference. The magic lies in the decorator up front, which translates those attribute references into bytecode that either turns a label into a sequence of no-ops, or a jump to the label. But, as I mentioned before, there are bugs.

The recipe as listed suffers from 3 major bugs, 2 of which are similar in cause. In the decorator written by Carl, he uses a dictionary as a mapping of goto labels to bytecode offsets where that label occurs. Because of this, you can't have two gotos leading to the same label. And when the translation occurs, it leaves all but the last goto leading to that label unchanged, so the more useful example below that uses gotos for error handling, wouldn't work.
def handler(data):
    if testA(data):
        if testB(data):
            if testC(data):
                if testD(data):
                    goto .error
            goto .error
    return "success"

    label .error
    # do something complicated that you don't want
    # to duplicate in every error condition
    return "error"
Only the last goto would have been translated into a jump, but the other one will sit around and not cause an error until it is hit, at which point you will get an AttributeError. The solution, of course, is to map gotos names to a list of bytecode offsets for that label. An easy fix.

Remember earlier when I mentioned that I no longer think that gotos in Python are a joke? Well, if you dig through the CPython source code, you will come to find that gotos are used in the C parts of CPython for error handling. In Python, you can implement much of the same control flow with loops and 'break/continue' statements, but it's legitimately less elegant than a goto. I recently found myself writing code for work where using gotos for error handling would have simplified our code significantly. Ultimately, I ended up not using gotos, only because our requirements changed and I could gut the function to remove all of the strange corner cases. But I now recognize how useful gotos can be.

The second bug is that if you happen to include the same label twice, the decorator won't warn you, and will again just use the last label as the destination for jumps, leaving earlier labels as AttributeError potholes. This is also an easy fix, we can just check to see if it the label has been seen during the bytecode manipulation already, and if so, raise an exception.

The last bug is caused by the earlier decorator creating a new function that didn't preserve existing attributes. To fix this bug, we'll bypass creating a new function by just replacing the code object on the function. This is, by the way, one of my favorite ways to monkey-patch libraries and running systems; you don't need to replace every reference to a function or method, you just need to replace it's code object. But I digress, let's add a feature.

Computed gotos

Richie's library also supported computed gotos, but Carl didn't want to include them because they are not Pythonic. I suspect a different reason, and that is because computed gotos are a huge pain in the ass. Python's underlying bytecode doesn't support jumping to an arbitrarily computed offset, and because Python's code objects are immutable, you can't even write self-modifying code without mucking about with the ctypes module. How frustrating ;) . But don't fret, because Richie has the answer: trace functions. Now, since this is a post about bytecode hacking, of course we've got to keep hacking on bytecodes. Also, in C/C++, computed gotos aren't really bare like Richie was using, no, they are actually switch/case statements. Ah hah! All we need is a syntax in Python that gives us enough room to do all of the shenanigans that we need to do for switch/case statements in Python. First, the syntax. See the following (very ugly) code listing for the syntax that we are going to use.
def test2():
    a = 1
    while switch *a:
        # not taken
        goto .end

        # taken

        # there is fallthrough like C/C++!

    # default code goes here

    label .end
    # this is after the switch/case
Now, don't let that syntax fool you, the while loop is just to set up a block using at least the 12 bytecodes we need for the hack. A with statement would have been sufficient, but it has a lot of setup and teardown that is annoying to replace (a total of 24 bytecodes), and the other non-looping block in Python, 'if', only offers 11 bytecodes. You are probably wondering why we didn't just re-use "continue" and "break" to jump back to the switch or jump out of the switch. The short reason is because I didn't want to have to rewrite the entire bytecode of the function to adjust offsets (because continue/break are 1 bytecode, and jumps are 3 bytecodes), and the longer reason is just that I was being lazy :P

Okay, so we've got a syntax, but how do we jump to arbitrary case statements? Well, we set up a helper function that will handle jumping to arbitrary line numbers, then we replace the while loop bytecode with alternate bytecode that fetches the destination line number from a dictionary that we packed away in the locals array (defaulting to a line just after the while loop, so the "default" is just outside the while block), and we call the helper to jump to the proper line number instead of looping.

The function that actually handles jumping to arbitrary line numbers will set up the system trace function to a necessary level to induce the line number change, then allow the trace to clean itself up. With this method, we can get switch/case statement handling to execute fairly quickly, aside from the trace function overhead.

The sad truth

I know what you are thinking, you're thinking, "Holy crap, switch/case in Python with a bytecode hack? We can totally use that to optimize some types of parsing!" Well, it's not that simple. Because of the stuff we did with the trace function, we actually can't even put the switch/case within a loop because it causes a segfault with the Python interpreter due to the way trace functions are manipulated. I have not dug too deeply into this, nor have I tried to fix it, primarily because I've been too busy. Heck, I'm writing this blog post on my birthday, because it's the first spare time I've had in weeks.

Even worse, because of the way that Python handles loops and with statements, jumping in and out of them can lead to some pretty nasty results, potentially leading to a segfault. Those issues are still there, though an intrepid hacker could continue down this path to add checks to the hack to detect loop starts/ends, and check how deep into blocks the goto/labels are. If someone doesn't beat me to it, maybe I'll get to it in a couple months, but I'm not going to make promises.

Now, with the sad truth comes a bit of sunlight to brighten your day: there is a way to fix the segfault. Rather than abusing the system trace function, we can instead set up a sequence of if-statements in a tree structure to find the correct case, in O(log(number of case statements)). This actually wouldn't be all that unreasonable, except that there isn't room inside the existing bytecode to make it happen, so we would need to add extra bytecode to the end of the existing bytecode, or inject the bytecode and adjust the offsets in the bytecode and line number table. Maybe I'll get around to it some day.


You can see, download, and fork gotos in Python at this Github gist. Keep making cool stuff, and don't let party poopers like me get you down.