Thursday, July 14, 2011

A Neat Python Hack? No, Broken Code

A few months ago I saw an anti-pattern offered by a much-loved member of the Python community. He offered it as a neat little hack, and if it worked, it would be awesome. Sadly, it does not work. The anti-pattern is available as a recipe on the Python Cookbook here; below, I first alter it to make it fully incorrect, then switch back to the original form to show that it doesn't work either, and explain why it can't work in the general case.

# known incorrect lightweight lock / critical section
ci = sys.getcheckinterval()
sys.setcheckinterval(0)
try:
    # do stuff
finally:
    sys.setcheckinterval(ci)

A cursory reading of the interactive Python help below would suggest that inside the try block, no other Python code should be able to execute, offering a lightweight way of making race conditions impossible.

Help on built-in function setcheckinterval in module sys:

setcheckinterval(...)
    setcheckinterval(n)
    
    Tell the Python interpreter to check for asynchronous events every
    n instructions.  This also affects how often thread switches occur.

According to the basic docs, calling sys.setcheckinterval(0) might prevent the GIL from being released, offering us a lightweight critical section. It doesn't. Running the following code...

import sys
import threading

counter = 0

def foo():
    global counter
    ci = sys.getcheckinterval()
    sys.setcheckinterval(0)
    try:
        for i in xrange(10000):
            counter += 1
    finally:
        sys.setcheckinterval(ci)

threads = [threading.Thread(target=foo) for i in xrange(10)]
for i in threads:
    i.setDaemon(1)
    i.start()

while threads:
    threads.pop().join()

assert counter == 100000, counter

I get AssertionErrors raised with values typically in the range of 30k-60k. Why? For ints and other immutable objects, counter += 1 is really shorthand for counter = counter + 1. When threads switch between the read and the write, multiple threads can read the same counter value, and increments are lost (40-70% of the time here). But why are there thread swaps in the first place? The full sys.setcheckinterval() documentation tells the story:
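The non-atomicity of counter += 1 is visible directly in the bytecode. Here is a minimal sketch using the standard dis module (Python 3 shown; Python 2's bytecode is analogous):

```python
import dis

counter = 0

def increment():
    global counter
    # A read-modify-write: the interpreter loads counter, adds 1,
    # then stores the result back as several separate bytecodes.
    counter += 1

# The disassembly shows distinct LOAD_GLOBAL and STORE_GLOBAL opcodes;
# a thread switch landing between them loses an increment.
dis.dis(increment)
```

Nothing in the language promises that another thread can't run between the load and the store, which is exactly the window the race exploits.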

sys.setcheckinterval(interval)
Set the interpreter’s "check interval". This integer value determines how often the interpreter checks for periodic things such as thread switches and signal handlers. The default is 100, meaning the check is performed every 100 Python virtual instructions. Setting it to a larger value may increase performance for programs using threads. Setting it to a value <= 0 checks every virtual instruction, maximizing responsiveness as well as overhead.

Okay. That doesn't work, because our assumption about sys.setcheckinterval()'s behavior at the boundary condition was wrong. But maybe we can set the check interval high enough that the threads never swap?

import os
import sys
import threading

counter = 0
each = 10000
data = os.urandom(64).encode('zlib')
oci = sys.getcheckinterval()

def foo():
    global counter
    ci = sys.getcheckinterval()
    # using sys.maxint fails on some 64 bit versions
    sys.setcheckinterval(2**31-1)
    try:
        for i in xrange(each):
            counter += (1, data.decode('zlib'))[0]
    finally:
        sys.setcheckinterval(ci)

threads = [threading.Thread(target=foo) for i in xrange(10)]
for i in threads:
    i.setDaemon(1)
    i.start()

while threads:
    threads.pop().join()

assert counter == 10 * each and oci == sys.getcheckinterval(), \
    (counter, 10*each, sys.getcheckinterval())

When running this, I get assertion errors showing the counter typically in the range 10050 to 10060, and showing that the check interval is still 2**31-1. That means our new counter += (1, data.decode('zlib'))[0] line, which expands to counter = counter + (1, data.decode('zlib'))[0], reads a stale counter value almost 90% of the time. There are more thread swaps with the huge check interval than with a check interval that claims to check and swap at every opcode!

Obviously, the magic line is counter += (1, data.decode('zlib'))[0]. The built-in zlib library has an interesting feature: whenever it compresses or decompresses data, it releases the GIL. This is useful in threaded programming, as it allows other threads to execute without possibly blocking for a very long time - even if the check interval has not passed (which is what we were trying to prevent with our sys.setcheckinterval() calls). If you are using zlib (directly, or via gzip, etc.), this can let you use multiple processors for compression/decompression, but it can also produce race conditions, as I've shown.
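As an aside, Python 3 dropped the str.encode('zlib') codec spelling used in the snippet above; the equivalent calls there, which release the GIL in exactly the same way, are on the zlib module itself. A minimal sketch:

```python
import os
import zlib

data = os.urandom(64)
packed = zlib.compress(data)        # releases the GIL while compressing
restored = zlib.decompress(packed)  # ...and again while decompressing
assert restored == data
```

Either spelling gives other threads a chance to run mid-"critical section", which is what breaks the recipe.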

In the standard library, a handful of C extension modules similarly release the GIL to let other threads run, including (but not limited to) some of my favorites: bz2, select, socket, and time. Not every operation in those modules forces a thread switch the way zlib compression does, but you can't really rely on any module not to swap threads on you.

In Python 3.x, sys.getcheckinterval()/sys.setcheckinterval() have been deprecated in favor of sys.getswitchinterval()/sys.setswitchinterval(), the latter of which takes the desired number of seconds between switches. Don't let this fool you: it has exactly the same issues as the Python 2.3+ sys.getcheckinterval()/sys.setcheckinterval() calls just discussed.
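A minimal sketch of the Python 3 spelling of the same doomed idea (the 300.0 value is arbitrary; GIL-releasing C calls still switch threads regardless):

```python
import sys

old = sys.getswitchinterval()    # CPython's default is 0.005 seconds
sys.setswitchinterval(300.0)     # a huge interval still doesn't forbid switches
try:
    pass  # "critical" section: zlib, I/O, etc. can still swap threads here
finally:
    sys.setswitchinterval(old)
```

The interval is a hint about scheduling frequency, not a mutual-exclusion guarantee.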


What are some lessons we can learn here?
  1. Always check before using anyone's quick hacks.
  2. If you want to make your system more or less responsive, use sys.setcheckinterval().
  3. If you want to execute code atomically, use a real lock or semaphore available in the thread or threading modules.
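Applying lesson 3 to the counter example, here is one sketch (Python 3 syntax) using a real threading.Lock; this is not the only correct approach, but it is a simple one:

```python
import threading

counter = 0
counter_lock = threading.Lock()
EACH = 10000

def foo():
    global counter
    for _ in range(EACH):
        # The with block makes the read-modify-write atomic with
        # respect to every other thread using the same lock.
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=foo) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert counter == 10 * EACH, counter
```

Unlike the check-interval hack, this holds even when a C extension releases the GIL mid-loop: another thread may run, but it can't enter the locked region.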


If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

7 comments:

  1. The blog says "this pattern sucks, here is how I prove it", but if you augmented it with another post showing a few correct ways to use real locks (using the with statement to ensure certain behaviors, alternatives like a finally that releases the lock, mutexes, etc.), it would add more value to the blog.

    Also, at the beginning of the blog, if you could show how many cycles a real lock uses vs. a lightweight sometimes-lock, your reader may be more inclined to finish reading, because it would answer "why should I care about a lightweight lock - is it a 2X gain? 10X gain? 100X gain?"

  2. Hmm, good one -- I saw this also from Raymond in a presentation a couple months ago and took it at face value -- although I'd probably never use it in real code since it relies on what seems like awfully hacky behavior...

  3. Just use Python 2.7.2 and all is well.

  4. Anon: to what are you referring when you say "Just use Python 2.7.2 and all is well."? The pattern doesn't start working with 2.7.2 (or any of the 3.x series for that matter)...

  5. I believe the problem isn't with the recipe (which expects setcheckinterval to take effect right away, as the documented API suggests) but rather a bug in Python itself which has a fix already loaded for the next release.

    Per Misc/NEWS for Python 2.7.3:
    """- sys.setcheckinterval() now updates the current ticker count as well as updating the check interval, so if the user decreases the check interval, the ticker doesn't have to wind down to zero from the old starting point before the new interval takes effect. And if the user increases the interval, it makes sure the new limit takes effect right away rather have an early task switch before recognizing the new interval."""

    Also, your post suggests that I recommend this recipe. If you recall the presentation, it was a list of fun, over-the-top hacks that were definitely not recommended for production use. Please remove my name from your post.

    -- Raymond Hettinger

    P.S. checkinterval hacks have been around for a very long time. IIRC, I first heard of them from Tim Peters ten years ago.

  6. Raymond: the bug you are referring to is unrelated to why the counts come out wrong, and to why the check interval finishes at 2**31-1.

    The cause of both issues is that CPython releases the GIL, regardless of the check interval, when it performs certain operations. It's not a bug, it's a concurrency feature of Python, and it renders the hack incorrect.

    Nowhere in the post did I claim that you recommended the recipe, but that you "offered it as a neat little hack". I've removed your name and a link to your Twitter, as per your request.

    My intent was to dig into the guts of a Python recipe that happened to be incorrect and to explain why it was incorrect.

    I appreciate the personal advice you gave me years ago, never mind all of your contributions to Python and the community, and I'm sorry if I have offended you.
