Tuesday, July 5, 2011

Port Forwarding: Just use SSH

This post is going to deviate a bit from what I normally write about. Usually, I talk about building software and systems: code snippets, the rationale for why we made a particular design decision, and so on. This post is going to be different.


A few nights ago, I decided it was time for me to have my own personal server in the cloud: a remotely accessible server for personal projects. For the last couple of years, I'd been building personal projects with the AppEngine SDK (using Python, obviously), but constantly running into limitations in data store queries, transactions, the types of data I could cache, etc. Having a server lets me run whatever software I need in the cloud: Redis (of which I am fond), a standard SQL server (PostgreSQL is my preferred DB, but MySQL works very well too), or, if I feel like dog-fooding the kinds of queries a non-relational store offers, YogaTable is a great little tool.

This leads me to a 2-hour battle I had the other night, culminating in a face-palm moment. As some of you may or may not know, I use Windows as my desktop operating system. It's a personal choice that is the result of a lot of frustration with both OS X (beachballs of death, multi-minute system hangs, bad external monitor behavior, visually pretty but poor-quality software, ...) and with Linux as a desktop (I shouldn't have to spend time troubleshooting a kernel update that disables my wireless card). On the other hand, I love Linux as a server, especially when someone else has already configured system images that work perfectly in the cloud. This leaves me in a bit of a bind.

For my professional work at Ad.ly, I've actually got a physical machine sitting under my desk at our office running Linux (the same version we run in production). To access it, I have a local VM (which was originally my development environment) that I use as my ssh tunneler; either directly to its IP when I'm at work, or to its VPN IP when I'm at home (using the convenient aliases "work" and "home", respectively). It forwards an ssh port (typically for FreeNX), a development http port, and a Samba port. A few nights ago, when trying to set up a secondary set of tunnels to the new machine, I decided it was time to reboot my local VM (there were a few security updates). After the reboot, whatever incantation had made the Samba port forward work (on port 139) was no longer working.

Now, instead of needing one more set of port forwards, I needed the original set again. After mucking about for 2 hours with netcat, inetd (which doesn't allow different servers on different IPs), iptables, and authbind, I realized that my issue (forwarding port 139 on a few different IP addresses to different destinations) could be easily handled with a root local forward:
$ sudo ssh -L 192.168.a.b:139:192.168.a.b:1139 \
           -L 192.168.a.c:139:192.168.a.c:1139 \
           ... localhost
Then I just needed to forward the proper 1139 ports to the destination boxes, which is where the facepalm comes in. I'd struggled with three of those four methods when I originally set up the box (the new one was authbind, which wouldn't let me bind port 139 as non-root), and I keep forgetting that for "forward port X on host A to port Y on host B", even when A and B are the same host, SSH is generally the simplest path to success. Not netcat, not iptables, and not inetd. Lesson re-learned.

Thursday, June 9, 2011

Building an Ad Network Ready for Failure

A little over a year ago in May of 2010, I was tasked with the job of creating a self-serve location-targeted ad network for distributing ads to web, desktop, and mobile applications that typically displayed Twitter and Facebook status messages. This post will describe the architecture and systems that our team put together for an ad network that ran 24/7 for over 6 months, serving up to 200 ad calls per second with a target response time under 200ms, despite network hiccups and VPS failures, and how you can do the same with your architecture.

Our Rambo Architecture

Before we started on our adventure to build an ad network, we had suffered hiccups with Amazon's EC2 systems. While most of the time the VPSes we had created months prior suffered no issues, occasionally a VPS would disappear from the network for some time, EBS volumes would hang on us or couldn't be re-assigned to another VPS, and VPS instances would get stuck restarting; all of the things that people have come to expect from EC2. Given these intermittent failures, we understood that our entire system needed to be able to recover from a hang/failure of just about any set of machines. Some months after running the system, we read about Netflix's experience, their Rambo Architecture, and their use of Chaos Monkey. We never bothered to implement a Chaos Monkey.

Load Balancing

Initially, we had used Apache as a load balancer behind an Amazon elastic IP. Before switching live, we decided to go with HAProxy due to its far better performance characteristics, its ability to detect slow/failing hosts, and its ability to send traffic to different hosts based on weights (useful for testing a new configuration, pushing more to beefier hosts, etc.). The *only* unfortunate thing about HAProxy was that when you reloaded the configuration (to change host weights, etc.), all counters were reset. Being able to pass SIGHUP to refresh the configuration would have been nice. Other than that minor nit, HAProxy was phenomenal.

Worker Machines

After running through our expected memory use on each of our worker machines (I will describe their configurations in a bit), we determined that we could get by with a base of 750 megs + 300 megs/processor for main memory to max out the throughput on any EC2 instance. We were later able to reduce this to 450 + 300/processor, but either way, our ad serving scaled (effectively) linearly as a function of processor speed and number. In EC2, this meant that we could stick with Amazon's Hi-CPU Medium VPS instances, which offered the best bang for the $ in terms of performance. We had considered the beefier processors in Hi-Memory instances, but we had literally zero use for the extra memory. If our service took off, we had planned on moving to fewer High CPU Extra Large instances, but for the short term, we stuck with the flexibility and experimentation opportunities that Medium instances offered (thanks to HAProxy's ability to send different traffic volumes to different hosts).

The actual software stack running on each box was relatively slim: Redis (ad targeting and db cache), ActiveMQ (task queues), Apache + mod_wsgi + Python (to serve the ads and interact with everything), and a custom Geocode server written in Python (for handling ip -> location, lat,lon -> location, city/state/country -> location).

Software Stack

The geocoding server was the biggest memory user of them all, starting out at 600 megs when we first pushed everything out. Subsequent refinements to how data was stored dropped that by 300 megs. This server was, in my opinion, the most interesting piece of technology that we developed for the ad targeting system: it was able to take a latitude,longitude pair and determine what country, what state, and what zipcode that location was in (as applicable) in under 1 ms, yet do so in 300 megs of memory using pure Python. It had a simple threaded HTTP server interface. Each VPS ran an instance to minimize round-trip time and to minimize dependencies on other machines.
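The actual geocoding code isn't shown here, but the core trick behind sub-millisecond pure-Python lookups can be sketched: pre-bucket region bounding boxes into a coarse latitude/longitude grid, so each query only scans the handful of candidates in its own cell. This is a hypothetical minimal version, not the production implementation:

```python
import math

# (floor(lat), floor(lon)) -> list of (name, minlat, maxlat, minlon, maxlon)
GRID = {}

def add_region(name, minlat, maxlat, minlon, maxlon):
    # Register the bounding box in every 1-degree cell it touches.
    for la in range(int(math.floor(minlat)), int(math.floor(maxlat)) + 1):
        for lo in range(int(math.floor(minlon)), int(math.floor(maxlon)) + 1):
            GRID.setdefault((la, lo), []).append(
                (name, minlat, maxlat, minlon, maxlon))

def locate(lat, lon):
    # Only the boxes registered in this cell need a point-in-box test.
    cell = (int(math.floor(lat)), int(math.floor(lon)))
    for name, a, b, c, d in GRID.get(cell, ()):
        if a <= lat <= b and c <= lon <= d:
            return name
    return None
```

A lookup costs one dict access plus a few comparisons, which is how a pure-Python process can stay well under a millisecond per query.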

We had originally run with RabbitMQ, but after running for a week or two, we ran into the same crashing bug that Reddit ran into. Our switch to ActiveMQ took a little work (ActiveMQ wasn't disconnecting some of its sockets, forcing us to write a Twisted server to act as a broker to minimize connections, though one of our engineers sent a patch upstream to fix ActiveMQ), but we've been pretty happy with it. In addition to the queue, we also had a task processor written in Python that processed impressions and clicks, updating the database and the ad index as necessary if/when an ad ran out of money.

Our Apache + mod_wsgi + Python server mostly acted as a broker: encoding/decoding requests/responses, making ad calls to Redis, pulling cached data from Redis, issuing geocoding requests, and logging results. This is the layer that scaled per processor, and the memory use was primarily the result of running so many threads to maximize IO between our servers and Redis. For a brief time, we were also performing content analysis for further targeting (simple Bayesian categorization across 25 categories, matched against pre-categorized ads, calculated in Python). This was consistently the slowest part of our calls, accounting for ~40ms, which we later dropped to 5ms with a little bit of numpy. Sadly, we found that content analysis was less effective for the bottom line than just paying attention to a calculated CPM on individual ads (calculated via CPC * CTR), so we tossed it for the sake of simplicity.

The real workhorse of our ad targeting platform was Redis. Each box slaved from a master Redis, and on failure of the master (which happened once), a couple "slaveof" calls got us back on track after the creation of a new master. A combination of set unions/intersections with algorithmically updated targeting parameters (this is where experimentation in our setup was useful) gave us a 1 round-trip ad targeting call for arbitrary targeting parameters. The 1 round-trip thing may not seem important, but our internal latency was dominated by network round-trips in EC2. The targeting was similar in concept to the search engine example I described last year, but had quite a bit more thought regarding ad targeting. It relied on the fact that you can write to Redis slaves without affecting the master or other slaves. Cute and effective. On the Python side of things, I optimized the redis-py client we were using for a 2-3x speedup in network IO for the ad targeting results.
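The targeting code itself isn't included in this post, but conceptually each ad call was a server-side intersection across targeting sets combined with a calculated-CPM ranking, executed in a single pipeline (hence one round-trip). A pure-Python model of what the Redis slave computes, with made-up ad ids and scores:

```python
def target_ads(country_ads, category_ads, cpm):
    # Model of a set/zset intersection: only ads matching every
    # targeting parameter survive.
    matched = country_ads & category_ads
    # Model of a reverse range query: best calculated CPM first.
    return sorted(matched, key=lambda ad: -cpm[ad])

# Hypothetical data: ads targeted at US users of gaming apps.
us = {'ad1', 'ad2', 'ad3'}
games = {'ad2', 'ad3', 'ad4'}
cpm = {'ad1': 1.5, 'ad2': 0.8, 'ad3': 2.2, 'ad4': 0.1}
```

In Redis, the intersection and ranking happen server-side in one round-trip on the slave, which matters when internal latency is dominated by network round-trips.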

The master Redis was merely a Redis instance populated from the database by a simple script, and kept up-to-date by the various distributed task processes, which syndicated changes off to the slaves automatically.

After Responding to a Request

After an ad call was completed, an impression tracking pixel was requested, or a click occurred, we would throw a message into our task queue to report on what happened. No database connection was ever established from the web server, and the web server only ever made requests to local resources. The task server would connect to the database, update rows there, handle logging (initially to the database, then briefly to mongodb, and finally to syslog-ng, which we use today for everything), and update the Redis master as necessary. In the case of database or Redis master failure, the task queue would merely stop processing the tasks, batching them up for later.
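A minimal sketch of that fail-safe consumption pattern (the queue, handler, and error type here are hypothetical): on any backend failure, put the task back and stop, so work batches up instead of being lost.

```python
import collections

def drain(queue, handle):
    # Pop tasks and process them; on a backend failure, push the task
    # back and stop, leaving the remainder batched up for a later retry.
    processed = 0
    while queue:
        task = queue.popleft()
        try:
            handle(task)
        except IOError:
            # Database / Redis master unreachable; nothing is lost.
            queue.appendleft(task)
            break
        processed += 1
    return processed
```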

The Weak Points

Looking at this setup, any individual VPS could go down except for the load balancer, and all other parts would continue to work. Early on, we had experimented with Amazon's load balancer, but found that it wouldn't point to an elastic IP (I can't remember why this was important at the time), so we used a VPS with HAProxy. Thankfully, the load balancer VPS never went down, and we had a hot spare ready to go with an elastic IP update.

Any worker box could go down without affecting serving, except for a small set of requests being dropped. Our Redis master or master DB could even go down, and some ads might or might not be served when they should or shouldn't be. We did lose the Redis master once, due to a network hiccup (which caused replication to one of the slaves to hang, blew up memory use on the master, and subsequently got the master killed by the Linux OOM killer). But this caused zero downtime, zero loss of $, and was useful in testing our emergency preparedness. We had some API workers go down on occasion due to EBS latency issues, but we always had 2-3x the number necessary to serve the load.

Our true weakness was our use of PostgreSQL 8.4 in a single-server setup. Our write load (even with API calls coming in at a high rate) was low enough to not place much of a load on our database, so we never felt pressured to switch to something with more built-in options (PostgreSQL 9 came out about 3-4 months after we had started running the system). But really, for this particular data, using Amazon's RDS with multi-AZ would have been the right thing to do.

Where is it now?

After 6 months of development, getting application developers to start using our API, getting reasonable traffic, getting repeat ad spends, etc., we determined that the particular market segment we were going for was not large enough to justify continuing to run the network at the level of hardware and personnel support necessary for it to be on 24/7, so we decided to end-of-life the product.

I'll not lie, it was a little painful at first (coincidentally, a project in which I was the only developer, YouTube Groups, had been end-of-lifed a couple weeks before), but we all learned quite a bit about Amazon as a platform, and my team managed to build a great product.

If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

Friday, May 20, 2011

The Essentials Behind Building a Streaming API

The other day, the Redis mailing list was posed with a question: how would you build Twitter's streaming API using Redis? It doesn't make sense to stream everything to remote clients to filter there. And while you could do processing on your web server, that would likely severely affect the performance of every web request as every status message would be processed multiple times on every web process. Also, while you could build parts of it trivially with publish/subscribe, if any particular filter wasn't able to run fast enough to keep up, growing outgoing buffers can take down older versions of Redis. So what do we do? We don't use publish/subscribe.

Update: One comment from Reddit pointed out "I have no idea what is going on here without some context."

If you are a consumer of the Twitter Firehose, you are receiving a lot of data. You've probably got a lot of ideas of what to do with that data, and you're probably building your own APIs for filtering, processing, etc. If you are already used to using Twitter's API for streaming, filtering, etc., this gives you a way to build on top of what you already know.

One of the purposes of the Twitter Firehose is to allow 3rd parties to offer that data via other APIs. For example, Gnip offers a streaming API to re-package the data that they receive. Like seemingly every other provider, they use their own streaming API implementation, so anyone switching must rewrite their streaming code to use Gnip. Well, with the code below, they could actually offer exactly the Twitter Streaming API (they'd obviously have to build the proper web-serving frontend), and paying/trial users could consume it without having to re-implement their technology.

Or, say that you want to offer a Twitter-like Streaming API in your Status.net installation (or clone), or use this to perform syndication between nodes in such a network. This code would get you most of the way there.

end update

We instead use zsets for timelines, lists for message queues, and simple values for the data to filter/syndicate out.

First things first, let's post a status message:
import json
import time

# ID_KEY, QUEUE, and STATUS_MESSAGE are module globals; see the Github Gist
# linked later in this post for their definitions.
def got_status(conn, status, id=None):
    '''
    This will work until there are 2**53 ids generated, then we may get
    duplicate messages sent to the workers. There are some work-arounds, but
    they confuse the clean flow of the existing code.

    This function takes a Redis connection object, a status message, and an
    optional id. If the id is not None, the status message is assumed to be
    pre-dumped to json. If the id is None, the status will have a new id
    assigned to it, along with the current timestamp in seconds since the
    standard unix epoch.
    '''
    dumped = status
    if id is None:
        id = conn.incr(ID_KEY)
        status['id'] = id
        status['created_at'] = time.time()
        dumped = json.dumps(status)

    pipeline = conn.pipeline(True) # a pipeline returns itself
    pipeline.zadd(QUEUE, id, id)
    pipeline.set(STATUS_MESSAGE%(id,), dumped)
    pipeline.execute()
    return id

This is all pretty straightforward; we're going to generate an id if it doesn't exist, set some metadata on the status message, dump it to json, then add the id itself as both a member and a score to a zset, while also adding the status message data.

Next, assuming that we've got some sort of task queue implementation available, and we've got a method of creating a new Redis connection:
def spawn_worker_and_subscribe(which, content=None, backlog=0):
    '''
    This would be called by a web server to connect to some Redis server that
    is holding all of the status messages and data, yielding results as they
    become available.
    
    This function requires two utility functions be present:
    get_new_redis_connection(which):
        This will create or reuse a connection to some Redis server that is
        hosting the status messages.
    spawn_worker(...):
        This will spawn the worker() function above on some worker box
        somewhere, pushing matched status messages to the client via a list
        named sub:...
    '''
    conn = get_new_redis_connection(which)
    channel = 'sub:' + os.urandom(16).encode('hex')
    spawn_worker(conn.hostinfo, backlog, which, content, channel)
    
    while True:
        # blpop returns a (key, value) pair, or None on timeout
        result = conn.blpop(channel, timeout=60)
        result = result[1] if result else None
        if result in (None, '<close>'):
            break
        yield result

Notice how we're not using publish/subscribe here? Using lists will allow us to pick some backlog on potentially a per-client basis in order to stop ourselves from using too much memory. In this case, the worker can tell us explicitly that it has closed, or if we don't receive anything for 60 seconds, we're going to assume that the worker has exited and this function should exit too.

Finally, let's get to the code that does all of the work (and nasty bits of handling slow outgoing clients). This code would be started up for every call of the spawn_worker_and_subscribe() function above on some task queue worker box:
def worker(hostinfo, backlog, which, content, subscriber):
    '''
    This worker handles the scanning of status message content against the
    user-requested filters.
    '''
    criteria = None
    if which == 'track':
        # should be a comma separated list of word strings
        # Given: 'streamapi,streaming api'
        # The first will match any status with 'streamapi' as an individual
        # word. The second will match any status with 'streaming' and 'api'
        # both in the status as individual words.
        criteria = TrackCriteria(content)
    elif which == 'follow':
        # should be a list of @names without the @
        criteria = FollowCriteria(content)
    elif which == 'location':
        # should be a list of boxes: [{'minlat':..., 'maxlat':..., ...}, ...]
        criteria = LocationCriteria(content)
    elif which == 'firehose':
        criteria = lambda status: True
    elif which == 'gardenhose':
        criteria = lambda status: not random.randrange(10)
    elif which == 'spritzer':
        criteria = lambda status: not random.randrange(100)
    elif which == 'links':
        criteria = lambda status: status['has_link']

    conn = get_new_redis_connection(hostinfo)
    if criteria is None:
        conn.rpush(subscriber, '<close>')
        return

    # set up the backlog stuff
    end = 'inf'
    now = int(conn.get(ID_KEY) or 0)
    if backlog < 0:
        end = now
    now -= abs(backlog)

    pipeline = conn.pipeline(False)
    sent = 1
    tossed = tossed_notified = 0
    keepalive = time.time() + KEEPALIVE_TIMEOUT
    last_sent_keepalive = False
    # In Python 2.x, all ints/longs compare smaller than strings.
    while sent and now < end:
        found_match = False
        # get the next set of messages to check
        ids = conn.zrangebyscore(QUEUE, now, end, start=0, num=CHUNKSIZE)
        if ids:
            # actually pull the data
            for id in ids:
                pipeline.get(STATUS_MESSAGE%(id,))
            pipeline.llen(subscriber)
            statuses = pipeline.execute()
            outgoing_backlog = statuses.pop()

            for data in statuses:
                if not data:
                    # We weren't fast enough, someone implemented delete and
                    # the message is gone, etc.
                    continue
                result = json.loads(data)
                # check the criteria
                if criteria(result):
                    if outgoing_backlog >= MAX_OUTGOING_BACKLOG:
                        tossed += 1
                        continue
                    # send the result to the subscriber
                    last_sent_keepalive = False
                    if tossed_notified != tossed:
                        pipeline.rpush(subscriber, json.dumps({"limit":{which:tossed}}))
                        tossed_notified = tossed
                    outgoing_backlog += 1
                    found_match = True
                    pipeline.rpush(subscriber, data)

            if found_match:
                keepalive = time.time() + KEEPALIVE_TIMEOUT
                sent = any(pipeline.execute())
            # update the current position in the zset; +1 so the inclusive
            # zrangebyscore bounds don't re-fetch (and re-send) the last id
            now = int(ids[-1]) + 1

        elif end == 'inf':
            time.sleep(NO_MESSAGES_WAIT)

        else:
            # we have exhausted the backlog stream
            break

        curtime = time.time()
        if not found_match and curtime > keepalive:
            keepalive = curtime + KEEPALIVE_TIMEOUT
            should_quit = last_sent_keepalive and conn.llen(subscriber)
            should_quit = should_quit or conn.rpush(subscriber, '{}') >= MAX_OUTGOING_BACKLOG
            if should_quit:
                # Can't keep up even though it's been 30 seconds since we saw
                # a match. We'll kill the queue here, and the client will time
                # out on a blpop() call if it retries.
                conn.delete(subscriber)
                break
            last_sent_keepalive = True

This function is a bit more involved. First we set up the proper filter criteria functions/classes. Then we set our start/end conditions given the user-specified backlog of status messages to check. We then pull status messages chunk by chunk, checking for matches, and putting matches into the outgoing queues. If the outgoing queue grows too large, we stop sending messages. If we don't have any matches for 30 seconds, we send a keepalive. If we were going to send a second keepalive, and the client hasn't seen the first keepalive, or if the queue is full, we consider that a client disconnect, and bail.

There are two other utility functions, some globals, and the criteria filters that are omitted here. You can see them at this Github Gist.

One incidental benefit of using the zset to store the timeline, instead of using publish/subscribe to send all status messages to all workers, is that you get "backlog" processing for free.

If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

Tuesday, February 15, 2011

Some Redis Use-cases

About 6 months ago, the organizer of the LA NoSQL meetup was looking for presenters. Since my coworkers and I had been using Redis fairly heavily for a few months, I offered to do a presentation on Redis. Sadly, that presentation never happened, as the event was delayed and then cancelled for one reason or another. Because I've had the slides, and because I think the information is still useful, I thought I would reformat and rewrite it as a blog post.

What is Redis? From redis.io:
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
For those of you who are familiar with memcached, it's sort of like that, but more. That is, while you can only store strings/integers with memcached, you can store a few pre-defined data structures with Redis, which are optionally persisted to disk.

Let's start with the basic strings/integers. You can use strings as a serialized cached database row, maybe session storage, a resettable counter (incr with getset), or using setnx for distributed locks. Really, anything that you might have used memcached for, you can use Redis for.
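As a sketch of the setnx-based lock idea (the key name and timeout handling are illustrative, and `conn` is a redis-py-style connection; a production lock needs more care around crashed holders and the check-and-delete race):

```python
import uuid

def acquire_lock(conn, name, timeout=10):
    # SETNX only succeeds for the first caller; the random token
    # identifies the holder.
    token = uuid.uuid4().hex
    if conn.setnx('lock:' + name, token):
        conn.expire('lock:' + name, timeout)  # don't hold the lock forever
        return token
    return None

def release_lock(conn, name, token):
    # Only the holder may release. Note: this check-and-delete is racy
    # without WATCH/MULTI, so treat it as a sketch.
    if conn.get('lock:' + name) == token:
        conn.delete('lock:' + name)
        return True
    return False
```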

Many people have found Redis lists to be useful as a simple FIFO work queue (with the ability to insert/pop from either end, move items from one list to another atomically, limit list length, etc.). Lists can also be the source of a sort call (and are always the destination when using its STORE option), which can return the sorted input keys themselves, or even automatically pull results from string keys or hashes.

Simple 0/1 queue:
def add_work(item):
    rconn.lpush('work-queue', item)

def get_work():
    return rconn.rpop('work-queue')

def retry(item):
    rconn.rpush('work-queue', item)

There is also the set datatype, which has all of the common union, intersection, and difference operations available across set keys. Common use-cases include de-duplicating items for work queues, keeping 'classes' of items (rather than keeping a user:id:sex -> m, you can use user:sex:m -> {id1, id2, ...}), or even as a set of documents for a Redis-backed search engine.
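For instance, de-duplicating a work queue can hinge on sadd's return value, which is 1 only the first time an item is added (key names here are hypothetical, and `conn` is a redis-py-style connection):

```python
def add_work_once(conn, item):
    # SADD returns 1 only when the item was not already in the set,
    # so duplicates never make it onto the queue.
    if conn.sadd('known-work', item):
        conn.lpush('work-queue', item)
        return True
    return False
```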

More complex 1+ queue:
def add_work(item):
    rconn.lpush('work-queue', item)

def get_work():
    return rconn.rpoplpush('work-queue', 'in-progress')

def done(item):
    rconn.sadd('done', item)

def retry_failed():
    while rconn.llen('in-progress'):
        check = rconn.lindex('in-progress', -1)
        if not rconn.sismember('done', check):
            rconn.rpoplpush('in-progress', 'work-queue')
        else:
            rconn.srem('done', rconn.rpop('in-progress'))

Another very useful datatype in Redis is the hash. In Redis, hashes map string keys to string or integer values. Useful as a way of gathering similar kinds of data together, a hash can store a row from a database table with each column as an entry, which allows for sorting and retrieval via sort and the various hash access methods. Add in the ability to increment/decrement columns in hashes, pull full hashes, etc., and the existence of an object/model mapper, and Redis can easily replace many uses of a traditional SQL database. Throw in Jak Sprat's Alchemy Database, which adds a SQL layer with Lua scripting inside Redis, and for small data sets, a Redis solution may be all you need.

Ad-hoc data sorts:
def insert(data):
    rconn.sadd('known-ids', data['id'])
    rconn.hmset('data:%s'%(data['id'],), data)

def sort_fetch(column, desc=True, num=10):
    results = rconn.sort('known-ids', start=0, num=num, desc=desc, by='data:*->%s'%(column,))
    p = rconn.pipeline(False)
    map(p.hgetall, ['data:'+id for id in results])
    return p.execute()

For those use-cases where having a sortable score over unique items is useful, Redis has the zset or sorted set data type, where each member in the set also has an associated float/double score, which produces an ordering over all members in the sorted set, and which you can query by member, score, or rank. Some common use cases include priority queues, tag clouds, timeouts, rate limiting, and a Redis-backed scored search engine.

Rate limiting:
def can_use(key, count, limit, timeout):
    pipe = rconn.pipeline(True)
    if rconn.zrank('reset', key) is None:
        pipe.zadd('reset', key, time.time() + timeout)
    pipe.hincrby('counts', key, count)
    return pipe.execute()[-1] <= limit

def reset_counters():
    default = ((None, None),)
    pipe = rconn.pipeline(True)
    key, ts = (rconn.zrange('reset', 0, 0, withscores=True) or default)[0]
    while ts is not None and ts <= time.time():
        pipe.zrem('reset', key)
        pipe.hdel('counts', key)
        pipe.zrange('reset', 0, 0, withscores=True)
        key, ts = (pipe.execute()[-1] or default)[0]
Redis does support key expiration, but in pre-2.2 versions of Redis, key expiration can have confusing behavior. Use the following to manually expire keys instead.

Manual key expiration:
def set_expire(key, timeout):
    rconn.zadd('expire', key, time.time()+timeout)

def expire_keys():
    p = rconn.pipeline(True)
    for key in rconn.zrangebyscore('expire', 0, time.time()-1):
        p.delete(key)
        p.zrem('expire', key)
    p.execute()
With these simple ideas and structures, even more complex behavior can be defined: per-user prioritized queues, counting semaphores (for limiting worker counts, in this case), per-page/site recent-viewer lists, finding jobs that a user has the skills to perform (aka the pizza topping problem), navigation trees, and many more.
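To pick one of those: a counting semaphore falls out of a zset whose scores are acquisition times, so stale holders age out automatically. A pure-Python model of the logic for illustration (in Redis, the dict below would be a zset, and the cleanup a zremrangebyscore call):

```python
import uuid

def acquire_semaphore(sem, limit, timeout, now):
    # sem maps holder token -> acquisition time (a zset in Redis).
    # First, drop holders older than the timeout (ZREMRANGEBYSCORE).
    for token in [t for t, ts in sem.items() if ts <= now - timeout]:
        del sem[token]
    if len(sem) >= limit:
        return None  # semaphore is full, caller should back off
    token = uuid.uuid4().hex
    sem[token] = now  # ZADD
    return token
```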

If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

Wednesday, December 22, 2010

Python as an Alternative to Lisp or Java, Peter Norvig revisited

Because I'm a fan of Peter Norvig, and because I've recently been going through some of his articles about Lisp, I ran into an old entry of his from 1999 about Lisp and Java. In it, he links off to a study where a sample problem was provided in order to compare the efficiency of C++/Java programmers. Feel free to read his post Lisp as an Alternative to Java. How did Peter do 11 years ago?

I did not participate in the study, but after I saw it, I wrote my version in Lisp. It took me about 2 hours (compared to a range of 2 to 8.5 hours for the other Lisp programmers in the study, 3 to 25 for C/C++ and 4 to 63 for Java) and I ended up with 45 non-comment non-blank lines (compared with a range of 51 to 182 for Lisp, and 107 to 614 for the other languages). (That means that some Java programmer was spending 13 lines and 84 minutes to provide the functionality of each line of my Lisp program.)

While relaxing a bit as testing was going on for our code push today, I followed the same instructions, only in Python. I hadn't read his Lisp code before (and reading it afterwards, there are a lot of Common-Lisp-isms that I don't really understand), nor had I read the specific problem before (though I solved a similar problem in the spring of 1999 during a local programming competition in undergrad, in C++).

My first correct solution was done in an hour with 53 non-comment lines, but I could trim it to 37 lines if I was okay with collapsing all lines that could be collapsed.

Looking around a bit while counting lines, I realized that if I added a utility function, I could remove some confusing stuff, while at the same time reducing line count. Spending another 10 minutes got me to 47 lines without comments or spaces, and 44 if I collapsed all lines that could be collapsed while limiting it to 78 columns wide.

My solution is available at this github gist. I didn't write the solution or this blog post to be "look at how good Python is", because obviously one's ability to program solutions to problems/puzzles/etc. is fundamentally related to one's experience and ability to think in a given language. And, for the most part, I've been living and breathing Python for the past 10+ years, with the last 6 years programming 4-5 days a week in Python, both professionally and personally. That said, I do think that similar conclusions can be drawn from this bit of the experiment as Peter made: primarily that Python is very effective, very expressive, and can cut out a lot of the bullshit that most Java programmers deal with on a regular basis.

UPDATE:
A commenter pointed out that I had a bug that was exposed running over the large input files and comparing with the large output file. I fell into the same ambiguity as was pointed out in the paper as the "hint" on page 12.

If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

Friday, October 29, 2010

Being a student isn't easy, it requires actual work

Wandering about the internet this morning prior to doing some actual work, I happened upon a blog post by Seth Godin about teaching, and how students should demand better instruction. Historically, I've generally agreed with and enjoyed Seth's blog (though I lament being unable to comment there directly), but in this case, I think he's missing the point.

I can certainly appreciate the plight of college students everywhere, spending tens of thousands of dollars to go to school. I did the same thing for my college years, and even continued for another 5 1/2 years of grad school. Almost three years out, I've still got loans, and probably will for a few years to come (9 1/2 years of postsecondary education isn't cheap in the States). However, I had the pleasure of being taught by amazing professors at both institutions I attended and, even more importantly, of attending school with interested and engaged students (I wasn't the only one doing homework on Friday nights). Sadly, this isn't always the case...

Everyone agrees that if you have a poor teacher or professor, your learning (and grades) will suffer... but there is a limit to how much of that is the instructor's fault. Often, both while studying and while teaching, I would hear complaints (and offered a few myself) about a poor instructor: they didn't care, didn't understand why their teaching fell on deaf ears, taught something unrelated to the course, and so on. But when confronted with this type of instructor, a student is given an opportunity to engage themselves in learning. Classes come with books, and instructors are meant to help the student understand and integrate the knowledge and wisdom within those books. Even before the internet, Wikipedia, or Khan Academy, students managed to learn despite poor instructors. How? They read and studied the books, consulting their fellow and elder students when they had questions. When I had difficulty understanding my trigonometry teacher in high school, I read the book, studied, and understood it. When other students asked how I managed to do well despite a confusing teacher, I pointed at the book. Only a few of them had taken the time to read the book beyond the assigned problems, and fewer still took the time to understand it.

Back when I was a TA in grad school, I made many mistakes (mostly in my first couple quarters). But by the final quarter of my teaching stint, I was doing 3 back-to-back sections for the same course, an hour each. The students who showed up and let even just a little bit of my enthusiasm rub off on them were engaged (if you are not excited about what you are teaching, students won't care, and students won't come). Even so, 5-10% of students never showed up for any of the discussion sections (except for reviews prior to exams). They would sometimes go to class (sometimes watching television or DVDs in the back rows), read their classmates' notes, hand in half-copied, half-bullshit homework, and expect one hour of review to be sufficient preparation for an Algorithms/Data Structures midterm or final. Sometimes they managed to cram enough, but usually we would be overly generous and give them a D-.


I can appreciate what Seth is trying to say: expect more from your teachers/professors/instructors. But an amazing instructor can only go so far. Students must also be engaged and willing to participate in the process of learning, otherwise they are at least as much to blame for wasting their time and money as a poor instructor.


Sunday, October 10, 2010

YogaTable as a Database Server

As promised in my last update, YogaTable is no longer an embedded database. Included in the source is a new server component, which listens for requests on a configurable host and port, defaulting to localhost:8765.

I have included a client for Python, which has everything necessary for basic and advanced YogaTable use. The protocol is basically JSON over HTTP GET/POST, which makes it straightforward to interact with from just about any language. I am in the process of documenting what is necessary to write new clients, and will be writing a client for Javascript, as well as a more advanced Python client library. Some simple benchmarks with Apache Bench tell me that YogaTable can perform 60 single inserts/second and around 2500 bulk inserts/second, though that's in mostly ideal conditions.
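To make the "JSON over HTTP" point concrete, here is a rough sketch of what building a request for the server might look like. The endpoint path, the table name, and the {"rows": [...]} payload shape are my illustrative assumptions, not YogaTable's documented wire format; the actual Python client in the source is the authoritative reference.

```python
import json
try:
    from urllib.request import Request, urlopen  # Python 3
except ImportError:
    from urllib2 import Request, urlopen         # Python 2

def build_insert_request(rows, host="localhost", port=8765, table="test"):
    # Build a bulk-insert POST against the default host/port. The
    # "/<table>/insert" path and the body shape are assumptions made
    # for illustration only.
    body = json.dumps({"rows": rows}).encode("utf-8")
    return Request("http://%s:%d/%s/insert" % (host, port, table),
                   body, {"Content-Type": "application/json"})

req = build_insert_request([{"value": "200.00"}, {"value": "0.00"}])
# urlopen(req) would send it and return a JSON body to decode.
print(req.get_full_url())  # http://localhost:8765/test/insert
```

Because everything is plain JSON and HTTP, the same request is a one-liner with curl or any language's standard HTTP library.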

One of the features that I am most excited about is being able to script the modification of multiple rows in the database with Lisp. I've taken a merged version of Peter Norvig's lis.py and lispy.py, improved the performance, removed some features unnecessary for database updates, added some others, and ... well, let's just see what it looks like. The following example, taken from YogaTable's tests, shows how you can transactionally update two rows in the database at the same time; more specifically, how one could implement transferring money from one account to another.

First, let's set up our rows in the database.
import decimal

d1 = {'value': decimal.Decimal('200.00')}
d2 = {'value': decimal.Decimal('0.00')}
# insert() returns (id, ...) pairs; transposing with zip() pulls out the ids
ids = zip(*self.table.insert([d1, d2]))[0]
d1['_id'] = ids[0]
d2['_id'] = ids[1]

Now, let's set up our shared data, and prepare for the output of our test.
shared = {'transfer':decimal.Decimal('45.23')}
d1['value'] -= shared['transfer']
d2['value'] += shared['transfer']

Let's actually perform the conditional update...
out = self.table.update([
    {'_id':ids[0],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (>= balance transfer)
            (begin
                (setv `doc `value (- balance transfer))
                (setv `shared `transferred #t)))
        '''},
    {'_id':ids[1],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (getv `shared `transferred #f)
            (setv `doc `value (+ balance transfer)))
        (delv `shared `transferred)
        (delv `shared `transfer)
        '''}], shared=shared)

The Lisp in here may look a little strange, as some of it is nonstandard. The first few lines of each row's operations load the 'types' module, which offers access to the Python decimal.Decimal datatype (among others), pull some balance information, and determine how much money is supposed to be transferred. The last few lines of the first operation verify that there is enough money in the account, deduct the money, and set the shared variable 'transferred' to True.

The second operation checks whether 'transferred' is True, and if so, adds the transferred balance to the second row. The two 'delv' lines in the second operation merely remove the known shared variables, so that if someone were to accidentally include a third row, it wouldn't have access to this data.

And that's it. Money transfers in YogaTable. No need for 2-stage commits.


At this point, you are probably wondering where YogaTable is going as a piece of software. When Google first released AppEngine, one of the things that I was most intrigued by was its Datastore. It had some features I'd never seen before (indexes on all of the values in a list, in particular), and I wished that it was available outside of AppEngine. I'd been meaning to write an AppEngine Datastore-like backend for a long time, and some early versions of YogaTable were actually meant to let people take the Google AppEngine SDK and plug my backend into it. It was meant as a way of scaling the SDK beyond trivial applications and, really, of offering the full set of features and functionality of AppEngine's Datastore to people who didn't want to run in Google's datacenters. That is not where YogaTable is going.

After having used MongoDB in production, I realized that the current software offerings for databases were missing something. Something that wasn't tied down to schemas like classic relational databases. Something that wasn't limited if you happened to *only* have a 32-bit machine. Something that could offer enough power for building a moderately used web site (one million hits/day), but was flexible enough not to get in your way while you were developing it.

And thus, YogaTable was born. Aside from the design requirement of never performing table scans, and its current lack of built-in replication/clustering, YogaTable today offers sufficient features to get almost any idea from concept to a million hits/day. And with the introduction of a Lisp interpreter, YogaTable is able to offer functionality that is otherwise very difficult in other systems (the simple multi-row update shown above requires a tricky 2-stage commit using AppEngine's Datastore).


There is still work to be done on YogaTable. Mostly, I need to document everything. From there, next steps include replication, clients in a few different languages, support for read-only replicas, automatic master/slave failover, clustering... But all in good time. Documentation first, features next.

I hope everyone stays interested, I know that I'm having fun.