Tuesday, February 15, 2011

Some Redis Use-cases

About 6 months ago, the organizer of the LA NoSQL meetup was looking for presenters. Since my coworkers and I had been using Redis fairly heavily for a few months, I offered to do a presentation on Redis. Sadly, that presentation never happened, as the event was delayed and then cancelled for one reason or another. Because I've had the slides, and because I think the information is still useful, I thought I would reformat and rewrite it as a blog post.

What is Redis? From redis.io:
Redis is an open source, advanced key-value store. It is often referred to as a data structure server since keys can contain strings, hashes, lists, sets and sorted sets.
For those of you who are familiar with memcached, it's sort of like that, but more. That is, while you can only store strings/integers with memcached, you can store a few pre-defined data structures with Redis, which are optionally persisted to disk.

Let's start with the basic strings/integers. You can use strings as a serialized cached database row, maybe session storage, a resettable counter (incr with getset), or using setnx for distributed locks. Really, anything that you might have used memcached for, you can use Redis for.

Many people have found Redis lists to be useful as a simple fifo work queue (with the ability to insert/pop from either end, move items from one list to another atomically, limit list length, etc.). Lists can also be the source (and are always the result of when using the STORE option) of a sort call, which by itself can simply be the input keys, or even automatically pull results from string keys or hashes.

Simple 0/1 queue:
def add_work(item):
    rconn.lpush('work-queue', item)

def get_work():
    return rconn.rpop('work-queue')

def retry(item):
    rconn.rpush('work-queue', item)

There is also the set datatype, which has all of the common union, intersection, and difference operations available across set keys. Common use-cases include de-duplicating items for work queues, keeping 'classes' of items (rather than keeping a user:id:sex -> m, you can use user:sex:m -> {id1, id2, ...}), or even as a set of documents for a Redis-backed search engine.

More complex 1+ queue:
def add_work(item):
    rconn.lpush('work-queue', item)

def get_work():
    return rconn.rpoplpush('work-queue', 'in-progress')

def done(item)
    rconn.sadd('done', item)

def retry_failed():
    while rconn.llen('in-progress'):
        check = rconn.lindex('in-progress', -1)
        if not rconn.sismember('done', check):
            rconn.rpoplpush('in-progress', 'work-queue')
        else:
            rconn.srem('done', rconn.rpop('in-progress'))

Another very useful datatype in Redis is the hash. In Redis, hashes are string keys to string or integer values. Useful as a way of gathering similar kinds of data together, a hash can store a row from a database table with each column an entry, which allows for sorting and retrieval via sort and the various hash access methods. Add in the ability to increment/decrement columns in hashes, pull full hashes, etc., the existence of an object/model mapper, and Redis can be easily replace many uses of a traditional SQL database. Throw in Jak Sprat's Alchemy Database, which adds a SQL layer with Lua scripting inside Redis, and for small data sets, a Redis solution may be all you need.

Ad-hoc data sorts:
def insert(data):
    rconn.sadd('known-ids', data['id'])
    rconn.hmset('data:%s'%(data['id'],), data)

def sort_fetch(column, desc=True, num=10):
    results = rconn.sort('known-ids', start=0, num=num, desc=desc, by='data:*->%s'%(column,))
    p = rconn.pipeline(False)
    map(p.hgetall, ['data:'+id for id in results])
    return p.execute()

For those use-cases where having a sortable score over unique items is useful, Redis has the zset or sorted set data type, where each member in the set also has an associated float/double score, which produces an ordering over all keys in the sorted set, and which you can query by member, score, or rank. Some common use cases include priority queues, tag clouds, timeouts, rate limiting, and Redis-backed scored search engine.

Rate limiting:
def can_use(key, count, limit, timeout):
    if rconn.zrank('reset', key) == None:
      pipe.zadd('reset', key, time.time() + timeout)
    pipe.hincrby('counts', key, count)
    return pipe.execute()[-1] <= limit

def reset_counters():
    default = ((None, None),)
    key, ts = (rconn.zrange('reset', 0, 0, withscores=True) or default)[0]
    while ts is not None and ts <= time.time():
      pipe.zrem('reset', ts)
      pipe.hdel('counts', ts)
      pipe.zrange('reset', 0, 0, withscores=True) 
      key, ts = (pipe.execute()[-1] or default)[0]
Redis does support key expiration, but in pre-Redis 2.2 versions, key expiration can have confusing behavior. Use the following to manually expire keys... Manual key expiration:
def set_expire(key, timeout):
    rconn.zadd('expire', key, time.time()+timeout)

def expire_keys():
    p = rconn.pipeline(True)
    for key in rconn.zrangebyscore('expire', 0, time.time()-1):
        p.delete(key)
        p.zrem('expire', key)
    p.execute()
With these simple ideas and structures, even more complex behavior can be defined. Things like per-user prioritized queues, counting semaphores (for limiting worker counts in this case), per-page/site recent viewer lists, finding jobs that a user has the skills to perform (aka the pizza topping problem), navigation trees, and many more.

If you want to see more posts like this, you can buy my book, Redis in Action from Manning Publications today!

3 comments:

  1. Thank you for the great article. It's very useful!

    ReplyDelete
  2. I've been playing around with your rate limiting example functions and think I've finally got a decent understanding of how they work after porting the code to lua. I am wondering about the built in key expiration in redis 2.2.

    Is it possible to accomplish the same type of rate limiting by adding keys to a sorted set that expire at time.now + n with n being the length of the sliding window you want to use for the rate calculation then using a count of the number of keys / n to determine current rate. This way reset function wouldn't be needed. Being new to redis' capabilities this question has vexed me for some time now.

    thanks

    ReplyDelete
  3. Anonymous: You can definitely do that. It has many of the same limitations, though you are then restricted to have every limit be on the same time frame (your n). By using a secondary zset as the above example has, you can have different per second, minute, hour, day, etc., limits.

    For example, 60 requests per minute, but only 1800 requests per hour. That would allow a short-term burst rate for a minute that is twice as high as the average for the hour. That is possible with the 2-zset version, but isn't possible with a single zset with a Redis 2.2 key expiration.

    ReplyDelete