Monday, June 3, 2013

What's going on with rom?

This post will talk about the open source Redis object mapper that I've released called rom. I will talk about what it is, why I wrote it, and what I'm planning on doing with it. I've posted several articles about Redis in the past, and you can buy my book, Redis in Action, now (hard copy will be available on/around June 10, 2013 - about a week from now) - enter the code dotd0601au at checkout for half off!

What is rom?


Rom is an "active record"-style object mapper, intended as an interface between somewhat-intelligent Python objects (and their behavior) and data stored in Redis. Early versions (everything available now, and for the coming few months) are purposefully simplified with respect to what is possible with Redis, so that rom's capabilities can grow into what is necessary/desired rather than trying to support any/all possible use-cases up front.

An example use for storing users* can be seen below.

from hashlib import sha256
import os

import rom

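def hash_pw(password, salt=None):
    # Generate a fresh random salt unless one is supplied (e.g. when verifying).
    salt = salt or os.urandom(16)
    hash = sha256(salt + password).digest()
    # Iterate the hash to make brute-force guessing more expensive.
    for i in xrange(32768):
        hash = sha256(hash + password).digest()
    return salt, hash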

class User(rom.Model):
    name = rom.String(indexed=True)
    email = rom.String(required=True, unique=True, indexed=True)
    salt = rom.String()
    hash = rom.String()

    def update_password(self, password):
        self.salt, self.hash = hash_pw(password)

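    def check_password(self, password):
        hash = hash_pw(password, self.salt)[1]
        # Treat a missing stored hash as empty so len() below cannot fail.
        stored = self.hash or ''
        # Compare every byte rather than stopping at the first mismatch.
        pairs = zip(map(ord, hash), map(ord, stored))
        p1 = sum(x ^ y for x, y in pairs)
        # zip() truncates to the shorter input, so compare lengths too.
        p2 = (len(hash) ^ len(stored))
        if p1 | p2:
            raise Exception("passwords don't match")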

    @property
    def contact(self):
        return '"%s" <%s>'%(self.name, self.email)

Aside from the coding overhead of handling the hashing of passwords in a somewhat secure manner, setting up attributes and adding simple behaviors on top of models is pretty much what you should expect in an object mapper used in Python.
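
The password handling above does not depend on rom or Redis, so it can be tried on its own. Below is a minimal Python 3 sketch of the same idea: a salted, iterated hash plus a comparison that does not exit early. Note that it swaps the hand-rolled loop and XOR comparison for the standard library's hashlib.pbkdf2_hmac and hmac.compare_digest. The function names and the iteration count are illustrative assumptions, not part of rom.

```python
import hashlib
import hmac
import os

ITERATIONS = 32768  # illustrative; mirrors the loop count used above


def hash_pw(password, salt=None):
    """Return (salt, derived_key) for a text password."""
    # Generate a fresh random salt unless one is supplied for verification.
    salt = salt or os.urandom(16)
    key = hashlib.pbkdf2_hmac('sha256', password.encode('utf-8'), salt, ITERATIONS)
    return salt, key


def check_pw(password, salt, expected):
    """Return True when password matches the stored key."""
    # compare_digest avoids leaking where the first mismatch occurs.
    return hmac.compare_digest(hash_pw(password, salt)[1], expected or b'')
```

Unlike the model method above, this variant returns a boolean instead of raising, which is only a stylistic choice for the sketch.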

Why did I write rom?


As is the case with many things I build, it was meant to scratch an itch. I was working on an as-yet-unreleased personal project and I needed a database. This database would only ever need to hold a few megabytes of data (maybe up to the tens of megabytes), but it might also need to perform several thousand reads/second and several hundred writes/second on an under-powered machine, and would need to persist any updated data. Those requirements eliminated a standard database in a typical configuration. A relational database instructed to store data in-memory would work, except for the persisting to disk part (and Postgres with some fsync tuning wouldn't be unreasonable, except maybe for the read volume). Given that I have a lot of experience with Redis, and my requirements fit with many of the typical use-cases for Redis, my project seemed to be begging to use Redis.

But after deciding to use Redis for my project, then what? I've built ad-hoc data storage methods on Redis before (roughly 2 dozen different mechanisms that are running or have run in production, never mind the dozens that I've advised others to use), and I feel bad every time I do it. So I started by looking through several of the object mappers for Redis in Python (some of which call themselves 'object Redis mappers' to stick with the 'orm' theme), but I didn't like the way they either exposed or hid the Redis internals. In particular, most of the time, users shouldn't care what some database is doing under the covers, and they shouldn't need to change the way they think about the world to get the job done.

Now, I know what you are thinking. You're thinking: "Josiah, Redis is a database that requires that you change the way you think about the world in order to make it work." Ahh, but that's where you are wrong. The purpose of rom is to abstract away about 90% of the strangeness of Redis to a new user (at least on the Python side). Almost everything works the way most users of SQLAlchemy, Django's ORM, or App Engine's datastore would expect. About the only things that rom doesn't offer that those other libraries do on top of relational databases are: 1) composite indices and 2) ordered indices on string columns.

With Redis and the way rom handles its indices, there might be some performance advantages to offering composite indices for certain queries. But those queries are very limited, and there are just under 64 bits of usable space in any index entry. Ordered indices on string columns are also tough, as they run into the same limit of just under 64 bits with which to offer ordering.

There is a method that would increase the limit beyond 64 bits, but that method would be incompatible with the other indices that rom already uses. So, long story short, don't expect composite indices, and don't expect ordered indices on strings that play nicely with other rom indices.
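
To see why a fixed-width numeric index entry limits ordered string indices, consider the sketch below. It assumes, purely for illustration, that an index entry holds the first 7 bytes (56 bits) of a string packed into an integer. The function name string_score and the 7-byte width are hypothetical and are not rom's actual encoding.

```python
def string_score(value, width=7):
    """Pack the first `width` bytes of value into a big-endian integer."""
    # Pad short strings with zero bytes so every score has the same width.
    data = value.encode('utf-8')[:width].ljust(width, b'\0')
    return int.from_bytes(data, 'big')


names = ['bob', 'alice', 'carol']
# Within the first 7 bytes, integer order matches byte order.
ordered = sorted(names, key=string_score)

# Strings that differ only after the first 7 bytes get the same score.
collide = string_score('redis-object-mapper') == string_score('redis-objects')
```

Under these assumptions, anything past the fixed width is invisible to the index, so longer strings sharing a prefix tie and cannot be ordered by the score alone.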

What is in rom's future?


If you've read everything above, you know that composite indices and sorted indices on strings that play nicely with the other indices are not going to happen. But what if you don't care about playing nicely with other indices? Well, that is a whole other story.

With ordered indices on strings, one really nice feature is that you can perform prefix lookups on strings - which makes autocomplete-like problems very easy. Expect that in the future.
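
As a rough illustration of why an ordered index makes prefix lookups easy, here is a sketch in which a sorted Python list stands in for the ordered index. It uses neither Redis nor rom, and prefix_range is a hypothetical helper name.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Return all keys in sorted_keys that start with prefix."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Assumption: '\uffff' sorts after any character that appears in a key.
    hi = bisect.bisect_right(sorted_keys, prefix + '\uffff')
    return sorted_keys[lo:hi]


index = sorted(['josh', 'josiah', 'joseph', 'john', 'jane', 'kate'])
matches = prefix_range(index, 'jos')
```

Because all keys sharing a prefix sit next to each other in sorted order, the lookup is two binary searches and a slice, which is the property autocomplete relies on.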

At some point I'll also be switching to using Lua scripting to handle updating the data in Redis. That will offer fast and easy support for multiple unique index columns, while simultaneously offering point-in-time atomic updates without retries. All of the major logic would be on the Python side, leaving simple updates to be done by Lua. I haven't done it yet because the performance and feature advantages are not yet large enough to necessitate it. With a little work, it would even be possible to implement "check and update" behavior to ensure that data hasn't been manipulated by other clients.
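
The "check and update" idea can be sketched without Redis. In the example below a plain dict stands in for the data store, and the names check_and_update and StaleDataError are hypothetical. In a real deployment the check and the write would have to run as one atomic step on the server (for example inside a Lua script), which this in-memory sketch does not attempt to show.

```python
class StaleDataError(Exception):
    """Raised when stored data no longer matches what the client loaded."""


def check_and_update(store, key, expected, changes):
    """Apply changes to store[key] only if fields still match expected."""
    current = store.get(key, {})
    # Verify that nobody changed the fields we read earlier.
    for field, old_value in expected.items():
        if current.get(field) != old_value:
            raise StaleDataError('%s.%s changed since it was loaded' % (key, field))
    # All checks passed, so write the new values.
    updated = dict(current)
    updated.update(changes)
    store[key] = updated
    return updated
```

A caller passes the values it originally loaded as expected, so a second writer working from stale data gets an error instead of silently overwriting the first writer's change.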

I've also been thinking about deferring attribute validation until just before data is serialized and sent to Redis, as it can be a significant performance advantage. The only reason I didn't do that from the start is that a TypeError on commit() can be a nightmare. Hunting down the answer to "how did that data get there?" some time after the write occurred can be an exercise in futility. By performing the validation on attribute write (and on data load from Redis), you are at least notified right away when you write the wrong data. As such, deferring validation until commit() may be a documented but discouraged feature in the future.
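
Validation on attribute write can be illustrated with a small Python descriptor. This is a sketch of the general technique only: TypedColumn and Account are hypothetical names, and this is not how rom's columns are implemented.

```python
class TypedColumn(object):
    """Descriptor that validates a value when the attribute is assigned."""

    def __init__(self, name, type_):
        self.name = name
        self.type_ = type_

    def __get__(self, obj, owner=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # Fail at the point of assignment, not later at commit time.
        if value is not None and not isinstance(value, self.type_):
            raise TypeError('%s must be %s, not %s' % (
                self.name, self.type_.__name__, type(value).__name__))
        obj.__dict__[self.name] = value


class Account(object):
    name = TypedColumn('name', str)
    age = TypedColumn('age', int)
```

With this approach the traceback from a bad assignment points at the line that wrote the wrong data, which is the debugging advantage described above. The cost is a type check on every write.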

Redis-native structure access


I know that by now, some of you are looking at your screen asking, "When can I get raw access to lists, sets, hashes, and sorted sets? Those are really what my application needs!" And my answer to that is: at some point down the line. I know, that is a wishy-washy answer. And it's wishy-washy because there are three ways of building the functionality: 1) copy data out of Redis, manipulate in Python, then write changes back on commit(), 2) create objects that manipulate the in-Redis data directly, or 3) offer smart objects like #2, but write data back on commit().

If we keep with the database style, then #1 is the right answer, as it allows us to perform all writes at commit() time. But if people have large lists, sets, hashes, or sorted sets in Redis, then #1 is definitely not the right answer, as applications might be copying out a lot of information. But with #2, updates to other structures step outside the somewhat expected commit() behavior (this entity and all related data have been written). Really, the right answer is #3: direct access for reading, but masking write operations until commit().
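
A rough sketch of what option #3 could look like for a set is shown below. A plain Python set stands in for the set stored in Redis, and BufferedSet is a hypothetical class, not something rom provides. In this sketch reads also reflect pending local writes, which is one possible design choice among several.

```python
class BufferedSet(object):
    """Reads consult the backing set; writes are held until commit()."""

    def __init__(self, backend):
        self._backend = backend  # stands in for a set stored in Redis
        self._added = set()
        self._removed = set()

    def add(self, item):
        # Record the write locally; the backend is untouched until commit().
        self._removed.discard(item)
        self._added.add(item)

    def discard(self, item):
        self._added.discard(item)
        self._removed.add(item)

    def __contains__(self, item):
        # Pending local changes take priority over the backend's state.
        if item in self._added:
            return True
        if item in self._removed:
            return False
        return item in self._backend

    def commit(self):
        # Apply all buffered writes in one step, then reset the buffers.
        self._backend.difference_update(self._removed)
        self._backend.update(self._added)
        self._added.clear()
        self._removed.clear()
```

Only the changed members are tracked locally, so a large set never needs to be copied out of the backend just to add or remove a few items.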

Talking about building native access is easy, but actually building it to be robust is no small task. For now, you're probably better off writing manual commands against your structures if you need native structure access.

How can you help?


Try rom out. Use it. If you find bugs, report them on GitHub. If you have fixes for bugs, post the bug with a pull request. If you have a feature request, ask. If your feature request is in the range of things that are reasonable to do and which fit in with what I want rom to be, I'll build it (possibly with a delay). Tell your colleagues about it. And if you are feeling generous, buy my book: Redis in Action. If you are feeling really generous (or lost), I also do phone consultations.

Thank you for taking the time to read, and I hope that rom helps you.

* Whether or not you should store users in rom and Redis depends on whether Redis is suitable for storing such data in your use-case. It is used here as a sample scenario that most people who have developed an application with users should be able to recognize (though not necessarily use).
