Sunday, October 10, 2010

YogaTable as a Database Server

As promised in my last update, YogaTable is no longer an embedded database. Included in the source is a new server component, which listens for requests on a configurable host and port, defaulting to localhost:8765 .

I have included a client for Python, which has everything necessary for basic and advanced YogaTable use. The protocol is basically JSON over HTTP GET/POST, which makes it straightforward for interacting with using just about any language. I am in the process of documenting what is necessary to write new clients, and will be writing a client for Javascript, as well as a more advanced Python client library. Some simple benchmarks with Apache Bench tell me that YogaTable can perform 60 single inserts/second, and around 2500 bulk inserts/second, but that's in mostly ideal conditions.

One of the features that I am most excited about is being able to script the modification of multiple rows in the database with Lisp. I've taken a merged version of Peter Norvig's lis.py and lispy.py, improved the performance, removed some unnecessary features (some of which were unnecessary for database updates), added some other features, and ... Well, let's just see what it looks like. The following is an example from the YogaTable's tests. It shows how you can transactionally update two rows in the database at the same time, and more specifically, how one could implement transferring money from one account to another.

First, let's set up our rows in the database.
d1 = {'value':decimal.Decimal('200.00')}
d2 = {'value':decimal.Decimal('0.00')}
ids = zip(*self.table.insert([d1, d2]))[0]
d1['_id'] = ids[0]
d2['_id'] = ids[1]

Now, let's set up our shared data, and prepare for the output of our test.
shared = {'transfer':decimal.Decimal('45.23')}
d1['value'] -= shared['transfer']
d2['value'] += shared['transfer']

Let's actually perform the conditional update...
out = self.table.update([
    {'_id':ids[0],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (>= balance transfer)
            (begin
                (setv `doc `value (- balance transfer))
                (setv `shared `transferred #t)))
        '''},
    {'_id':ids[1],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (getv `shared `transferred #f)
            (setv `doc `value (+ balance transfer)))
        (delv `shared `transferred)
        (delv `shared `transfer)
        '''}], shared=shared)

The Lisp in here may look a little strange, as some of it is nonstandard. The first few lines of the operations for the rows loads the 'types' module, which offers access to the Python decimal.Decimal datatype (among others), pulls some balance information, and determines how much money is supposed to be transfered. The last few lines in the first operation verifies that there is enough money in the account, then deducts the money, and sets the shared variable 'transferred' to True.

The second operation checks to see if 'transferred' is True, and if so, adds the transferred balance to the second row. The two 'delv' lines in the second operation are merely there to remove the known shared variables so that if someone were to accidentally include a third row, then it wouldn't have access to this data.

And that's it. Money transfers in YogaTable. No need for 2-stage commits.


At this point, you are probably wondering where YogaTable is going as a piece of software. When Google first released AppEngine, one of the things that I was most intrigued by was it's Datastore. Some features I'd never seen before (indexes on all of the values in a list, in particular), and I wished that it was available outside of AppEngine. I'd been meaning to write an AppEngine Datastore-like backend for a long time, and some early versions of YogaTable were actually meant to allow for people to take the Google AppEngine SDK and plug my backend into it. It was meant as a way of scaling the SDK beyond trivial applications, and really, to allow for the full set of features and functionality offered by Appengine's Datastore to people who didn't want to run in Google's datacenters. That is not where YogaTable is going.

After having used MongoDB in production, I realized that the current software offerings for databases was missing something. Something that wasn't tied down to schemas like classic relational databases. Something that wasn't limited if you happened to *only* have a 32 bit machine. Something that could offer enough power for building a moderately-used web site (one million hits/day), but was flexible enough to not get in your way while you were developing it.

And thus, YogaTable was born. Aside from the design requirement of never performing table scans, and it's current lack of built-in replication/clustering, YogaTable today offers sufficient features to get almost any idea from concept to a million hits/day. And with the introduction of a Lisp interpreter, YogaTable is able to offer functionality that is otherwise very difficult in other systems (the simple multi-row update shown above requires a tricky 2-stage commit using AppEngine's Datastore).


There is still work to be done on YogaTable. Mostly, I need to document everything. From there, next steps include replication, clients in a few different languages, support for read-only replicas, automatic master/slave failover, clustering... But all in good time. Documentation first, features next.

I hope everyone stays interested, I know that I'm having fun.

2 comments:

  1. Doesn't seem very ACID compliant. What happens if your transfer fails (eg: computer crash) right after subtracting the transfer amount from account 1? Looks like the money goes into never-never land.

    ReplyDelete
  2. The underlying data storage engine is SQLite, which is fully ACID compliant. SQLite will write a journal row for the partial transaction at every step of the way, and all operations within a multi-row YogaTable operation are within a single transaction. Upon crash/restart, The underlying SQLite system rolls-back any partial operations.

    So yes, in this case, YogaTable is ACID compliant.

    ReplyDelete