Tuesday, September 14, 2010

YogaTable Part 2: An Embedded NoSQL Database

It's been far too long since my previous post about YogaTable, but in the process of writing tests, testing, and cleaning up some of my original code, I had discovered a few bugs with how some searches were being performed. Throw in some weekend work on Binary Space Partitions, and you have a recipe for a delayed post.

This version introduces an "embedded" interface to YogaTable.  Similar to how SQLite operates, you specify where you would like your tables to be stored, and you receive a Database instance: >>> import embedded; db = embedded.Database(path).  That Database instance has implicitly-defined tables, which can be accessed via db.table_name.insert/ update/ delete/ search/ add_index/ drop_index/... .

Why an embedded database?  Well, for starters, it's a good stepping stone from a storage engine to a full database server.  Once we have an embedded database (especially one with a straightforward interface like YogaTable has), a database serving daemon is just a protocol definition away.  And if you implement your embedded database correctly (like thread safety, etc.), then many of the hard parts relating to multiple clients and routing responses are already solved.

In order to handle index building for new indexes on old data, or index deletion for deleted indexes, and to handle threaded users of the embedded database, we chose to push the processing for each table off into it's own process via Python's convenient multiprocessing library.  Commands are forwarded to each of these processors via multiprocessing.Queue instances (one per table), with all responses for all tables coming back via a single queue.  Threads that make requests against a table don't all wait on the same queue, but each waits on it's own standard Queue.  Responses are routed to the appropriate caller via a special routing thread, which also handles cleanup for threads that have exited.  You can see how routing, process startup, etc., happens in embedded.py.

By pushing all processing into secondary processes, we are also able to leverage multiple processors (with multiple tables or databases) and multiple disks (with multiple databases), which should hopefully reduce latency and increase throughput for heavy users.  We do gain some latency with processes thanks to a serialization and deserialization step for each direction, but the convenience of not needing to write a multi-table load-balancer cannot be understated.  Remember, part of this project's purpose is to build on top of and re-use known-good components whenever we can, and letting the OS handle table-level processor and disk scheduling is the right thing to do.  To see how query processing occurs and how we balance queries with index creation and deletion, check out lib/processor.py.

On the other hand, SQLite has well-known limitations with regards to multiple readers and/or writers.  As in: don't do it (without thinking really hard about it).  So we're not.  Each table processor is single threaded (aside from the effectively transparent communication threads), and has a query-driven event loop.  All requests are processed in-order as they come in to the processor.  When idle, the query processor picks up any outstanding index creation or deletion requests.  The next set of changes will include configuration options to allow for the balancing of request processing vs. indexing vs. cleanup.

If you aren't in the mood for yet another another embedded database, don't worry.  YogaTable is embedded-only just until my next post, whose update will include a RESTful interface for easy multi-language clients, runtime status information, and if I'm feeling frisky, a simple console for querying the database.

On a more personal note, I very much enjoyed building the processing and request routing pieces.  I'd wanted to build a request router for a Linda-like Tuple Space that I had implemented in the fall of 2005 (blog discussion:1 2 3), but that project never made it's way into the real world.  One of the reasons why it never made it's way off my hard drive was partly because of the multiprocessing package, which I saw as a better way of handling tasks for which Linda-like systems were designed.