Wednesday, August 18, 2010

Introducing YogaTable, the flexible NoSQL database

In my last post, I talked about building a new NoSQL database from scratch, and how I would describe the decisions I made during it's construction, as well as post code along the way.  This is the first of the series introducing YogaTable.  Strangely enough, the toughest part of all of it was coming up with a name.  At the current revision, insertion, updating, and deletion are all supported, and there are tests.  Querying is not yet in the repository, though I have written the query builder and tests.  Those subjects will be in the next post.


The first design decision I have made with this system is that I will not be building a durable data store; that is, one that offers write-or-fail semantics given modern hardware, handling of system crashes, etc.  It's a hard problem.  Systems which make use of more lazy approaches to durability (like mmap in the case of MongoDB) *will* fail at some point due to the laziness, even ignoring bad disks.  Fixing this issue with replication is great, but it requires more hardware, and in the case of sane underlying systems (real hardware, and/or good virtualization solutions like vmware, xen, etc.), doing it right the first time allows us to not have to band-aid over it with replication.  I will instead use an existing system that does it's absolute best to be durable and transactional by design.

I have also decided that I will not be building a B-Tree indexing system from scratch.  Like the durable data store issue just mentioned, B-Trees are terribly difficult to get right.  So to reduce the amount of time it will take to go from no code to fully-working system, I'm just not going to write a new one.  I will instead use a system that already includes support for B-Tree indexes.

Those of you who know me will already guess that YogaTable will be written in Python, primarily because I know Python better than I know any other language, and secondarily because Python includes all of the tools necessary to build YogaTable out of the box on a modern machine.  In fact, Python includes two transactional data stores with B-Tree indexes as part of the standard library: bsddb.btree and sqlite3.

Because I am not without a sense of irony, and because I have had bsddb.btree bite me with data corruption in the past (when not using the transactional semantics that are not documented in the standard library), I'm going to use Python's interface to SQLite 3, which has been included with the core Python distribution since 2.5 .  As such, YogaTable will be "NoSQL" for the external interface to the database, rather than the actual underlying data store.  Also, because of this choice, it will be fairly straightforward for someone to replace SQLite with another SQL database to offer functionality that I might not have gotten around to adding quite yet (like read-only slaves via MySQL, etc).  Once YogaTable is feature complete (according to my earlier requirements and desires), it is my intent to use this in a small-medium scale production environment for my own projects, fixing bugs as they crop up (or as others report them).

I'm sure someone is going to think and/or point out how backwards it is to use a SQL database to store data for a NoSQL database.  And that's reasonable.  But this world is filled with unsolved difficult programming problems.  I could spend literally months rewriting either the durable data store or the B-Tree index.  On the other hand, I have the utmost respect for those who have already built said systems, and have happily used them in dozens of projects.  Two secrets to software and systems engineering: pick your battles, and stand on the shoulders of giants.  I'm going to do both, at the price of a little SQL.


Now that we have a place to store data and indexes, we've got to decide how it's going to be laid out.  For the sake of simplicity with regards to backup, etc., and because I know a little something about how SQLite 3 works, I'm going to lean towards simplicity.  Each table, information about it's indexes, and the indexes themselves will be stored in a single sqlite3 database file.  This file will have three tables; the data table, the index metadata table, and a single table that includes the index data.  Each table will have various B-Tree indexes over them to ensure that our access patterns are fast.  The general 3-table layout and the db abstraction guts are available in lib/om.py.  Also, as a "new to me" bug, I learned that Python's sqlite3 library doesn't handle list/tuple arguments for 'IN' queries, requiring some str(tuple(data)) shenanigans.

For the index table, we will be prefixing the index data with an index id, which will ensure that we are searching the proper subset of index rows when we search.  Now, we could have placed each index in a separate file, then use SQLite's 'ATTACH DATABASE' command, but then we would have had to query all index tables/databases whenever we perform an update/delete, and that's a pain in the ass (never mind a performance killer whenever we have more than one or two indexes).  We do miss being able to determine the size of each index individually, but that wasn't one of our requirements (though we could keep track of this manually).  For details about how we generate index rows, check out lib/pack.py.

To offer greater flexibility, etc., we will not be storing any Python-centric data in our database.  Data will be stored as JSON, and will be automatically converted to/from JSON by sqlite3 adapters/converters.  Also, for the sake of everyone's sanity, we've included support for dates, datetimes, times, and decimals.  Documentation is forthcoming for future non-Python users to easily convert to/from these formats.  For the impatient, check out lib/adapt.py.


Stay tuned for the next post where I will be discussing the sql query generator, and automatic background index construction.

ETA: updated links to reflect some moved files.

3 comments:

  1. "I'm sure someone is going to think and/or point out how backwards it is to use a SQL database to store data for a NoSQL database."

    This is a completely reasonable approach, even if only for the fact that SQLite has a great btree implementation. That you get a SQL interface for DDL and DML is an added bonus.

    ReplyDelete
  2. Hey Tim! Long time no chat.

    There is definitely a bonus to being able to use SQL as part of querying. While I'm not a huge fan of SQL as a general programming language, I did find it very useful to handle an obvious join across the index and data table for queries.

    ReplyDelete