Friday, October 29, 2010

Being a student isn't easy, it requires actual work

Wandering about the internet this morning prior to doing some actual work, I happened upon a blog post by Seth Godin about teaching, and how students should demand better instruction. Historically, I've generally agreed with and enjoyed Seth's blog (though lamenting being unable to comment there directly), but in this case, I think he's missing the point.

I can certainly appreciate the plight of college students everywhere, spending tens of thousands of dollars to go to school. I did the same thing for my college years, and even continued for another 5 1/2 years to go to grad school. Almost three years out of it, I've still got loans, and probably will for a few years to come (9 1/2 years of postsecondary education isn't cheap in the States). However, I had the pleasure of being taught by amazing professors at both institutions that I attended and, even more importantly, of attending school with interested and engaged students (I wasn't the only one doing homework on Friday nights). Sadly, this isn't always the case...

Everyone agrees that if you have a poor teacher/professor, your learning (and grades) will suffer... but there's a limit to how much of that is the instructor's fault. So often when I was both studying and teaching, I would hear complaints (and offered a few myself) about a poor instructor. Either they didn't care, didn't notice when their teaching fell on deaf ears, taught something unrelated to the course, ... However, when confronted with this type of instructor, a student is given an opportunity to engage themselves in learning. Classes come with books, and instructors are meant to help the student understand and integrate the knowledge and wisdom within those books. But prior to the internet, Wikipedia, or Khan Academy, students managed to learn despite poor instructors. How? They read and studied the books, consulting their fellow and elder students when they had questions. I know I was different in this regard: when I found it difficult to understand my teacher during Trigonometry in high school, I read the book, studied, and understood it. When asked by other students how I managed to do well despite a confusing teacher, I pointed at the book. Only a few of them had taken the time to read the book beyond the problems, and fewer still had taken the time to understand it.

Back when I was a TA in grad school, I made many mistakes (mostly in my first couple of quarters). But by the final quarter of my teaching stint, I was doing 3 back-to-back sections for the same course, an hour each. The students who showed up and let even just a little bit of my enthusiasm rub off on them were engaged (if you are not excited about what you are teaching, students won't care, and students won't come). But 5-10% of students never showed up for any of the discussion sections (except for reviews prior to exams). They would sometimes go to class (sometimes watching television or DVDs in the back rows), read their classmates' notes, hand in half-copied, half-bullshit homework, and expect to learn enough in one hour to be sufficient for an Algorithms/Data Structures midterm/final. Sometimes they managed to cram enough, but usually we would be overly generous and give them a D-.


I can appreciate what Seth is trying to say: expect more from your teachers/professors/instructors. But an amazing instructor can only go so far. Students must also be engaged and willing to participate in the process of learning; otherwise, they are at least as much to blame for wasting their time and money as a poor instructor.


Sunday, October 10, 2010

YogaTable as a Database Server

As promised in my last update, YogaTable is no longer an embedded database. Included in the source is a new server component, which listens for requests on a configurable host and port, defaulting to localhost:8765.

I have included a client for Python, which has everything necessary for basic and advanced YogaTable use. The protocol is basically JSON over HTTP GET/POST, which makes it straightforward to interact with from just about any language. I am in the process of documenting what is necessary to write new clients, and will be writing a client for JavaScript, as well as a more advanced Python client library. Some simple benchmarks with Apache Bench tell me that YogaTable can perform 60 single inserts/second, and around 2500 bulk inserts/second, but that's in mostly ideal conditions.
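
As a rough illustration of what "JSON over HTTP" means for a client, here is a minimal sketch using only the Python standard library. The `/insert` path, the `table` and `rows` field names, and the `build_insert_request` helper are all hypothetical and are not YogaTable's actual protocol; only the default host and port come from the text above.

```python
import json
import urllib.request

# Default host and port mentioned above; everything else here is assumed.
DEFAULT_HOST = "localhost"
DEFAULT_PORT = 8765


def build_insert_request(table, rows, host=DEFAULT_HOST, port=DEFAULT_PORT):
    """Build (but do not send) an HTTP POST carrying a JSON body.

    The URL path and the payload field names are hypothetical.
    """
    url = "http://%s:%d/insert" % (host, port)  # hypothetical endpoint
    body = json.dumps({"table": table, "rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_insert_request("accounts", [{"value": "200.00"}])
# With a server running, sending it would be:
#     urllib.request.urlopen(request).read()
```

The point is only that any language with an HTTP library and a JSON encoder could talk to such a server; the real request format is whatever the client documentation ends up specifying.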

One of the features that I am most excited about is being able to script the modification of multiple rows in the database with Lisp. I've taken a merged version of Peter Norvig's lis.py and lispy.py, improved the performance, removed some features that were unnecessary for database updates, added some other features, and ... Well, let's just see what it looks like. The following is an example from YogaTable's tests. It shows how you can transactionally update two rows in the database at the same time, and more specifically, how one could implement transferring money from one account to another.

First, let's set up our rows in the database.
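import decimal

# Two account rows: one funded, one empty.
d1 = {'value':decimal.Decimal('200.00')}
d2 = {'value':decimal.Decimal('0.00')}
# Judging by the unpacking, insert() returns one tuple per row with the new _id first.
ids = list(zip(*self.table.insert([d1, d2])))[0]
d1['_id'] = ids[0]
d2['_id'] = ids[1]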

Now, let's set up our shared data, and update our local copies of the rows to the values we expect the test to produce.
shared = {'transfer':decimal.Decimal('45.23')}
d1['value'] -= shared['transfer']
d2['value'] += shared['transfer']

Let's actually perform the conditional update...
out = self.table.update([
    {'_id':ids[0],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (>= balance transfer)
            (begin
                (setv `doc `value (- balance transfer))
                (setv `shared `transferred #t)))
        '''},
    {'_id':ids[1],
     '__ops':'''
        (load types)
        (define zero (decimal `0.00))
        (define balance (getv `doc `value zero))
        (define transfer (getv `shared `transfer zero))
        (if (getv `shared `transferred #f)
            (setv `doc `value (+ balance transfer)))
        (delv `shared `transferred)
        (delv `shared `transfer)
        '''}], shared=shared)

The Lisp in here may look a little strange, as some of it is nonstandard. The first few lines of each row's operation load the 'types' module, which offers access to the Python decimal.Decimal datatype (among others), pull some balance information, and determine how much money is supposed to be transferred. The last few lines in the first operation verify that there is enough money in the account, then deduct the money and set the shared variable 'transferred' to True.

The second operation checks to see if 'transferred' is True, and if so, adds the transferred balance to the second row. The two 'delv' lines in the second operation are merely there to remove the known shared variables so that if someone were to accidentally include a third row, then it wouldn't have access to this data.

And that's it. Money transfers in YogaTable. No need for 2-stage commits.
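
For readers who don't read Lisp, the logic of the two operations can be modeled in plain Python. This is only a sketch of the semantics described above, not how YogaTable executes them: the `debit`, `credit`, and `run_transfer` names are made up for illustration, plain dicts stand in for the rows, and nothing here is transactional.

```python
from decimal import Decimal

ZERO = Decimal("0.00")


def debit(doc, shared):
    """Model of the first operation: deduct only if funds suffice."""
    balance = doc.get("value", ZERO)
    transfer = shared.get("transfer", ZERO)
    if balance >= transfer:
        doc["value"] = balance - transfer
        shared["transferred"] = True  # signal the second operation


def credit(doc, shared):
    """Model of the second operation: add funds only if the debit happened."""
    balance = doc.get("value", ZERO)
    transfer = shared.get("transfer", ZERO)
    if shared.get("transferred", False):
        doc["value"] = balance + transfer
    # Like the two delv lines: clear shared state so later rows can't see it.
    shared.pop("transferred", None)
    shared.pop("transfer", None)


def run_transfer(source, target, amount):
    """Run both operations in order against one shared dict (hypothetical helper)."""
    shared = {"transfer": Decimal(amount)}
    debit(source, shared)
    credit(target, shared)
    return shared


a = {"value": Decimal("200.00")}
b = {"value": Decimal("0.00")}
leftover = run_transfer(a, b, "45.23")
```

If the source account held less than the transfer amount, `debit` would leave it alone and never set 'transferred', so `credit` would leave the target alone as well.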


At this point, you are probably wondering where YogaTable is going as a piece of software. When Google first released AppEngine, one of the things that I was most intrigued by was its Datastore. It had some features I'd never seen before (indexes on all of the values in a list, in particular), and I wished that it was available outside of AppEngine. I'd been meaning to write an AppEngine Datastore-like backend for a long time, and some early versions of YogaTable were actually meant to allow people to take the Google AppEngine SDK and plug my backend into it. It was meant as a way of scaling the SDK beyond trivial applications, and really, to offer the full set of features and functionality of AppEngine's Datastore to people who didn't want to run in Google's datacenters. That is not where YogaTable is going.

After having used MongoDB in production, I realized that the current software offerings for databases were missing something. Something that wasn't tied down to schemas like classic relational databases. Something that wasn't limited if you happened to *only* have a 32 bit machine. Something that could offer enough power for building a moderately-used web site (one million hits/day), but was flexible enough to not get in your way while you were developing it.

And thus, YogaTable was born. Aside from the design requirement of never performing table scans, and its current lack of built-in replication/clustering, YogaTable today offers sufficient features to get almost any idea from concept to a million hits/day. And with the introduction of a Lisp interpreter, YogaTable is able to offer functionality that is very difficult in other systems (the simple multi-row update shown above requires a tricky 2-stage commit using AppEngine's Datastore).


There is still work to be done on YogaTable. Mostly, I need to document everything. From there, next steps include replication, clients in a few different languages, support for read-only replicas, automatic master/slave failover, clustering... But all in good time. Documentation first, features next.

I hope everyone stays interested; I know that I'm having fun.