Wednesday, May 29, 2019

Re: Leaky Python Batteries

After reading about Amber Brown's talk on Python and its included "batteries" in the form of the standard library, I find a lot to agree with, and only a small amount to disagree with. I know I'm late.

Here's my take...


For context on my experience: I'm the reason that asyncore/asynchat/... (callback-defined async sockets in Python before asyncio) were included and updated in Python 2.6, and lived to see Python 3.0 (other folks took on maintenance after 2.6.?/3.0.?). I stepped on a lot of toes to make that happen, but I was still using asyncore/asynchat, and not having them in the standard library would have been bad for me and folks like me. So I "stepped up".

My experience is that moving anything in Python in any direction is hard and frustrating, no matter how "sane" you think your words and opinions are*. I remember being confused that I was even having to argue for static byte literals in Python 3.0, never mind pushing for a syntax like b'hello' instead of bytes([104, 101, 108, 108, 111]) [1], or even that an immutable variant should be available (instead of using a tuple for hash keys).
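To make the literal argument concrete, here is a small sketch in present-day Python 3. It shows that the two spellings build the same value, and that the immutable bytes type can serve as a dictionary key while the mutable bytearray cannot. The variable names are only for illustration.

```python
# Two spellings of the same five bytes.
literal = b'hello'
from_ints = bytes([104, 101, 108, 108, 111])
assert literal == from_ints

# bytes is immutable and hashable, so it works directly as a dict key.
cache = {literal: 'greeting'}
assert cache[from_ints] == 'greeting'

# bytearray is the mutable variant and refuses to be hashed.
try:
    hash(bytearray(literal))
    bytearray_hashable = True
except TypeError:
    bytearray_hashable = False
```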

Thinking about it well after the fact, I have the sense it was because I didn't feel like I had any real level of ownership or control. I felt responsible as a member of the community to voice my opinion about the software I was using, to try to contribute. But I also felt as though I didn't have any level of real ownership. Like, somehow there was a path through the Python community that led to being able to fix a two-line bug without 25 people signing off on the change. I saw people walking the path, but I couldn't really find it or manage to walk it myself.

In Contrast


But compare that experience to working on your own project, or even in almost any tech-based organization. I was fortunate: in the 21 months I put in at YouTube / Google from 2008-2010, I was made an "oldtuber", and later "owner" of several files early on, as I hit readability with Python quickly and showed that I could actually fix problems with the platform. Within 3 months of joining in 2008, I could write code, ensure it passed the automated tests, commit without *any* code reviews in a half dozen files, and have code running on youtube.com within a week of writing it (pushes for hotfixes could happen randomly, sometimes 3-4 times a week back then).

Compare that with any library in Python? I can't even remember how many folks needed to sign off on my asyncore / asynchat changes, but it was far more than the 0/1/2 I needed to commit at YouTube. Python has more folks weighing in now than ever before, and I bet that's not making contributing easier.

I don't know how much this feeling pervades other existing / past Python contributors, nor how much of a barrier it is to gaining new contributors. I don't know if this is a pain point for others, but it was big for me.

I don't know if Amber's frustration similarly comes from comparing her experience in Twisted with Python core: whether in Twisted she found it easy to make an impact and help the project, while Python core is uninterested in or unwilling to act on similar insights. I don't know (a lot of things).

Regardless, I don't know if splitting the Python standard library out makes sense. On the one hand, pulling out the standard library would give some people more control and nearly absolute ownership. But that will mostly just create inconsistencies in code quality, maintenance, and interoperability - like we've all seen in our own dependency hells elsewhere. All the while making Python a language without batteries.

Address the Leaky Batteries


Personally, even if the batteries are leaking, I'd rather have them included. Issues with compiling Python aside (you can disable compiling tkinter / lxml modules FWIW), that's as much a "work on Python's Makefiles" problem as it is anything else, which is a much more approachable and appropriate thing than getting the organization to create dozens of sub-projects.


I do agree with Amber that asyncio's inclusion in Python does make it hard for the community, especially given how long Twisted has been around and doing much of the same stuff. Heck, I even suggested that if Python were to deprecate and remove modules from the standard library because the functionality was duplicated in Twisted, then the answer was to dump those modules and include Twisted [2] (even though I've actually never used Twisted, and have more recently used asyncore/asynchat, asyncio, pyuv, uvloop, and shrapnel since I read Twisted source). But it's going to be hard to put that genie back in the bottle; best case scenario is that there aren't any more "big" packages included that quickly.

Ultimately, I believe that the answer might just be better organization, better management of maintainers, better expectation management re: possible contributors, and better management of contributions. I think that giving people more ownership of the modules / packages they maintain, while also allowing the community to override a bad maintainer, might help things move forward in each individual module. If everything splits, many sorts of organization-level efficiencies are lost, including the simple benefit that being part of a larger project gets you more attention at all levels.

Personally, I'd suggest sticking together and working on trying to manage the project itself better. For some subjective definition of "better".

A Mistake 20 Years in the Making


That said, until visiting [3] while writing this entry, I didn't even know there was a formal process of being involved in the Python Software Foundation and moving through the organization. I know I spent 10-20 hours a week trying to help back when I was active on the mailing lists. No wonder I failed to find a path to success in Python core for the 20 years I've used Python; I didn't know you literally needed to be a part of the club. Hah.

C'est la vie.
[1] - https://mail.python.org/pipermail/python-3000/2007-February/005822.html
[2] - https://mail.python.org/pipermail/python-dev/2007-February/071202.html
[3] - https://www.python.org/psf/membership/

Wednesday, January 23, 2019

My last 6 months - making StructD worth the time

Those of you who have been paying attention to this blog might be asking: where is Josiah? Well, I've been busy. Not only did I get to go on a little vacation (much needed), but I've also been making StructD (my fork of Redis) better. How? Watch the video here: https://www.youtube.com/watch?v=dr9XepJqm-4




The short version for those who didn't / can't watch is:
  • Worst-case extra memory use during background snapshots is reduced by 99.8%+
    • Old: 100 Gig Redis could use 200 Gigs of memory during snapshot, or +100 Gigs
    • New: 100 Gig Redis could use 100.2 Gigs of memory during snapshot worst-case, really +198 Megs, a 99.8% reduction
  • Snapshots load 20-75% faster
  • Snapshots created 25-65% faster
  • Cold replicas start 50-70% faster 
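The worst-case memory figures above can be checked with a few lines of arithmetic. This is only a sanity check of the quoted numbers, assuming decimal units (1 Gig = 1000 Megs), which the post does not specify.

```python
dataset_gb = 100.0
old_extra_gb = 100.0         # old worst case: a full extra copy of the dataset
new_extra_gb = 198 / 1000    # new worst case: +198 Megs (decimal units assumed)

new_peak_gb = dataset_gb + new_extra_gb
reduction = 1 - new_extra_gb / old_extra_gb

print(f"new worst-case peak: {new_peak_gb:.1f} Gigs")
print(f"reduction in extra snapshot memory: {reduction:.1%}")
```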
Other things not covered in the video:
  • More Lua environment hardening
  • Better Lua tracebacks
  • Temporarily removed Lua debugger (transactions means that we can do this another way)
  • Improved Lua transaction performance
  • Lua Aliases:
    • SCRIPT LOAD <script> [<alias name> [NX|REPLACE]]
    • SCRIPT ALIAS <sha1> [<alias name> [NX|REPLACE]]
    • SCRIPT UNALIAS <alias name> ...
    • EVALSHA <alias name> <keycount> <keys ...> <argv ...>
    • EVAL "return CALL.alias_name({key1, ...}, {argv1, ...})" ...
  • Lua and other operations are better behaved with respect to one another (more things can happen during Lua scripts)
  • crc16 - 6x faster than Redis, 33% faster than Mark Adler / Matt Stancliff
  • crc64 - 6-8x faster than Redis, 33-66% faster than Mark Adler / Matt Stancliff
  • TXN.MULTI to go with TXN.EVAL / TXN.ESHA, for rolling back data changes on command execution error
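For readers unfamiliar with the crc16 mentioned above: open source Redis uses the XMODEM variant of CRC16 (polynomial 0x1021, initial value 0) to map keys to cluster hash slots. The sketch below is a plain bit-at-a-time reference in Python, included only to show what is being computed. It is not the StructD or Redis implementation, both of which are C and far faster, and the function name is made up for this example.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bit-at-a-time CRC16/XMODEM: polynomial 0x1021, initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8                  # feed the next byte into the high bits
        for _ in range(8):
            if crc & 0x8000:              # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Standard check value for this CRC variant.
check = crc16_xmodem(b"123456789")
print(hex(check))  # 0x31c3
```

Fast implementations typically replace the inner loop with table lookups that process one or more bytes per step, which is where speedups of the kind listed above tend to come from.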
Combined with our earlier SSL/TLS additions, Lua transactions, etc., we've got faster and lower-memory snapshot operations, more convenience in your Lua scripts, and performance comparable to or better than open source Redis with just about any SSL/TLS unwrapping daemon.

I've also been shopping around talks surrounding the engineering choices leading to the performance improvements and memory savings currently enjoyed by StructD. From these choices, there are clear incremental steps towards a further 50-80% reduction in snapshot creation / load time, plus a general 50-80% reduction in memory use overall. I will get there, just not tomorrow.

As announced several months ago, I've been working towards starting a hosting company. I'm still moving in that direction, but until that is finished, I've decided the best way to get there in the interim is to open up shop for both unpaid public and paid private Redis and StructD support.

So, if you'd like to get Redis or StructD support from me, download StructD, get StructD hosting in the future, or just feel like emailing me on a mailing list again, please feel free to drop by StructD.com to learn more.

     StructD 5.0.0 (8d6715c9/1) 64 bit   Mode: standalone   _______
 _______          Build: 20190104.0019   Port: 6380        |   __  \ _
/  _____)     _    http://structd.com    PID: 2670   _     |  |  |  | \_
| /\____\) __| |\_   _ _____   _     _    ______  __| |\_  |  |  |  |:| \_
| \/___   (__   __) | '_____) | |\  | |\ /  ____)(__   __) |  |  |  | |:| \
\_____ \-_ \_| |\_\)| / \___\)| | | | | || /\___\)\_| |\_\)|  |__|  | | |:|
 \____\ | \  | | |  | | /     | | | | | || |/       | | |  |_______/ \| | |
 _____/ | |  | | |  | | |     | \_|_/ | || \|___    | | |   \________/ \| |
(______/ \|  |_| |  |_| |     \_____,_/ |\______)   |_| |     \________/ \|
 \_____\_/    \_\|   \_\|      \_____,\/  \_____\)   \_\|       \________/