Thursday, April 26, 2018

Yes, I'm Forking Redis - SSL/TLS + Transactions

Update 2: The followup is here, for later.

If you haven't seen the video announcement on YouTube, and would like to watch/listen to the 5 minute version instead of reading the text version, here you go.


Done? Great! This is the text version of that notice. I am releasing a fork [1] of Redis and tools surrounding Redis to support my needs. This first release is mostly just tools, specifically adding SSL/TLS support to redis-benchmark and hiredis.

In the next couple weeks, I'll also be releasing a version of Redis server (the real fork) that speaks SSL/TLS on standard listening ports, with Cluster gossip, and the MIGRATE call (used both by Redis Cluster and users). I'm releasing the benchmark tools now because they've been baking for a few weeks, and only required a few Makefile cleanups to make me feel good about releasing them.

The only thing really delaying the full Redis Server release is that I haven't gotten around to redis-cli (because I had clients that already spoke Redis + TLS) or redis-sentinel, and both of those are necessary before I can release the full package (yes, I've already added support for SSL/TLS to the unit tests).

What really isn't delaying the release: thanks to some algorithm tuning, and finding better implementations of a few core algorithms, I am seeing 3-5x faster startup times on nearly every dataset I've tested. From huge numbers of short keys to object models and indexes, all are loading the full dataset into memory 3-5x faster. I will be posting benchmarks of this at the full server release.

In addition to SSL/TLS and the above mentioned performance improvements, this fork will also contain support for my resurrected Transactions in Redis Lua scripts, with expanded support for WATCH/MULTI/EXEC/ROLLBACK style transactions (you send your list of keys with WATCH, and if any operation is on any key not in the list, or if you get an error, your changes are all rolled back).
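To make the rollback semantics concrete, here is a toy in-memory sketch of the behavior described above. This is not the fork's actual implementation, and names like `MiniStore` are made up for illustration; the point is just the rule: operations on keys not declared up front, or any error mid-transaction, undo all changes.

```python
class MiniStore:
    """Toy in-memory store sketching the WATCH/MULTI/EXEC/ROLLBACK
    semantics described above; not the fork's actual code."""

    def __init__(self):
        self.data = {}

    def transaction(self, watched_keys, ops):
        """Apply (key, value) pairs atomically. Touching a key that
        wasn't declared via the watched set, or any error, rolls
        every change back."""
        snapshot = {k: self.data.get(k) for k in watched_keys}
        try:
            for key, value in ops:
                if key not in watched_keys:
                    raise KeyError(f"{key!r} not in WATCH list")
                self.data[key] = value
            return True  # EXEC succeeded
        except Exception:
            # ROLLBACK: restore the pre-transaction values
            for k, v in snapshot.items():
                if v is None:
                    self.data.pop(k, None)
                else:
                    self.data[k] = v
            return False

store = MiniStore()
store.transaction({"a", "b"}, [("a", 1), ("b", 2)])   # commits
store.transaction({"a"}, [("a", 9), ("c", 3)])        # rolls back; "a" stays 1
```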

For those of you who are living in Redis Cluster land, and need transactions on keys that don't live in the same shard/server, I previewed multi-shard Lua script transactions back in February. If you are a commercial user and would like to use any of this in your environment, I offer reasonable support and integration contract rates (and those cluster modifications may require separate licensing).

So, the big news is:

  • SSL/TLS native in Redis (benchmarks below)
    • redis-benchmark SSL/TLS support now
    • hiredis SSL/TLS support now
    • redis-cli soon (not started)
    • redis-sentinel soon (not started)
    • redis-server (server, gossip, migrate, done)
      • unit tests (done)
  • Transactions with rollbacks in Redis (also available in Redis Cluster)
    • Lua scripts rollback on error
    • Rollback on explicit rollback
  • 3-5x faster startup times
    • sample data and more benchmarks later


While not the first feature implemented, encryption over the wire (especially between master/slave replicas, cluster gossip, and during MIGRATE calls) is a basic need. And as a basic need, relying on third-party tools for SSL/TLS termination or a transparent VPN solution is a great first step up from running without encryption, but it can leave speed on the table. And part of the reason why we use Redis is speed, right?

Redis itself spends much of its time waiting on network system calls, or to be interrupted to read data from a connection. Quoted benchmarks at conferences since 2015 have claimed that 97% of Redis' time is spent waiting on network-related system calls and interrupts. With 3rd party SSL/TLS termination, that can only get worse. How much worse?

An SSL/TLS terminator needs to read the request (wait, read), then do the decrypt operation, then send the data to Redis (write). Redis does its operations (wait, read, write), then the terminator gets to read from Redis (wait, read), encrypt, then send the response to the client (write). Notice how we went from 3 somewhat basic operations to 9? Those of you running SSL/TLS terminators for your Redis have probably already experienced latency or throughput hits without realizing it. Can we benchmark to see how much native SSL/TLS termination inside Redis buys us? Of course we can.
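To make that tally concrete, here's a quick bookkeeping sketch of the per-request steps enumerated above (illustrative counting only; nothing is measured):

```python
# Per-request steps with Redis answering directly vs. behind an
# external SSL/TLS terminator (bookkeeping of the prose above).
direct = ["wait+read request", "process command", "write response"]

terminated = [
    "terminator: wait+read encrypted request",
    "terminator: decrypt",
    "terminator: write plaintext to Redis",
    "redis: wait+read request",
    "redis: process command",
    "redis: write response",
    "terminator: wait+read plaintext response",
    "terminator: encrypt",
    "terminator: write encrypted response",
]

print(len(direct), "->", len(terminated))  # 3 -> 9
```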

For this first set of benchmarks, we use 2 computers with the following specifications:

Name: o790
Model: Dell Optiplex 790 workstation
CPU: Core i3-2120 @ 3.30 GHz (2 cores, 2 threads per core, 4 total threads)
RAM: 16GB total (4x 4096 MB DDR3-1333 Synchronous DIMMs)
Media: 1 TB SSD
OS: Ubuntu 14.04
Kernel: 4.4.0-119-generic
SSL/TLS via: OpenSSL 1.1.0h
Compiler: GCC 4.8.4

Name: t7600
Model: Dell Precision T7600 workstation
CPU: 2x Xeon E5-2670 @ 2.60/3.30 GHz (2 CPUs, 8 cores per CPU, 2 threads per core, 32 total threads)
RAM: 128GB total (16x 8192 MB DDR3-1333 Registered DIMMs)
Media: 1 TB SSD
OS: Ubuntu 16.04
Kernel: 4.13.0-38-generic
SSL/TLS via: OpenSSL 1.1.0h
Compiler: GCC 5.4.0

The computers are connected to a Trendnet TEG-S82g 1-gigabit 8-port desktop switch, along with several other devices. Ifconfig reports 1500-byte frames. No attempt was made at isolating traffic on the network. Occasional blips in performance were observed due to logging in and out of the workstations to check progress, which is why each command is run 10 times, with averages and standard deviations in response rates plotted.

We consider the following testing variations:

Client test variations:
  • redis-benchmark no TLS at any layer  (Redis-4.0.9)
  • redis-benchmark with included TLS  (Redis-4.0.9, patches)
  • redis-benchmark with external Stunnel TLS (Redis-4.0.9, Stunnel-5.44)
  • redis-benchmark with external Hitch TLS (Redis-4.0.9, Hitch-1.4.8)
Redis server test variations:
  • redis-server with no TLS at any layer (Redis-4.0.9)
  • redis-server with included TLS (Redis-4.0.9, patches)
  • redis-server with external Stunnel TLS (Redis-4.0.9, Stunnel-5.44)
  • redis-server with external Hitch TLS (Redis-4.0.9, Hitch-1.4.8)
Benchmark command-line test variations:
  • -t SET,GET -n 5000000 -r 4750000
    • 5 million SET/GET operations over a 4.75m entry keyspace
  • -t SET,GET -n 25000000 -r 4750000 -P 5
    • 25 million SET/GET operations over a 4.75m entry keyspace, 5 operations at a time
We run each benchmark in each chosen variation 10 times, with average and standard deviation plotted (top of the range limited to actual throughput seen). Some combinations of endpoints were not tested due to timing as of the publishing of this article [2], and others due to non-obvious runtime errors [3]. Data from each kept run is combined [4] and plotted using Matplotlib.

The x-axis on all plots is "seconds since the start of redis-benchmark". Operations-per-second numbers are the exact number of operations performed in the last 1000 milliseconds, sampled every 250 milliseconds (see the redis-benchmark source code for details).

Colors are consistent across plots. Red for 'null-null' represents no SSL/TLS termination on either end. Blue for 'native-native' represents redis-benchmark with '--ssl' connecting directly to our patched Redis. Orange for 'native-stunnel' means redis-benchmark with '--ssl' connecting to Stunnel SSL/TLS termination, feeding into an unpatched Redis 4.0.9 installation. Other lines have similar testing/configuration implications in their naming.


On our lower-powered i3 (2 cores, 4 threads), we are heavily CPU bound, so the least-overhead SSL/TLS termination option (native-native in blue, more information later), is fastest among our encrypted options.

Oddly enough, Stunnel into native here doesn't work well at all, which suggests some mismatch between buffer sizes, request sizes, and/or timing of network operations, as none of the other configurations suffer as badly. We do have latency profiles of all benchmarks performed, so we can look at those in the future for direction on better tuning if this is a common platform in the wild.

That said, once we are less constrained by CPU, it turns out that Stunnel on both ends is fastest here on a total operations/second performance metric. But at what cost? Here we restart the daemons between each run, then get the total CPU time used by each of Redis, Hitch, and Stunnel in each of our 4 variations of recipient on the t7600.

For the t7600 native client into a Hitch-terminated Redis, I saw Hitch use 1h 11m 30s of CPU time, with Redis using 45m 11s. Going into Stunnel-terminated Redis, I saw Stunnel use 4h 12m 44s, with Redis using 49m 52s. When we use our native SSL/TLS patches for the same set of tests, Redis uses 49m 53s of CPU. Disabling all SSL/TLS gets us 41m 11s used by Redis. So relatively speaking, we've got ~1h 57m Redis+Hitch vs. ~5h 3m Redis+Stunnel vs. ~50m Redis+native vs. ~41m for no SSL/TLS termination. Or 185%, 639%, and 22% relative CPU overhead compared to no termination.
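As a sanity check on those percentages, a quick calculation from the rounded CPU totals quoted above:

```python
# CPU minutes from the t7600 server-side runs above, rounded as in the text.
baseline = 41            # no SSL/TLS termination (~41m)
totals = {
    "hitch": 117,        # ~1h 57m Redis + Hitch
    "stunnel": 303,      # ~5h 3m Redis + Stunnel
    "native": 50,        # ~50m Redis with native TLS
}
overhead = {name: round((t - baseline) / baseline * 100)
            for name, t in totals.items()}
print(overhead)  # {'hitch': 185, 'stunnel': 639, 'native': 22}
```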

With our slow machine again as the server, native termination definitely comes out as fastest in individual operations again, for the performance reasons likely described before. We checked the CPU time again on the o790 side of things.

For the t7600 native client into a Hitch-terminated Redis, I saw Hitch use 1h 33m 24s of CPU time, with Redis using 32m 11s. Going into Stunnel-terminated Redis, I saw Stunnel use 2h 23m 27s, with Redis using 51m 45s. When we use our native SSL/TLS patches for the same set of tests, Redis uses 52m 32s of CPU. Disabling all SSL/TLS gets us 33m 40s. So relatively speaking, we've got ~2h 5m Redis+Hitch vs. ~3h 18m Redis+Stunnel vs. ~53m Redis+native vs. ~34m for no SSL/TLS termination. Or 268%, 482%, and 56% CPU overhead for SSL/TLS termination relative to no termination.

Higher overhead (relatively speaking) for Hitch and native isn't surprising here, and is likely due to the higher actual cost of the AES encrypt/decrypt performed during SSL/TLS operations on the o790 vs. the t7600. We can partly verify that by checking the speed of the operations on each machine (we can explicitly verify later with timing around encrypt/decrypt operations inside Redis):

o790 $ openssl speed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm      64949.38k    71785.37k   261766.40k   286591.66k   293898.92k

t7600 $ openssl speed -evp aes-256-gcm
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-256-gcm     251445.05k   684496.60k  1000988.42k  1110611.29k  1141981.18k
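A quick bit of arithmetic on the 8192-byte column shows the size of that gap:

```python
# AES-256-GCM bulk throughput (1000s of bytes/sec) from the
# openssl speed runs above, 8192-byte blocks.
o790_kbps = 293898.92
t7600_kbps = 1141981.18
ratio = t7600_kbps / o790_kbps
print(round(ratio, 1))  # the t7600 pushes roughly 3.9x the AES throughput
```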

And again, native termination has the highest throughput.


So, looking at clear wins here: if you've got an under-powered server, going with native SSL/TLS termination inside Redis is likely to be faster for you, and substantially so, depending on your existing SSL/TLS termination method. If you're looking to save CPU time, native SSL/TLS termination is also a win, as it used less CPU time than any other termination method tested, as expected.

I would have liked to test Nginx as a general SSL/TLS unwrapper, but general unwrapping requires a commercial license that I don't have yet. Perhaps someone with a license can run a couple comparisons on their hardware and report back.

How to help

If you'd like to support this work and continuing work, you can do a few things.



[1] This open source fork will be updated and maintained, likely tied to the most recent stable release for the short term, expanding as needs dictate.
[2] We did not try connecting Stunnel to <->Hitch, nor have we tried cross-machine Hitch to Hitch. We may update the graphs, and this note, along with any other applicable notes as this article and benchmarks are updated.
[3] As of this writing, I was seeing a runtime error for --client configured Hitch servers on the t7600, so could not use the t7600 to start a Hitch tunnel.
[4] Benchmark and analysis tools used are available in our forked repository, where you can see the numeric details of our sample averaging, as well as the raw data we produced in our runs.
[5] Repository:
