Wednesday, November 26, 2014

Introduction to rate limiting with Redis [Part 2]

This article first appeared on November 3, 2014 over on Binpress at this link. I am reposting it here so my readers can find it easily.

In Introduction to rate limiting with Redis [Part 1], I described some motivations for rate limiting, as well as provided some Python and Lua code for offering basic and intermediate rate limiting functionality. If you haven’t already read it, you should, because I’m going to discuss several points from the article. In this post, I will talk about and address some problems with the previous methods, while also introducing sliding window functionality and-cost requests.

Problems with previous methods

The last rate limiting function that we wrote was over_limit_multi_lua(), which used server-side Lua scripting in Redis to do the heavy lifting of actually performing the rate limiting calculations. It is included below with the Python wrapper as a reference.

def over_limit_multi_lua(conn, limits=[(1, 10), (60, 120), (3600, 240)]):
    if not hasattr(conn, 'over_limit_lua'):
        conn.over_limit_lua = conn.register_script(over_limit_multi_lua_)

    return conn.over_limit_lua(
        keys=get_identifiers(), args=[json.dumps(limits), time.time()])

over_limit_multi_lua_ = '''
local limits = cjson.decode(ARGV[1])
local now = tonumber(ARGV[2])
for i, limit in ipairs(limits) do
    local duration = limit[1]

    local bucket = ':' .. duration .. ':' .. math.floor(now / duration)
    for j, id in ipairs(KEYS) do
        local key = id .. bucket

        local count = redis.call('INCR', key)
        redis.call('EXPIRE', key, duration)
        if tonumber(count) > limit[2] then
            return 1
        end
    end
end
return 0
'''

Hidden inside this code are several problems that can limit its usefulness and correctness when used for its intended purpose. These problems and their solutions are listed below.

Generating keys in the script

One of the first problems you might notice was mentioned in a comment by a commenter named Tobias on the previous post, which is that we are constructing keys inside the Lua script. If you’ve read the Redis documentation about Lua scripting, you should know that we are supposed to be passing all keys to be used in the script from outside when calling it.

The requirement to pass keys into the script is how Redis attempts to future-proof Lua scripts that are being written, as Redis Cluster (currently in beta) distributes keys across multiple servers. By having your keys known in advance, you can calculate which Redis Cluster server the script should run on, and if keys are on multiple Cluster servers, that the script can’t run properly.

Our first problem is that generating keys inside the script can make the script violate Redis Cluster assumptions, which makes it incompatible with Redis Cluster, and generally makes it incompatible with most key-based sharding techniques for Redis.

To address this issue for Redis Cluster and other client-sharded Redis setups, we must use a method that handles rate limiting with a single key. Unfortunately, this can prevent atomic execution for multiple identifiers for Redis Cluster, but you can either rely on a single identifier (user id OR IP address, instead of both), or stick with non-clustered and non-sharded Redis in those cases.

What we count matters

Looking at our function definition, we can see that our default limits were 10 requests per second, 120 requests per minute, and 240 requests per hour. If you remember from the “Counting correctly” section, in order for our rate limiter to complete successfully, we needed to only increment one counter at a time, and we needed to stop counting if that counter went over the limit.

But if we were to reverse the order that the limits were defined, resulting in us checking our per-hour, then per-minute, then per-second limits (instead of per-second, minute, then hour), we would have our original counting problem all over again. Unfortunately, due to details too involved to explain here, just sorting by bucket size (smallest to largest) doesn’t actually solve the problem, and even the original order could result in requests failing that should have succeeded. Ultimately our problem is that we are counting all requests, both successful and unsuccessful (those that were prevented due to being over the limit).

To address the issue with what we count, we must perform two passes while rate limiting. Our first pass checks to see if the request would succeed (cleaning out old data as necessary), and the second pass increments the counters. In previous rate limiters, we were basically counting requests (successful and unsuccessful). With this new version, we are going to only count successful requests.
Stampeding elephants

One of the most consistent behaviors that can be seen among APIs or services that have been built with rate limiting in mind is that usually request counts get reset at the beginning of the rate limiter’s largest (and sometimes only) time slice. In our example, at every hour on the hour, every counter that had been incremented is reset.

One common result for APIs with these types of limits and limit resets is what’s sometimes referred to as the “stampeding elephants” problem. Because every user has their counts reset at the same time, when an API offers access to in-demand data, many requests will occur almost immediately after limits are reset. Similarly, if the user knows that they have outstanding requests that they can make near the end of a time slice, they will make those requests in order to “use up” their request credit that they would otherwise lose.

We partially addressed this issue by introducing multiple bucket sizes for our counters, specifically our per-second and per-minute buckets. But to fully address the issue, we need to implement a sliding-window rate limiter, where the count for requests that come in at 6:01PM and 6:59PM aren’t reset until roughly an hour later at 7:01PM and 7:59PM, respectively, not at 7:00PM. Further details about sliding windows are a little later.

Bonus feature: variable-cost requests

Because we are checking our limits before incrementing our counts, we can actually allow for variable-cost requests. The change to our algorithm will be minor, adding an increment for a variable weight instead of 1.

Sliding Windows

The biggest change to our rate limiting is actually the process of changing our rate limiting from individual buckets into sliding windows. One way of understanding sliding window rate limiting is that each user is given a number of tokens that can be used over a period of time. When you run out of tokens, you don't get to make any more requests. And when a token is used, that token is restored (and can be used again) after the the time period has elapsed.

As an example, if you have 240 tokens that can be used in an hour, and you used 20 tokens at 6:05PM, you would only be able to make up to another 220 requests until 7:04PM. At 7:05PM, you would get those 20 tokens back (and if you made any other requests between 6:06PM and 7:05PM, those tokens would be restored later).

With our earlier rate limiting, we basically incremented counters, set an expiration time, and compared our counters to our limits. With sliding window rate limiting, incrementing a counter isn’t enough; we must also keep history about requests that came in so that we can properly restore request tokens.

One way of keeping a history, which is the method that we will use, is to imagine the whole window as being one large bucket with a single count (the window has a ‘duration’), similar to what we had before, with a bunch of smaller buckets inside it, each of which has their own individual counts. As an example, if we have a 1-hour window, we could use smaller buckets of 1 minute, 5 minutes, or even 15 minutes, depending on how precise we wanted to be, and how much memory and time we wanted to dedicate (more smaller buckets = more memory + more cleanup work). We will call the sizes of the smaller buckets their “precision.” You should notice that when duration is the same as precision, we have regular rate limits. You can see a picture of various precision buckets in a 1 hour window below.


As before, we can consider the smaller buckets to be labeled with individual times, say 6:00PM, 6:01PM, 6:02PM, etc. But as the current time becomes 7:00PM, what we want to do is to reset the count on the 6:00PM bucket to 0, adjust the whole window’s count, and re-label the bucket to 7:00PM. We would do the same thing to the 6:01PM bucket at 7:01PM, etc.

Data representation

We’ve now gotten to the point where we need to start talking about data representation. We didn’t really worry about representation before simply because we were storing a handful of counters per identifier. But now, we are no longer just storing 1 count for a 1 hour time slice, we could store 60 counts for a 1 hour time slice (or more if you wanted more precision), plus a timestamp that represents our oldest mini-bucket label.

For a simpler version of sliding windows, I had previously used a Redis LIST to represent the whole window, with each item in the LIST including both a time label, as well as the count for the smaller buckets. This can work for limited sliding windows, but restricts our flexibility when we want to use multiple rate limits (Redis LISTs have slow random access speeds).

Instead, we will use a Redis HASH as a miniature keyspace, which will store all count information related to rate limits for an identifier in a single HASH. Generally, for a sliding window of a specified duration and precision for an identifier, we will have the HASH stored at the key named by the identifier, with contents of the form:

<duration>:<precision>:o --> <timestamp of oldest entry>
<duration>:<precision>: --> <count of successful requests in this window>
<duration>:<precision>:<ts> --> <count of successful requests in this bucket>

For sliding windows where more than one sub-bucket has had successful requests, there can be multiple <duration>:<precision>:<ts> entries that would each represent one of the smaller buckets. For regular rate limits (not sliding window), the in-Redis schema is the same, though there will be at most one <duration>:<precision>:<ts> key, and duration is equal to precision for regular rate limits (as we mentioned before).

Because of the way we named the keys in our HASH, a single HASH can contain an arbitrary number of rate limits, both regular and windowed, without colliding with one another.

Putting it all together

And finally, we are at the fun part; actually putting all of these ideas together. First off, we are going to use a specification for our rate limits to simultaneously support regular and sliding window rate limits, which looks a lot like our old specification.

One limit is: [duration, limit, precision], with precision being optional. If you omit the precision option, you get regular rate limits (same reset semantics as before). If you include the precision option, then you get sliding window rate limits. To pass one or more rate limits to the Lua script, we just wrap the series of individual limits in a list: [[duration 1, limit 1], [duration 2, limit 2, precision 2], ...], then encode it as JSON and pass it to the script.

Inside the script we need to make two passes over our limits and data. Our first pass cleans up old data while checking whether this request would put the user over their limit, the second pass increments all of the bucket counters to represent that the request was allowed.

To explain the implementation details, I will be including blocks of Lua that can be logically considered together, describing generally what each section does after. Our first block of Lua script will include argument decoding, and cleaning up regular rate limits:

local limits = cjson.decode(ARGV[1])
local now = tonumber(ARGV[2])
local weight = tonumber(ARGV[3] or '1')
local longest_duration = limits[1][1] or 0
local saved_keys = {}
-- handle cleanup and limit checks
for i, limit in ipairs(limits) do

    local duration = limit[1]
    longest_duration = math.max(longest_duration, duration)
    local precision = limit[3] or duration
    precision = math.min(precision, duration)
    local blocks = math.ceil(duration / precision)
    local saved = {}
    table.insert(saved_keys, saved)
    saved.block_id = math.floor(now / precision)
    saved.trim_before = saved.block_id - blocks + 1
    saved.count_key = duration .. ':' .. precision .. ':'
    saved.ts_key = saved.count_key .. 'o'
    for j, key in ipairs(KEYS) do

        local old_ts = redis.call('HGET', key, saved.ts_key)
        old_ts = old_ts and tonumber(old_ts) or saved.trim_before
        if old_ts > now then
            -- don't write in the past
            return 1
        end

        -- discover what needs to be cleaned up
        local decr = 0
        local dele = {}
        local trim = math.min(saved.trim_before, old_ts + blocks)
        for old_block = old_ts, trim - 1 do
            local bkey = saved.count_key .. old_block
            local bcount = redis.call('HGET', key, bkey)
            if bcount then
                decr = decr + tonumber(bcount)
                table.insert(dele, bkey)
            end
        end

        -- handle cleanup
        local cur
        if #dele > 0 then
            redis.call('HDEL', key, unpack(dele))
            cur = redis.call('HINCRBY', key, saved.count_key, -decr)
        else
            cur = redis.call('HGET', key, saved.count_key)
        end

        -- check our limits
        if tonumber(cur or '0') + weight > limit[2] then
            return 1
        end
    end
end

Going section by section though the code visually, where a blank line distinguishes individual sections, we can see 6 sections in the above code:
  1. Argument decoding, and starting the for loop that iterates over all rate limits
  2. Prepare our local variables, prepare and save our hash keys, then start iterating over the provided user identifiers (yes, we still support multiple identifiers for non-clustered cases, but you should only pass one identifier for Redis Cluster)
  3. Make sure that we aren’t writing data in the past
  4. Find those sub-buckets that need to be cleaned up
  5. Handle sub-bucket cleanup and window count updating
  6. Finally check the limit, returning 1 if the limit would have been exceeded
Our second and last block of Lua operates under the precondition that the request should succeed correctly, so we only need to increment a few counters and set a few timestamps:

-- there is enough resources, update the counts
for i, limit in ipairs(limits) do
    local saved = saved_keys[i]

    for j, key in ipairs(KEYS) do
        -- update the current timestamp, count, and bucket count
        redis.call('HSET', key, saved.ts_key, saved.trim_before)
        redis.call('HINCRBY', key, saved.count_key, weight)
        redis.call('HINCRBY', key, saved.count_key .. saved.block_id, weight)
    end
end

-- We calculated the longest-duration limit so we can EXPIRE
-- the whole HASH for quick and easy idle-time cleanup :)
if longest_duration > 0 then
    for _, key in ipairs(KEYS) do
        redis.call('EXPIRE', key, longest_duration)
    end
end

return 0

Going section by section one last time gets us:
  1. Start iterating over the limits and grab our saved hash keys
  2. Set the oldest data timestamp, and update both the window and buckets counts for all identifiers passed
  3. To ensure that our data is automatically cleaned up if requests stop coming in, set an EXPIRE time on the keys where our hash(es) are stored
  4. Return 0, signifying that the user is not over the limit

Optional fix: use Redis time

As part of our process for checking limits, we fetch the current unix timestamp in seconds. We use this timestamp as part of the sliding window start and end times and which sub-bucket to update. If clients are running on servers with reasonably correct clocks (within 1 second of each other at least, within 1 second of the true time optimally), then there isn’t much to worry about. But if your clients are running on servers with drastically different system clocks, or on systems where you can’t necessarily fix the system clock, we need to use a more consistent clock.

While we can’t always be certain that the system clock on our Redis server is necessarily correct (just like we can’t for our other clients), if every client uses the time returned by the TIME command from the same Redis server, then we can be reasonably assured that clients will have fairly consistent behavior, limited to the latency of a Redis round trip with command execution.

As part of our function definition, we will offer the option to use the result of the TIME command instead of system time. This will result in one additional round trip between the client and Redis to fetch the time before passing it to the Lua script.

Add in our Python wrapper, which handles the optional Redis time and request weight parameters, and we are done:

def over_limit_sliding_window(conn, weight=1, limits=[(1, 10), (60, 120), (3600, 240, 60)], redis_time=False):
    if not hasattr(conn, 'over_limit_sliding_window_lua'):
        conn.over_limit_sliding_window_lua = conn.register_script(over_limit_sliding_window_lua_)

    now = conn.time()[0] if redis_time else time.time()
    return conn.over_limit_sliding_window_lua(
        keys=get_identifiers(), args=[json.dumps(limits), now, weight])

If you would like to see all of the rate limit functions and code in one place, including the over_limit_sliding_window() Lua script with wrapper, you can visit this Github gist.

Wrap up and conclusion

Congratulations on getting this far! I know, it was a slog through problems and solutions, followed by a lot of code, and now after seeing all of it I get to tell you what you should learn after reading through all of this.

Obviously, the first thing you should get out of this article is an implementation of sliding window rate limiting in Python, which is trivially ported to other languages -- all you need to do is handle the wrapper. Just be careful when sending timestamps, durations, and precision values to the script, as the EXPIRE call at the end expects all timestamp values to be in seconds, but some languages natively return timestamps as milliseconds instead of seconds.

You should also have learned that performing rate limiting with Redis can range from trivial (see our first example in part 1) to surprisingly complex, depending on the features required, and how technically correct you want your rate limiting to be. It also turns out that the problems that were outlined at the beginning of this article aren’t necessarily deal-breakers for many users, and I have seen many implementations similar to the over_limit_multi_lua() method from part 1 that are perfectly fine for even heavy users*. Really it just means that you have a choice about how you want to rate limit.

And finally, you may also have learned that you can use Redis hashes as miniature keyspaces to collect data together. This can be used for rate limiting as we just did, as well as a DB row work-alike (the hash keys are like named columns, with values the row content), unique (but unsorted) indexes (i.e. email to user id lookup table, id to encoded data lookup table, ...), sharded data holders, and more.

For more from me on Redis and Python, you can check out the rest of my blog at dr-josiah.com.

* When Twitter first released their API, they had a per-hour rate limit that was reset at the beginning of every hour, just like our most basic rate limiter from part 1. The current Twitter API has a per-15 minute rate limit, reset at the beginning of every 15 minute interval (on the hour, then 15, 30, and 45 minutes after the hour) for many of their APIs. (I have no information on whether Twitter may or may not be using Redis for rate limiting, but they have admitted to using Redis in some capacity by virtue of their release of Twemproxy/Nutcracker).

2 comments:

  1. How would you return with a number of seconds or a timestamp instead of 1? I'm trying to set a 'Retry-After' header and getting that value instead of a dummy number 1 would be more useful.

    I tried every trick manipulating that code to return something but it fails. I admit this is my first time touching lua code so I must be doing something wrong.

    ReplyDelete
    Replies
    1. Sorry, just saw this. Replied on the github gist.

      Delete