Rate limit download requests globally for HCA #6740
Comments
Spike to determine the weights based on the distribution of download requests between A, B and W. There is a request header added by API Gateway (or CloudFront) that contains the ISO country code of the requester. Azul logs that header.
Spike results are available on the 2nd tab of the Google Sheet.
I tested how WAF processes regex matches by creating a rule that blocked all requests where the URI path matched the regex "repository" (no slashes, anchors, etc.), regardless of IP or request rate. Requests to …
Path normalization does not appear to be needed, as requests such as …
For the demo, show that the number of download requests in …
In `prod`, create a global WAF rate-limiting rule for endpoints matching `^(/fetch)?/repository/files`.
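For illustration, the sketch below contrasts that anchored pattern with the unanchored "repository" pattern from the test rule mentioned in the comments above. The example paths are hypothetical, and Python's `re` module is of course not WAF's regex engine; the only point is the effect of the `^` anchor.

```python
import re

# The anchored pattern from the proposed rule vs. the unanchored pattern
# used in the WAF regex test described in the comments above.
rule_pattern = re.compile(r'^(/fetch)?/repository/files')
test_pattern = re.compile(r'repository')

# Hypothetical request paths, for illustration only.
paths = [
    '/repository/files/some-uuid',        # matched by both patterns
    '/fetch/repository/files/some-uuid',  # matched by both patterns
    '/index/files',                       # matched by neither
    '/other/repository/files/some-uuid',  # matched only by the unanchored pattern
]

for path in paths:
    print(f'{path:40} rule={bool(rule_pattern.search(path))} '
          f'test={bool(test_pattern.search(path))}')
```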
Let f be the average file size in HCA in GiB. Let b be the daily budget for downloads from HCA in $. Let d be the cost of downloading one GiB in $. To ensure that the total download cost doesn't exceed the daily budget b, we need to limit the rate r of download requests to r < b/d/f downloads per day, or r < b/d/f/24/6 downloads per 10 minutes (there are 24 × 6 = 144 ten-minute windows in a day), 10 minutes being the longest evaluation period that WAF supports.
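To make the arithmetic concrete, here is a minimal sketch with placeholder numbers; the real values for b and d are in the Google sheet, and f = 0.6 GiB is the average file size mentioned below.

```python
# Placeholder inputs; the real values for b and d are in the Google sheet.
b = 100.0   # daily download budget in $ (placeholder)
d = 0.09    # blended cost per GiB downloaded in $ (placeholder)
f = 0.6     # average file size in GiB

r_per_day = b / d / f              # max download requests per day
r_per_10_min = r_per_day / 24 / 6  # 24 h × 6 ten-minute windows per hour

print(f'{r_per_day:.0f} requests/day, {r_per_10_min:.1f} requests/10 min')
```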
The download cost d differs by destination, say country A, country B and W (the rest of the world). See this Google sheet for details. The Google sheet also contains the value for b. We could use three different geo-matched rules with a separate budget per destination, but we were asked not to. Instead, we'll just take the weighted average of the per-destination costs and use a single global rule. In a spike, we'll determine the weights based on the distribution of download requests between A, B and W.
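A sketch of that weighted average, with made-up per-destination costs and request shares standing in for the real numbers from the Google sheet and the spike results:

```python
# Hypothetical per-destination costs in $ per GiB and request-count weights;
# the real values come from the Google sheet and the spike results.
cost_per_gib = {'A': 0.12, 'B': 0.11, 'W': 0.09}
request_share = {'A': 0.2, 'B': 0.1, 'W': 0.7}

# Weighted average cost d to be used in the single global rule.
d = sum(cost_per_gib[k] * request_share[k] for k in cost_per_gib)
print(f'blended cost d = ${d:.4f} per GiB')
```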
Additionally, the cost d is tiered by monthly volume. Given the concrete value for b, we'd be in the top tier after a REDACTED number of days, so let's just use the top tier (the lowest cost).
The average file size for HCA is 0.6 GiB. We'll use that value for f for now, but I assume that there is a bias towards larger files. Once #6739 is solved, we can determine that bias and adjust the rate limit accordingly (#6741).
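Putting the pieces together, the rule might look roughly like the sketch below, expressed as a WAFv2 rule in Python/boto3 dict form. This is an untested sketch, not a finalized configuration: the rule and metric names, the limit value, the `CONSTANT` aggregation (counting all matching requests together rather than per IP) and the 600-second evaluation window are assumptions based on the reasoning above.

```python
import json

# Rough shape of a global (non-IP-keyed) WAFv2 rate-based rule; field names
# follow the WAFv2 API, but treat every value here as a placeholder.
rate_limit_rule = {
    'Name': 'hca-download-rate-limit',  # hypothetical rule name
    'Priority': 0,
    'Statement': {
        'RateBasedStatement': {
            # Placeholder: requests allowed per evaluation window,
            # derived from r < b/d/f/24/6 above.
            'Limit': 13,
            'EvaluationWindowSec': 600,  # 10 min, the longest supported
            # CONSTANT aggregates all matching requests into one counter,
            # which is what makes the limit global rather than per-IP.
            'AggregateKeyType': 'CONSTANT',
            'ScopeDownStatement': {
                'RegexMatchStatement': {
                    'RegexString': '^(/fetch)?/repository/files',
                    'FieldToMatch': {'UriPath': {}},
                    'TextTransformations': [{'Priority': 0, 'Type': 'NONE'}],
                },
            },
        },
    },
    'Action': {'Block': {}},
    'VisibilityConfig': {
        'SampledRequestsEnabled': True,
        'CloudWatchMetricsEnabled': True,
        'MetricName': 'hca-download-rate-limit',  # hypothetical metric name
    },
}

print(json.dumps(rate_limit_rule, indent=2))
```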