Rate limit download requests globally for HCA #6740

Open
hannes-ucsc opened this issue Dec 5, 2024 · 5 comments
Assignees
Labels
+: [priority] High
demo: [process] To be demonstrated at the end of the sprint
enh: [type] New feature or request
infra: [subject] Project infrastructure like CI/CD, build and deployment scripts
orange: [process] Done by the Azul team
spike:1: [process] Spike estimate of one point

Comments

@hannes-ucsc
Member

hannes-ucsc commented Dec 5, 2024

In prod, create a global WAF rate-limiting rule for endpoints matching ^(/fetch)?/repository/files.
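
For orientation, here is a rough sketch of what such a rule could look like, written as a Python dict in the shape of an AWS WAFv2 rule. It assumes the rule is a rate-based statement that aggregates all matching requests together (rather than per IP) and is scoped down to the download paths by a regex on the URI path. The rule name, metric name and priority are made up, and how the rule would actually be deployed is not covered.

```python
def download_rate_limit_rule(limit: int) -> dict:
    """
    Build a WAFv2-style rule that blocks download requests once more than
    `limit` of them were seen in a 10-minute window, counted globally.
    """
    return {
        'Name': 'download-rate-limit',  # hypothetical name
        'Priority': 0,  # hypothetical priority
        'Action': {'Block': {}},
        'Statement': {
            'RateBasedStatement': {
                'Limit': limit,
                # 600 s = 10 min, the longest evaluation period
                'EvaluationWindowSec': 600,
                # Count all matching requests together instead of per IP
                'AggregateKeyType': 'CONSTANT',
                'ScopeDownStatement': {
                    'RegexMatchStatement': {
                        'RegexString': r'^(/fetch)?/repository/files',
                        'FieldToMatch': {'UriPath': {}},
                        'TextTransformations': [
                            {'Priority': 0, 'Type': 'NONE'}
                        ],
                    }
                },
            }
        },
        'VisibilityConfig': {
            'SampledRequestsEnabled': True,
            'CloudWatchMetricsEnabled': True,
            'MetricName': 'download-rate-limit',  # hypothetical name
        },
    }
```

The limit is left as a parameter because its value follows from the budget calculation.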

Let f be the average file size in HCA in GiB, b the daily budget for downloads from HCA in $, and d the cost of downloading one GiB in $. Each download costs d·f on average, so to ensure that the total download cost doesn't exceed the daily budget b, we need to limit the rate r of download requests to r < b/d/f downloads per day. A day has 24·6 = 144 ten-minute windows, so that is r < b/d/f/24/6 downloads per 10 min, the longest evaluation period that WAF supports.
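
A minimal sketch of that arithmetic. The budget and per-GiB cost in the usage example are placeholders picked for illustration, not the values from the Google sheet; only the 0.6 GiB average file size comes from this issue.

```python
import math

WINDOWS_PER_DAY = 24 * 6  # number of 10-minute windows in a day


def max_downloads_per_window(budget_per_day: float,
                             cost_per_gib: float,
                             avg_file_size_gib: float) -> int:
    """
    Return the largest integer r that satisfies r < b/d/f/24/6.
    """
    per_day = budget_per_day / cost_per_gib / avg_file_size_gib
    per_window = per_day / WINDOWS_PER_DAY
    # The inequality is strict, so an exact integer bound is itself excluded
    return math.ceil(per_window) - 1


# Placeholder values: b = $100 per day, d = $0.05 per GiB, f = 0.6 GiB
limit = max_downloads_per_window(100, 0.05, 0.6)
```

With these placeholder inputs the bound is roughly 23.1 downloads per 10 min, so the sketch yields a limit of 23.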

The download cost d differs by destination, say country A, country B and W (the rest of the world). See this Google sheet for details. The Google sheet also contains the value for b. We could use three different geo-matched rules with a separate budget per destination but we were asked not to. Instead we'll just take the weighted average of the per-destination costs and use a single global rule. In a spike, we'll determine the weights based on the distribution of download requests between A, B and W.
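
A minimal sketch of the weighted average, assuming the weights are the per-destination shares of download requests as proposed above. The cost figures and request counts are made-up placeholders, not the values from the Google sheet.

```python
def weighted_average_cost(cost_per_gib: dict[str, float],
                          request_counts: dict[str, int]) -> float:
    """
    Average the per-destination cost per GiB, weighting each destination
    by its share of download requests.
    """
    total = sum(request_counts.values())
    if total == 0:
        raise ValueError('Need at least one download request')
    return sum(
        cost_per_gib[destination] * count
        for destination, count in request_counts.items()
    ) / total


# Placeholder inputs for destinations A, B and W (rest of the world)
costs = {'A': 0.10, 'B': 0.20, 'W': 0.05}
counts = {'A': 50, 'B': 25, 'W': 25}
d = weighted_average_cost(costs, counts)
```

Weighting by request count implicitly assumes that the average file size does not differ much between destinations.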

Additionally, the cost d is tiered by monthly volume. Given the concrete value for b we'd be in the top tier after REDACTED number of days so let's just use the top tier (the lowest cost).

The average file size for HCA is 0.6 GiB. We'll use that value for f for now, but I assume that there is a bias towards larger files. Once #6739 is solved, we can determine that bias and adjust the rate limit accordingly (#6741).

@hannes-ucsc hannes-ucsc added the orange [process] Done by the Azul team label Dec 5, 2024
@hannes-ucsc hannes-ucsc changed the title from "Rate limit download requests globally" to "Rate limit download requests globally for HCA" Dec 5, 2024
@hannes-ucsc hannes-ucsc added enh [type] New feature or request infra [subject] Project infrastructure like CI/CD, build and deployment scripts labels Dec 5, 2024
@hannes-ucsc
Member Author

Spike to determine the weights based on the distribution of download requests between A, B and W. There is a request header added by API Gateway (or CloudFront) that contains the ISO country code of the requester. Azul logs that header.
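
A sketch of how the weights could be derived, assuming the country codes have already been pulled out of the logs into a plain sequence of strings (the extraction itself depends on the log format and is not shown). The codes XA and XB are placeholders standing in for countries A and B.

```python
from collections import Counter
from typing import Iterable


def destination_weights(country_codes: Iterable[str],
                        buckets: dict[str, str],
                        default: str = 'W') -> dict[str, float]:
    """
    Map each ISO country code to a destination (A, B or W) and return the
    share of requests per destination.
    """
    counts = Counter(
        buckets.get(code.upper(), default)
        for code in country_codes
    )
    total = sum(counts.values())
    if total == 0:
        raise ValueError('Need at least one request')
    return {destination: n / total for destination, n in counts.items()}


# Placeholder codes: XA and XB stand in for countries A and B
weights = destination_weights(
    ['XA', 'xa', 'XB', 'DE'],
    buckets={'XA': 'A', 'XB': 'B'},
)
```

Any code that is not listed in `buckets` is counted as W, the rest of the world.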

@hannes-ucsc hannes-ucsc added the spike:1 [process] Spike estimate of one point label Dec 5, 2024
@dsotirho-ucsc dsotirho-ucsc self-assigned this Dec 5, 2024
@dsotirho-ucsc dsotirho-ucsc added the + [priority] High label Dec 5, 2024
@dsotirho-ucsc
Contributor

The spike results are available on the second tab of the Google sheet.

@nadove-ucsc
Contributor

I tested how WAF processes regex matches by creating a rule that blocked all requests where the URI path matched the regex "repository" (no slashes, anchors, etc), regardless of IP or request rate. Requests to /fetch/repository/files/bar were subsequently blocked, demonstrating that the regex doesn't need to occur at the beginning of the string or match the entire string.
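
Python's `re.search` has the same find-anywhere behavior, so it can serve as a rough stand-in for checking which paths the proposed pattern would cover. This is only an approximation, since WAF's regex engine is not Python's, and the sample paths other than /fetch/repository/files/bar are made up.

```python
import re

# The bare pattern from the test: matches anywhere in the path
unanchored = re.compile(r'repository')

# The pattern proposed for the rule: the leading ^ pins it to the start
# of the path, but it still does not have to match the entire path
download = re.compile(r'^(/fetch)?/repository/files')


def is_download_path(path: str) -> bool:
    """
    True if the proposed rule pattern would match the given URI path.
    """
    return download.search(path) is not None


blocked_by_test_rule = unanchored.search('/fetch/repository/files/bar') is not None
```

Because matching is unanchored by default, the `^` in the proposed pattern is what keeps it from matching paths that merely contain /repository/files somewhere in the middle.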

@nadove-ucsc
Contributor

nadove-ucsc commented Dec 9, 2024

Path normalization does not appear to be needed, as requests such as /repository/../repository/sources and /repository//sources are already rejected.
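
For reference, this is the kind of evasion that path normalization would guard against, sketched with Python's `posixpath.normpath` and `re` as stand-ins for WAF. The download paths below are hypothetical variants of the ones tested above.

```python
import posixpath
import re

download = re.compile(r'^(/fetch)?/repository/files')


def matches_raw(path: str) -> bool:
    """
    Apply the rule pattern to the path exactly as it was requested.
    """
    return download.search(path) is not None


def matches_normalized(path: str) -> bool:
    """
    Collapse `..` segments and repeated slashes before applying the
    rule pattern.
    """
    return download.search(posixpath.normpath(path)) is not None


# Hypothetical un-normalized variants of a download path
variants = [
    '/repository/../repository/files/bar',
    '/repository//files/bar',
]
evades_raw_rule = [not matches_raw(p) for p in variants]
caught_after_normalizing = [matches_normalized(p) for p in variants]
```

Both variants slip past the raw pattern and are only caught after normalization. Since such requests are already rejected, matching on the raw path should be sufficient.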

@hannes-ucsc
Member Author

For the demo, show that the number of download requests in prod never exceeded 59 per 10 min in the two-week period after this change lands there.
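
A sketch of how that check could be scripted, assuming the timestamps of download requests have already been extracted from the logs (the extraction depends on the log format and is not shown). It counts the requests in every 10-minute window that ends at a request, which may differ slightly from how WAF itself aligns its evaluation windows.

```python
from datetime import datetime, timedelta
from typing import Iterable


def max_requests_in_window(timestamps: Iterable[datetime],
                           window: timedelta = timedelta(minutes=10)) -> int:
    """
    Return the largest number of requests that fall into any sliding
    window of the given length.
    """
    times = sorted(timestamps)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Drop requests that are a full window or more older than t
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


# Made-up sample: requests at minutes 0, 1, 9, 10 and 25
base = datetime(2025, 1, 1)
sample = [base + timedelta(minutes=m) for m in (0, 1, 9, 10, 25)]
peak = max_requests_in_window(sample)
```

For the demo, the result for the two-week period would need to be at most 59.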

@hannes-ucsc hannes-ucsc added the demo [process] To be demonstrated at the end of the sprint label Dec 18, 2024
Projects
None yet
Development

No branches or pull requests

3 participants