
Enable HPA-compatibility #1087

Open
VinozzZ opened this issue Apr 16, 2024 · 3 comments
Labels
type: enhancement New feature or request

Comments


VinozzZ commented Apr 16, 2024

Use stress level as an indicator of when Refinery should scale up: when it stays too high for more than a short period, we should add new Refinery capacity.

Unfortunately, stress level is designed to sit close to zero both when a Refinery is normally loaded and when it's underloaded, so it can't distinguish those two states directly. The best way to think about being underloaded is time-based: if the server doesn't pop into stress for several minutes, the cluster probably has too many pods.

While we will be adjusting which metrics are used to measure stress level in the new release, the basic logic behind how it works will remain unchanged.

Kubernetes has HPA (Horizontal Pod Autoscaling), which normally monitors CPU and/or memory for all of the pods in a cluster. However, it can also be taught to monitor other metrics, provided there is a custom metrics API server in the cluster.
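For context, the core HPA scaling rule is a simple ratio of observed metric to target. This sketch shows only that ratio; the real controller also applies tolerance bands and stabilization windows, and the target value of 40 is purely illustrative:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale the replica count by observed/target ratio."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g. 4 pods averaging a metric value of 80 against a target of 40
print(desired_replicas(4, 80, 40))  # -> 8
```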

We should implement an instance of this to return stress level as a custom metric. That may be all we need to do (plus configuring k8s to use it); the k8s autoscaler is pretty smart and can be tuned for aggressiveness, so we might be able to avoid changing anything about the basic way that stress_level works.
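To give a sense of what such a server would return, here is a sketch of the MetricValueList JSON shape defined by the custom metrics API. The pod name, namespace, value, and timestamp are made up for illustration; a real server must also implement the API's discovery and query paths:

```python
import json

def stress_level_response(namespace: str, pod: str, value: float) -> str:
    """Build a custom.metrics.k8s.io/v1beta2 MetricValueList for one pod."""
    payload = {
        "kind": "MetricValueList",
        "apiVersion": "custom.metrics.k8s.io/v1beta2",
        "metadata": {},
        "items": [
            {
                "describedObject": {
                    "kind": "Pod",
                    "namespace": namespace,
                    "name": pod,
                    "apiVersion": "/v1",
                },
                "metric": {"name": "stress_level"},
                # fixed timestamp for illustration only
                "timestamp": "2024-04-16T00:00:00Z",
                # k8s quantities are serialized as strings
                "value": str(value),
            }
        ],
    }
    return json.dumps(payload)
```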

We’ll have to build the metrics server with Refinery and include it in our release bundle, and update our Refinery Helm chart to use it.

Note that the custom metrics server could also serve metrics relating to Redis, and possibly even allow the redis part of the cluster to also be autoscaled. That’s advanced mode for another day, but the option is there.

@VinozzZ VinozzZ added this to the Dynamic scaling milestone Apr 16, 2024
@VinozzZ VinozzZ added the type: enhancement New feature or request label Apr 16, 2024

VinozzZ commented Apr 16, 2024

Discovery so far from @kentquirk :

  • k8s only allows a single metrics adapter for custom metrics
  • if we provide our own for refinery, anyone using prometheus will have to choose, but users NOT using prom (or any other adapter) will have an easy time of it
  • if we use the prom adapter, then users who don't use prom will have to install it just to scale refinery
  • there's a third alternative, KEDA, for which we could write a plugin; some users may already be working with it. It can handle prom and support a refinery plugin at the same time, but it's a bit of a lift for anyone not already using it.
  • An additional factor is that right now, building our own is dead in the water because of version compatibility problems. Hopefully this won't be a problem for long, but it is currently a blocker.


VinozzZ commented Apr 16, 2024

@TylerHelmuth was able to make Refinery autoscale up and down using stress_level via the Prometheus Adapter (which requires a Prometheus instance). This means we can, with no extra code/executables, provide anyone who does not already have a custom metrics server running on their cluster a solution to auto-scale Refinery with stress_level.
The solution is to install a bunch of Prometheus stuff, but it's technically a solution.
The next step would be to see how much I could integrate into the helm chart so that when you install Refinery you can optionally install the extra bits/HorizontalPodAutoscaler needed to scale based on stress_level.

@kentquirk kentquirk removed this from the Dynamic scaling milestone Jul 29, 2024
@TylerHelmuth
Contributor

Kubernetes allows you to install the metrics-server (which handles resource scaling) and exactly one thing that registers as v1beta2.custom.metrics.k8s.io on the API server. This is good news, as it means we don't have to worry about breaking users' resource-scaling solutions, and any custom metrics adapter we'd write does not need to handle CPU and memory.

Since there can only be one v1beta2.custom.metrics.k8s.io server, we are going to have to meet users where they are and provide them a solution if they have nothing registered on v1beta2.custom.metrics.k8s.io already.

If a user is already using Prometheus, there is a custom metrics server called the Prometheus Adapter which they can use with Refinery's Prometheus metrics.
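If the Prometheus Adapter is serving Refinery's stress_level as a pods metric, the HPA side might look roughly like the fragment below. The names, replica bounds, and target value are illustrative, and the exact metric name seen by the HPA depends on the adapter's rule configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: refinery
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: refinery
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: stress_level
        target:
          type: AverageValue
          averageValue: "40"
```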
