
Enable HPA-compatibility #1087

Open
VinozzZ opened this issue Apr 16, 2024 · 3 comments
Labels
type: enhancement New feature or request

Comments


VinozzZ commented Apr 16, 2024

Use stress level as an indicator of when Refinery should scale up: when it stays too high for more than a short period, we should add new Refinery capacity.

Unfortunately, stress level is designed to sit close to zero both when a Refinery is normally loaded and when it's underloaded, so it can't distinguish those two states directly. The best way to think about being underloaded is time-based: if the server doesn't pop into stress for several minutes, the cluster probably has too many pods.

While we will be adjusting which metrics are used to measure stress level in the new release, the basic logic behind how it works will remain unchanged.

Kubernetes has HPA (Horizontal Pod Autoscaling), which normally monitors CPU and/or memory for all of the pods in a cluster. However, it can also be taught to monitor other metrics, provided there is a custom metrics API server in the cluster.
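For context, the core HPA scaling rule is a simple ratio of observed metric to target. This sketch shows only that ratio; the real controller also applies tolerance bands and stabilization windows, and the target value of 40 is purely illustrative:

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale the replica count by observed/target ratio."""
    return math.ceil(current_replicas * current_metric / target_metric)

# e.g. 4 pods averaging a metric value of 80 against a target of 40
print(desired_replicas(4, 80, 40))  # -> 8
```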

We should implement an instance of this to return stress level as a custom metric. That may be all we need to do (plus configuring k8s to use it); the k8s autoscaler is pretty smart and can be tuned for aggressiveness, so we might be able to avoid changing anything about the basic way that stress_level works.
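To give a sense of what such a server would return, here is a sketch of the MetricValueList JSON shape defined by the custom metrics API. The pod name, namespace, value, and timestamp are made up for illustration; a real server must also implement the API's discovery and query paths:

```python
import json

def stress_level_response(namespace: str, pod: str, value: float) -> str:
    """Build a custom.metrics.k8s.io/v1beta2 MetricValueList for one pod."""
    payload = {
        "kind": "MetricValueList",
        "apiVersion": "custom.metrics.k8s.io/v1beta2",
        "metadata": {},
        "items": [
            {
                "describedObject": {
                    "kind": "Pod",
                    "namespace": namespace,
                    "name": pod,
                    "apiVersion": "/v1",
                },
                "metric": {"name": "stress_level"},
                # fixed timestamp for illustration only
                "timestamp": "2024-04-16T00:00:00Z",
                # k8s quantities are serialized as strings
                "value": str(value),
            }
        ],
    }
    return json.dumps(payload)
```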

We’ll have to build the metrics server with Refinery and include it in our release bundle, and update our Refinery Helm chart to use it.

Note that the custom metrics server could also serve metrics relating to Redis, and possibly even allow the redis part of the cluster to also be autoscaled. That’s advanced mode for another day, but the option is there.

@VinozzZ VinozzZ added this to the Dynamic scaling milestone Apr 16, 2024
@VinozzZ VinozzZ added the type: enhancement New feature or request label Apr 16, 2024

VinozzZ commented Apr 16, 2024

Discovery so far from @kentquirk :

  • k8s only allows a single metrics adapter for custom metrics
  • if we provide our own for refinery, anyone using prometheus will have to choose, but users NOT using prom (or any other adapter) will have an easy time of it
  • if we use the prom adapter, then users who don't use prom will have to install it just to scale refinery
  • there's a third alternative, KEDA, for which we could write a plugin; some users may already be working with it. It can handle prom and support a refinery plugin at the same time, but it's a bit of a lift for anyone not already using it.
  • An additional factor is that right now, building our own is dead in the water because of version compatibility problems. Hopefully this won't be a problem for long, but it is currently a blocker.


VinozzZ commented Apr 16, 2024

@TylerHelmuth was able to make Refinery autoscale up and down using stress_level via the Prometheus Adapter (which requires a Prometheus instance). This means we can, with no extra code/executables, provide anyone who does not already have a custom metrics server running on their cluster a solution to auto-scale Refinery with stress_level.
The solution is to install a bunch of Prometheus stuff, but it's technically a solution.
The next step would be to see how much I could integrate into the helm chart so that when you install Refinery you can optionally install the extra bits/HorizontalPodAutoscaler needed to scale based on stress_level.

@kentquirk kentquirk removed this from the Dynamic scaling milestone Jul 29, 2024
@TylerHelmuth
Contributor

Kubernetes allows you to install the metrics-server (which handles resource scaling) and exactly one thing that registers as v1beta2.custom.metrics.k8s.io on the API server. This is good news, as it means we don't have to worry about breaking users' resource-scaling solutions, and any custom metrics adapter we'd write does not need to handle CPU and memory.

Since there can only be one v1beta2.custom.metrics.k8s.io server, we are going to have to meet users where they are and provide them a solution if they have nothing registered on v1beta2.custom.metrics.k8s.io already.

If a user is already using Prometheus, there is a custom metrics server called the Prometheus Adapter which they can use with Refinery's Prometheus metrics.
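If the Prometheus Adapter is serving Refinery's stress_level as a pods metric, the HPA side might look roughly like the fragment below. The names, replica bounds, and target value are illustrative, and the exact metric name seen by the HPA depends on the adapter's rule configuration:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: refinery
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: refinery
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: stress_level
        target:
          type: AverageValue
          averageValue: "40"
```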
