Enable HPA-compatibility #1087
Discovery so far from @kentquirk:
> @TylerHelmuth was able to make Refinery autoscale up and down using stress_level via the Prometheus Adapter (which requires a Prometheus instance). This means we can, with no extra code/executables, provide anyone who does not already have a custom metrics server running on their cluster a solution to auto-scale Refinery with stress_level.
>
> Kubernetes allows you to install the metrics-server (which handles resource-based scaling) plus exactly one thing that registers as the custom metrics API server. Since there can only be one, a user who is already running Prometheus can use the Prometheus Adapter as that custom metrics server and point it at Refinery's Prometheus metrics.
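As a sketch of what the Prometheus Adapter side might look like: the adapter is configured with discovery rules that map Prometheus series onto custom metric names. The metric name `refinery_stress_level` below is an assumption (the actual series name depends on how Refinery's Prometheus metrics are prefixed/configured), and the exposed name `stress_level` is illustrative.

```yaml
# Hypothetical Prometheus Adapter rule exposing Refinery's stress level
# as a per-pod custom metric named "stress_level".
rules:
- seriesQuery: 'refinery_stress_level{namespace!="",pod!=""}'   # assumed series name
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    as: "stress_level"
  metricsQuery: 'avg(refinery_stress_level{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```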
Use stress level as an indicator of when Refinery should scale up — when it rises too high for more than a little while, we should add new refinery capacity.
Unfortunately, it is designed to sit close to zero when a refinery is normally loaded as well as when it’s underloaded. The best way to think about being underloaded is time-based — if the server doesn’t pop into stress for several minutes, the cluster probably has too many pods.
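That time-based intuition maps fairly directly onto HPA's scale-down stabilization window: only shrink the cluster when the lower replica recommendation has held for a while. A sketch, with illustrative numbers rather than recommendations:

```yaml
# Hypothetical HPA behavior block: scale down cautiously, on a time basis.
behavior:
  scaleDown:
    stabilizationWindowSeconds: 600   # shrink only if the lower recommendation holds for 10 minutes
    policies:
    - type: Pods
      value: 1
      periodSeconds: 120              # remove at most one pod every two minutes
```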
While we will be adjusting which metrics are used to measure stress level in the new release, the basic logic behind how it works will remain unchanged.
Kubernetes has HPA — Horizontal Pod Autoscaling — which normally monitors CPU and/or memory for all of the pods in a cluster. However, it can also be taught to monitor other metrics, provided there is a custom metrics API server in the cluster.
We should implement an instance of this to return stress level as a custom metric. That may be all we need to do (plus configuring k8s to use it); the k8s autoscaler is pretty smart and can be tuned for aggressiveness, so we might be able to avoid changing anything about the basic way that stress_level works.
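Once a custom metrics server is in place, wiring the HPA to stress_level is a small manifest. Everything here is a sketch under assumptions: the Deployment name `refinery`, the exposed metric name `stress_level`, the replica bounds, and the target value are all hypothetical (the target assumes stress_level is reported on a 0–100 scale).

```yaml
# Hypothetical HPA scaling a Refinery Deployment on the stress_level custom metric.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: refinery
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: refinery        # assumed Deployment name
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: stress_level   # as exposed by the custom metrics server
      target:
        type: AverageValue
        averageValue: "30"   # illustrative threshold, assuming a 0-100 scale
```

The aggressiveness tuning mentioned above lives in the HPA's `behavior` field, so experimenting with thresholds and stabilization windows requires no Refinery code changes.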
We’ll have to build the metrics server with Refinery and include it in our release bundle, and update our Refinery Helm chart to use it.
Note that the custom metrics server could also serve metrics relating to Redis, and possibly even allow the redis part of the cluster to also be autoscaled. That’s advanced mode for another day, but the option is there.