Discussion: custom ansible strategy for rolling update of nodes #10497
Labels: kind/feature, lifecycle/frozen
What would you like to be added:
A custom Ansible strategy plugin, based on the host_pinned strategy, to be used in the node kubelet upgrade play (and possibly other plays dealing with all the nodes). Described more precisely in ansible/ansible#81736.

Why is this needed:
serial allows a batch upgrade rather than a rolling upgrade, even if we were to use host_pinned with the current play (host_pinned works only within the current batch as defined by serial). A true rolling upgrade would instead start the play on another node as soon as one has completed it.

Consider the following scenarios (which are not hypothetical; we have clusters doing exactly this):
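For context, the kind of play in question looks roughly like the fragment below. This is an illustrative sketch, not Kubespray's actual play: the host group, batch size, and task names are placeholders.

```yaml
# Illustrative only: even with host_pinned, `serial` still creates hard
# batch boundaries -- no host from the next batch starts until every
# host in the current batch has finished the whole play.
- hosts: kube_node          # hypothetical group name
  serial: "20%"             # batch size; each batch is a barrier
  strategy: host_pinned     # helps only *within* a batch
  tasks:
    - name: Drain node, upgrade kubelet, uncordon
      debug:
        msg: "placeholder for the drain / upgrade / uncordon tasks"
```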
Scenario A:
We have some pods in the cluster with a long start time (15-30 min), which are constrained (via labels) to a particular set of nodes S. These pods have a PodDisruptionBudget to avoid losing the service (notably during cluster upgrades). Other pods have more typical startup times (<10 s).
Once the first or second batch of nodes is upgraded, some of the pods with long start times are at the minimum count allowed by their PodDisruptionBudget. This means that when a later batch tries to upgrade a node in S hosting some of those pods, it blocks for a long time, waiting for the replacement pods to start before it can safely drain the node (which is good). However, all of the other nodes in that batch are essentially finished with their upgrades, so the whole batch waits on that single node.
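A toy scheduling model makes the cost of the batch barrier concrete. The durations and batch sizes below are made up for illustration, not measurements from a real cluster:

```python
import heapq

def batched_time(durations, batch_size):
    # `serial: batch_size` semantics: each batch must fully finish before
    # the next one starts, so a batch takes as long as its slowest node.
    total = 0
    for i in range(0, len(durations), batch_size):
        total += max(durations[i:i + batch_size])
    return total

def rolling_time(durations, workers):
    # True rolling update: a new node starts as soon as a worker frees up
    # (greedy list scheduling over `workers` parallel slots).
    slots = [0.0] * workers
    heapq.heapify(slots)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(slots)
        end = start + d
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish

# 8 nodes (minutes to upgrade); two host slow-starting pods (~30 min waits).
durations = [2, 2, 30, 2, 2, 30, 2, 2]
print(batched_time(durations, batch_size=4))  # 60: both batches wait on a slow node
print(rolling_time(durations, workers=4))     # 32.0: slow nodes overlap with fast ones
```

With the same parallelism, the rolling schedule finishes in roughly the time of the slowest node plus a little, while the batched schedule pays for the slowest node in every batch that contains one.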
Scenario B (worse):
Two or more nodes in S are in the same batch. The first drains successfully, but the second does not (because the PodDisruptionBudget is now at the minimum acceptable number of pods). This results in a stuck upgrade: the first node is waiting on the second to complete the task. If it weren't, it could finish its upgrade and become schedulable again, letting the cluster place new pods and make room for the second node's drain to proceed. -> this scenario would be solved simply by changing the strategy to host_pinned, IMO.

Point 2 is, in my opinion, the more critical one for Kubespray's performance in scenarios like those I described, but it implies point 1.
I raised this issue on Ansible's GitHub, the devel mailing list, and Matrix, but I didn't get many responses besides the automated issue closure.
I would rather have this in Ansible itself and use it from Kubespray. However, if upstream is not interested, what would you think of integrating this into Kubespray? Is the maintenance worth the (presumed; I haven't tested this concretely) performance uplift?
(I can implement this myself, either by copying the free strategy with some tweaks or by starting from scratch.)