Discussion: custom ansible strategy for rolling update of nodes #10497

Open

VannTen opened this issue Oct 4, 2023 · 3 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

VannTen (Contributor) commented Oct 4, 2023

What would you like to be added:

A custom Ansible strategy plugin, based on the host_pinned strategy, which would be used in the node kubelet upgrade play (and possibly other plays dealing with all the nodes). Described in more detail in ansible/ansible#81736.

Why is this needed:

  1. The linear strategy waits for all hosts to finish the current task before moving on. Unless I'm mistaken, kubelet upgrades are independent between nodes and don't need to wait for each other, so we're losing time busy-waiting.
  2. Using serial gives a batch upgrade rather than a rolling upgrade, even if we were to use host_pinned with the current play (host_pinned only operates within the current batch as defined by serial). A true rolling upgrade would instead start the play on another node as soon as one node has completed it (see the play sketch after this list).
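
To illustrate the difference, here is a rough sketch; the play and role names are placeholders, not actual Kubespray plays:

```yaml
# Today's batched behaviour (linear strategy + serial): every host in the
# batch must finish a task before any host in the batch starts the next one.
- name: Upgrade kubelet on nodes (batched)
  hosts: kube_node
  serial: "20%"           # batch size; the whole batch is drained/upgraded in lockstep
  roles:
    - upgrade_kubelet     # placeholder role name

# Rolling behaviour with host_pinned: each host walks through the play at its
# own pace, with at most "forks" hosts in flight at any given time.
- name: Upgrade kubelet on nodes (rolling)
  hosts: kube_node
  strategy: host_pinned   # strategy shipped with ansible-core
  roles:
    - upgrade_kubelet     # placeholder role name
```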

Consider the following scenarios (which are not hypothetical; we have clusters in exactly this situation):

Scenario A:
We have some pods in the cluster with a long start time (15-30 min), which are constrained (with labels) to a particular set of nodes S. These pods have a PodDisruptionBudget to avoid losing the service (notably during cluster upgrades). Other pods have a more typical startup time (<10s).

Once the first or second batch of nodes is upgraded, some of the pods with a long start time are at their minimal count according to the PodDisruptionBudget. This means that when we try to upgrade a node in S in a later batch, one hosting some of those pods, it blocks for a long time waiting for the other pods to start before it can safely drain the node (which is good). However, all of the other nodes in the batch have essentially finished their upgrades, and we wait for nothing.
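
For context, the kind of budget involved looks roughly like this (names and counts are illustrative, not our actual manifests):

```yaml
# Illustrative PodDisruptionBudget: draining a node is refused while evicting
# one of these pods would drop the ready count below minAvailable.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: slow-starting-service
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: slow-starting-service
```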

Scenario B (worse):
Two or more nodes in S are in the same batch. The first drains successfully, but the second does not (because the PodDisruptionBudget is now at the minimum acceptable number of pods). This results in a stuck upgrade, because the first node waits on the second to complete the task. If it didn't, it could complete its upgrade and become schedulable again, allowing the cluster to place new pods and make room for the second node to drain. -> this would be solved simply by switching the strategy to host_pinned, IMO.

Point 2 is, in my opinion, the more critical one for Kubespray performance in scenarios like those I described, but it implies point 1.
I raised this on the Ansible GitHub, the devel mailing list, and Matrix, but I didn't get many responses besides the automated issue closure.


I would rather have this in Ansible itself and use it from Kubespray. However, if upstream is not interested, what would you think of integrating it in Kubespray? Is the maintenance worth the (presumed; I haven't tested this concretely) performance uplift?

(I can implement this myself, either by copying the free strategy with some tweaks or by starting from scratch.)

@VannTen VannTen added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 4, 2023
VannTen (Contributor, Author) commented Nov 30, 2023

So, I thought of something which is likely to have a faster ROI: instead of trying to retrofit an Ansible strategy with the "slot" concept, I'd use the host_pinned strategy coupled with Kubernetes Leases, which would act as "slot reservations". A rough sketch of what this could look like is below.
This has the advantage of scaling easily to a "slot-per-group" concept (which would natively support #10591) by leveraging group_vars.
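
Rough sketch of the reservation idea, run per node around the drain/upgrade steps. Everything here is hypothetical: the lease name, namespace, `upgrade_slot_id` variable and retry counts are placeholders, and it assumes kubectl with a working kubeconfig on the host the tasks are delegated to.

```yaml
- name: Reserve an upgrade slot (create-only, so it fails while the slot is taken)
  ansible.builtin.command:
    cmd: kubectl create -f -
    stdin: |
      apiVersion: coordination.k8s.io/v1
      kind: Lease
      metadata:
        name: "upgrade-slot-{{ upgrade_slot_id | default(0) }}"
        namespace: kube-system
      spec:
        holderIdentity: "{{ inventory_hostname }}"
  register: slot_reservation
  retries: 60
  delay: 30
  until: slot_reservation.rc == 0
  delegate_to: localhost

# ... drain, upgrade kubelet, uncordon ...

- name: Release the upgrade slot
  ansible.builtin.command:
    cmd: "kubectl delete lease -n kube-system upgrade-slot-{{ upgrade_slot_id | default(0) }}"
  delegate_to: localhost
```

With group_vars, each group could point to its own lease name (or set of lease names), which is how the slot-per-group idea would fall out naturally.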

Opinions welcome!

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2024
VannTen (Contributor, Author) commented Feb 28, 2024 via email

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 28, 2024