
Max pods per node when using prefix delegation #7505

Open
charlierm opened this issue Dec 9, 2024 · 4 comments
Labels: bug, lifecycle/stale, triage/solved

Comments

charlierm commented Dec 9, 2024

Description

Observed Behavior:
Regardless of the AMI I use, the max pods value Karpenter calculates does not take into account that prefix delegation is enabled on the VPC CNI. This looks similar to #2029.

I understand that Karpenter doesn't schedule pods but only provisions nodes; my issue is that, in theory, Karpenter could be provisioning an oversized instance.

Currently I am working around it with a startup script that runs the famous max-pods-calculator.sh (see the sketch below), but that causes the NodeClaim's pod capacity to mismatch the Node's.
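
For reference, the workaround looks roughly like this (a sketch only: the script URL, CNI version, and the mechanism for handing the value to the kubelet are illustrative rather than my exact user data):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: prefix-delegation
spec:
  amiFamily: AL2
  userData: |
    #!/bin/bash
    # Sketch: fetch the calculator (path in the amazon-eks-ami repo may
    # have moved) and compute max pods with prefix delegation enabled.
    curl -sO https://raw.githubusercontent.com/awslabs/amazon-eks-ami/master/files/max-pods-calculator.sh
    chmod +x max-pods-calculator.sh
    MAX_PODS=$(./max-pods-calculator.sh \
      --instance-type-from-imds \
      --cni-version 1.19.0 \
      --cni-prefix-delegation-enabled)
    # ...then pass $MAX_PODS to the kubelet, which is where the Node ends
    # up disagreeing with what Karpenter recorded on the NodeClaim.
```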

Is it currently using the values from https://karpenter.sh/v1.1/reference/instance-types/?

Furthermore, this issue seems to be resolved when using EKS Auto Mode.

I can't be the only one experiencing this, and I'm struggling to work out what the best approach is!

Expected Behavior:
Max pods is calculated correctly for the instance size and takes into account prefix delegation.

Reproduction Steps (Please include YAML):

Versions:

  • Chart Version: 1.1.0
  • Kubernetes Version (kubectl version): 1.31

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment
charlierm added the bug and needs-triage labels on Dec 9, 2024
charlierm (Author) commented

It also makes me think that `karpenter.k8s.aws/instance-pods` isn't correct, so presumably if I added a nodeSelector of `karpenter.k8s.aws/instance-pods: "110"` to my workload (something like the snippet below), it would provision a node with an instance type far greater than required.
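
Something like this, for example (a throwaway pod just to illustrate the selector):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: instance-pods-demo
spec:
  nodeSelector:
    # Well-known Karpenter label; if the advertised pod capacity is wrong,
    # this would steer provisioning toward larger instance types than needed.
    karpenter.k8s.aws/instance-pods: "110"
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9
```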

jonathan-innis (Contributor) commented

> Max pods is calculated correctly for the instance size and takes into account prefix delegation

You can control the max pods value that Karpenter sees by setting maxPods: 110 in the kubelet section of the EC2NodeClass. Karpenter doesn't currently know whether or not you are running prefix delegation, so it can't know that the node you are launching is going to have more IPs available than it would without prefix mode enabled.
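
For example, a minimal sketch (the resource name is illustrative, and the other required EC2NodeClass fields are omitted):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    # Pins both the pod capacity Karpenter advertises on the NodeClaim
    # and the --max-pods value the kubelet enforces on the Node.
    maxPods: 110
  # ...amiSelectorTerms, role, subnetSelectorTerms, etc. omitted.
```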

jonathan-innis added the triage/solved label and removed the needs-triage label on Dec 10, 2024

charlierm commented Dec 11, 2024

@jonathan-innis we're currently setting kubelet.maxPods=110, but this feels more like a workaround than a solution.

Even with prefix delegation there are instance types that cannot support 110 pods, for example c6g.medium. In that situation, presumably it would provision a smaller instance than is required if you had a lot of workloads with low resource requests?
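
For reference, the VPC CNI's prefix delegation math (as I understand it from the AWS docs) is max pods = ENIs × (IPv4 addresses per ENI − 1) × 16 + 2, since each secondary address slot carries a /28 prefix of 16 addresses. If I have the c6g.medium limits right (2 ENIs with 4 IPv4 addresses each), that gives 2 × 3 × 16 + 2 = 98, which is below 110.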

I feel like a good solution would be a toggle on the EC2NodeClass that makes Karpenter calculate max pods correctly for each instance type (sketched below).
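
Something like this, say (an entirely hypothetical field, just to illustrate what I mean, not part of the current API):

```yaml
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: default
spec:
  kubelet:
    # Hypothetical toggle, not a real field today: tell Karpenter the VPC
    # CNI runs with ENABLE_PREFIX_DELEGATION=true so that it computes
    # maxPods per instance type using the prefix-delegation formula.
    prefixDelegation: true
```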

I also want to reiterate that this problem seems to be solved with EKS Auto Mode. I know it's not pure Karpenter and the spec is different, but it seems like an acknowledgement.


This issue has been inactive for 7 days and is marked as "triage/solved". StaleBot will close this stale issue after 7 more days of inactivity.
