nfd-topology-updater missed some cpu for Guaranteed pods #1978

Open
AllenXu93 opened this issue Dec 13, 2024 · 8 comments · May be fixed by #1979
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@AllenXu93

What happened:
nfd-topology-updater is used to report the node's NUMA zones and the CPUs used by Guaranteed pods. But it misses the CPUs of a Guaranteed pod whenever any of the pod's containers has no exclusive CPUs.

Node yj-kubevirtwork-001's kubelet is configured with the static cpu-manager and the single-numa-node topology manager:
--cpu-manager-policy=static --topology-manager-policy=single-numa-node

I have two pods on the node:

root@yj-kubevirtmaster-01 ~ # kubectl get po -owide nginx-deployment-68589dd8dc-kzvzh virt-launcher-kubevirttesttest1-p545b
NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
nginx-deployment-68589dd8dc-kzvzh       1/1     Running   0          2d16h   10.106.66.199   yj-kubevirtwork-001   <none>           <none>
virt-launcher-kubevirttesttest1-p545b   3/3     Running   0          4h20m   10.106.66.177   yj-kubevirtwork-001   <none>           1/1

The CPUs allocated by the cpu-manager on the node:

cat /var/lib/kubelet/cpu_manager_state | jq
{
  "policyName": "static",
  "defaultCpuSet": "1,3,24-25,48-51,53,55,76-77,100-103",
  "entries": {
    "609526ad-0de0-4543-8188-48f872576fc9": {
      "compute": "26-34,78-86"
    },
    "64dc405a-1a9c-4029-ba5f-778fe5948a10": {
      "compute": "0,2,4-9,52,54,56-61"
    },
    "8310086e-bc8b-4ab3-ac6e-ccc1129fecec": {
      "compute": "35-47,87-99"
    },
    "98f43841-5488-49d7-9f79-29fa68791316": {
      "nginx": "14-15,66-67"
    },
    "b4fd6f12-9cf5-459e-8fdb-3058a2ae22a3": {
      "compute": "10-13,16-23,62-65,68-75"
    }
  },
  "checksum": 2485504922
}

But the noderesourcetopologies CR only accounts for nginx's CPU set; the virt-launcher pod's CPUs are not reported:

kubectl get noderesourcetopologies yj-kubevirtwork-001 -oyaml
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
- name: topologyManagerPolicy
  value: single-numa-node
- name: topologyManagerScope
  value: container
- name: nodeTopologyPodsFingerprint
  value: pfp0v001828baea44880d152
kind: NodeResourceTopology
metadata:
  creationTimestamp: "2024-12-05T13:13:40Z"
  generation: 66528
  name: yj-kubevirtwork-001
  ownerReferences:
  - apiVersion: v1
    kind: Namespace
    name: node-feature-discovery
    uid: 4072a5ce-0634-447e-ab31-dd3f4bc7abac
  resourceVersion: "27862374"
  uid: dd26aaf4-3f38-4708-bb86-89305475a715
topologyPolicies:
- SingleNUMANodeContainerLevel
zones:
- costs:
  - name: node-0
    value: 21
  - name: node-1
    value: 10
  name: node-1
  resources:
  - allocatable: "52"
    available: "52"
    capacity: "52"
    name: cpu
  - allocatable: "201847795712"
    available: "157778243583"
    capacity: "201847795712"
    name: memory
  type: Node
- costs:
  - name: node-0
    value: 10
  - name: node-1
    value: 21
  name: node-0
  resources:
  - allocatable: "50"
    available: "46"
    capacity: "52"
    name: cpu
  - allocatable: "183845359616"
    available: "130691427327"
    capacity: "201130086400"
    name: memory
  type: Node

In the nfd-topology-updater logs:

I1213 06:06:41.791096       1 nfd-topology-updater.go:235] "received updated pod resources" podResources=<
	- Containers:
	  - Name: nginx
	    Resources:
	    - Data:
	      - "66"
	      - "67"
	      - "14"
	      - "15"
	      Name: cpu
	      NumaNodeIds: null
	    - Data:
	      - "8589934592"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  Name: nginx-deployment-68589dd8dc-kzvzh
	  Namespace: default
	- Containers:
	  - Name: compute
	    Resources:
	    - Data:
	      - vhost-net172
	      Name: devices.kubevirt.io/vhost-net
	      NumaNodeIds: null
	    - Data:
	      - kvm760
	      Name: devices.kubevirt.io/kvm
	      NumaNodeIds: null
	    - Data:
	      - tun96
	      Name: devices.kubevirt.io/tun
	      NumaNodeIds: null
	    - Data:
	      - "9141485568"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  - Name: volumecontainerdisk
	    Resources:
	    - Data:
	      - "40000000"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  - Name: guest-console-log
	    Resources:
	    - Data:
	      - "60000000"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  Name: virt-launcher-kubevirttesttest2-8dnfj
	  Namespace: default
 >

nfd-topology-updater did not pick up the CPUs of the virt-launcher pod's compute container.

What you expected to happen:
nfd-topology-updater should report the exclusive CPUs of all containers in the pod.

How to reproduce it (as minimally and precisely as possible):
For example, create a pod Pod-a with 2 containers:

containers:
- name: ctr-a
  resources:
    limits:
      cpu: 5   
    requests:
      cpu: 5
- name: ctr-b
  resources:
    limits:
      cpu: 500m
    requests:
      cpu: 500m

The pod is still a Guaranteed pod, and the kubelet's cpu-manager allocates exclusive CPUs to ctr-a, but nfd-topology-updater will not report its CPUs in the topology CR.

Anything else we need to know?:
I think this bug is caused by the hasExclusiveCPUs func:

func hasExclusiveCPUs(pod *corev1.Pod) bool {
    var totalCPU int64
    var cpuQuantity resource.Quantity
    for _, container := range pod.Spec.InitContainers {
        var ok bool
        if cpuQuantity, ok = container.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        totalCPU += cpuQuantity.Value()
        isInitContainerGuaranteed := hasIntegralCPUs(pod, &container)
        if !isInitContainerGuaranteed {
            return false
        }
    }
    for _, container := range pod.Spec.Containers {
        var ok bool
        if cpuQuantity, ok = container.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        totalCPU += cpuQuantity.Value()
        isAppContainerGuaranteed := hasIntegralCPUs(pod, &container)
        if !isAppContainerGuaranteed {
            return false
        }
    }
    // No CPUs requested in all the containers in the pod
    return totalCPU != 0
}

If any container in the pod does not request an integral number of CPUs, hasExclusiveCPUs returns false for the whole pod.
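
A rough unit-test sketch of this case (just an illustration; it assumes the test sits next to hasExclusiveCPUs in the resourcemonitor package, and the test name and resource values are made up):

package resourcemonitor

import (
    "testing"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// Mirrors the two-container example from the reproduction section:
// ctr-a requests 5 integral CPUs, ctr-b requests 500m. The pod is
// Guaranteed and ctr-a gets exclusive CPUs from the static cpu-manager,
// but the current hasExclusiveCPUs returns false for the whole pod.
func TestHasExclusiveCPUsMixedPod(t *testing.T) {
    guaranteedResources := func(cpu, mem string) corev1.ResourceRequirements {
        rl := corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse(cpu),
            corev1.ResourceMemory: resource.MustParse(mem),
        }
        return corev1.ResourceRequirements{Requests: rl, Limits: rl}
    }
    pod := &corev1.Pod{
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{
                {Name: "ctr-a", Resources: guaranteedResources("5", "1Gi")},      // integral -> exclusive CPUs
                {Name: "ctr-b", Resources: guaranteedResources("500m", "100Mi")}, // fractional -> shared pool
            },
        },
    }
    // Expected (per this issue): true, because ctr-a holds exclusive CPUs.
    // Today this fails, since ctr-b makes hasExclusiveCPUs bail out early.
    if !hasExclusiveCPUs(pod) {
        t.Errorf("expected hasExclusiveCPUs to return true for a pod with an integral-CPU container")
    }
}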

That pod-level result (isIntegralGuaranteed below) is then used to gate the per-container CPU reporting:

if isIntegralGuaranteed {
    cpuIDs := container.GetCpuIds()
    if len(cpuIDs) > 0 {
        var resCPUs []string
        for _, cpuID := range container.GetCpuIds() {
            resCPUs = append(resCPUs, strconv.FormatInt(cpuID, 10))
        }
        contRes.Resources = []ResourceInfo{
            {
                Name: corev1.ResourceCPU,
                Data: resCPUs,
            },
        }
    }
}

So when hasExclusiveCPUs returns false, nfd-topology-updater skips CPU reporting for every container in the pod, including the container that actually holds exclusive CPUs.
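
One possible direction for a fix (only a rough sketch of one option, not necessarily what the linked PR #1979 does) is to treat a fractional-CPU container as simply having no exclusive CPUs, instead of disqualifying the whole pod:

// Sketch only: return true as soon as any container requests an integral
// number of CPUs, so pods that mix exclusive and shared-pool containers
// are still scanned. hasIntegralCPUs is the existing helper; the reporting
// loop would also need a per-container check so that shared-pool CPUs of
// fractional containers are not reported as exclusive.
func hasExclusiveCPUs(pod *corev1.Pod) bool {
    containers := make([]corev1.Container, 0, len(pod.Spec.InitContainers)+len(pod.Spec.Containers))
    containers = append(containers, pod.Spec.InitContainers...)
    containers = append(containers, pod.Spec.Containers...)
    for i := range containers {
        c := &containers[i]
        if _, ok := c.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        if hasIntegralCPUs(pod, c) {
            return true
        }
    }
    return false
}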

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
AllenXu93 added the kind/bug label Dec 13, 2024
AllenXu93 linked a pull request Dec 13, 2024 that will close this issue
@marquiz
Contributor

marquiz commented Dec 13, 2024

@AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you open up your usage scenario(s) a bit for counting also cpus in the shared pool? Maybe we could have a config option/cmdline flag to enable this.

@PiotrProkop what are your thoughts on this?

EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html

@AllenXu93
Author

> @AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you open up your usage scenario(s) a bit for counting also cpus in the shared pool? Maybe we could have a config option/cmdline flag to enable this.
>
> @PiotrProkop what are your thoughts on this?
>
> EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html

This PR still uses exclusively allocated CPUs; that doesn't change. But "exclusively allocated CPUs" does not mean that every container in the pod has exclusive CPUs.
For example, if a pod has two containers, one with 2-core CPU requests and limits and one with 500m CPU requests and limits, the pod is still a Guaranteed pod, and the first container gets exclusive CPUs as expected.
But currently the topology updater skips CPU reporting for this pod entirely.

@marquiz
Contributor

marquiz commented Dec 16, 2024

Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.

I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).

WDYT @PiotrProkop @ffromani, something we're missing here?

@ffromani
Contributor

> Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
>
> I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
>
> WDYT @PiotrProkop @ffromani, something we're missing here?

I think there's a good point here. Need to review the logic and count all the exclusively allocated CPUs

@AllenXu93
Author

> Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
>
> I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
>
> WDYT @PiotrProkop @ffromani, something we're missing here?

In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU.
But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.

@ffromani
Contributor

> > Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
> > I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
> > WDYT @PiotrProkop @ffromani, something we're missing here?
>
> In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU. But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.

I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported

@AllenXu93
Author

> > > Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
> > > I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
> > > WDYT @PiotrProkop @ffromani, something we're missing here?
> >
> > In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU. But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.
>
> I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported

Of course. I learned it from the docs: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

In my opinion, a Guaranteed pod means: every Container in the Pod has both CPU and memory limits and requests, and each limit equals the corresponding request.

@ffromani
Contributor

@AllenXu93 you made good points, there could be an actual bug in the area you identified. We can use this issue to add more unit tests. I'll try to have a look ASAP but the Europe holiday season is incoming. Others are welcome to chime in, I'll surely review.
