nfd-topology-updater missed some cpu for Guaranteed pods #1978

Open
AllenXu93 opened this issue Dec 13, 2024 · 8 comments · May be fixed by #1979
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@AllenXu93

What happened:
nfd-topology-updater is used to report the node's NUMA zones and the CPUs used by Guaranteed pods. But it misses the CPUs of a Guaranteed pod whenever any of the pod's containers has no exclusive CPUs.

Node yj-kubevirtwork-001's kubelet is configured with the static cpu-manager and the single-numa-node topology manager:
--cpu-manager-policy=static --topology-manager-policy=single-numa-node

I have two pods on the node:

root@yj-kubevirtmaster-01 ~ # kubectl get po -owide nginx-deployment-68589dd8dc-kzvzh virt-launcher-kubevirttesttest1-p545b
NAME                                    READY   STATUS    RESTARTS   AGE     IP              NODE                  NOMINATED NODE   READINESS GATES
nginx-deployment-68589dd8dc-kzvzh       1/1     Running   0          2d16h   10.106.66.199   yj-kubevirtwork-001   <none>           <none>
virt-launcher-kubevirttesttest1-p545b   3/3     Running   0          4h20m   10.106.66.177   yj-kubevirtwork-001   <none>           1/1

The CPUs allocated by the cpu-manager on the node:

cat /var/lib/kubelet/cpu_manager_state | jq
{
  "policyName": "static",
  "defaultCpuSet": "1,3,24-25,48-51,53,55,76-77,100-103",
  "entries": {
    "609526ad-0de0-4543-8188-48f872576fc9": {
      "compute": "26-34,78-86"
    },
    "64dc405a-1a9c-4029-ba5f-778fe5948a10": {
      "compute": "0,2,4-9,52,54,56-61"
    },
    "8310086e-bc8b-4ab3-ac6e-ccc1129fecec": {
      "compute": "35-47,87-99"
    },
    "98f43841-5488-49d7-9f79-29fa68791316": {
      "nginx": "14-15,66-67"
    },
    "b4fd6f12-9cf5-459e-8fdb-3058a2ae22a3": {
      "compute": "10-13,16-23,62-65,68-75"
    }
  },
  "checksum": 2485504922
}

But the noderesourcetopologies CR only accounts for nginx's CPU set; the virt-launcher pod's CPUs are not reported:

kubectl get noderesourcetopologies yj-kubevirtwork-001 -oyaml
apiVersion: topology.node.k8s.io/v1alpha2
attributes:
- name: topologyManagerPolicy
  value: single-numa-node
- name: topologyManagerScope
  value: container
- name: nodeTopologyPodsFingerprint
  value: pfp0v001828baea44880d152
kind: NodeResourceTopology
metadata:
  creationTimestamp: "2024-12-05T13:13:40Z"
  generation: 66528
  name: yj-kubevirtwork-001
  ownerReferences:
  - apiVersion: v1
    kind: Namespace
    name: node-feature-discovery
    uid: 4072a5ce-0634-447e-ab31-dd3f4bc7abac
  resourceVersion: "27862374"
  uid: dd26aaf4-3f38-4708-bb86-89305475a715
topologyPolicies:
- SingleNUMANodeContainerLevel
zones:
- costs:
  - name: node-0
    value: 21
  - name: node-1
    value: 10
  name: node-1
  resources:
  - allocatable: "52"
    available: "52"
    capacity: "52"
    name: cpu
  - allocatable: "201847795712"
    available: "157778243583"
    capacity: "201847795712"
    name: memory
  type: Node
- costs:
  - name: node-0
    value: 10
  - name: node-1
    value: 21
  name: node-0
  resources:
  - allocatable: "50"
    available: "46"
    capacity: "52"
    name: cpu
  - allocatable: "183845359616"
    available: "130691427327"
    capacity: "201130086400"
    name: memory
  type: Node

In the nfd-topology-updater logs:

I1213 06:06:41.791096       1 nfd-topology-updater.go:235] "received updated pod resources" podResources=<
	- Containers:
	  - Name: nginx
	    Resources:
	    - Data:
	      - "66"
	      - "67"
	      - "14"
	      - "15"
	      Name: cpu
	      NumaNodeIds: null
	    - Data:
	      - "8589934592"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  Name: nginx-deployment-68589dd8dc-kzvzh
	  Namespace: default
	- Containers:
	  - Name: compute
	    Resources:
	    - Data:
	      - vhost-net172
	      Name: devices.kubevirt.io/vhost-net
	      NumaNodeIds: null
	    - Data:
	      - kvm760
	      Name: devices.kubevirt.io/kvm
	      NumaNodeIds: null
	    - Data:
	      - tun96
	      Name: devices.kubevirt.io/tun
	      NumaNodeIds: null
	    - Data:
	      - "9141485568"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  - Name: volumecontainerdisk
	    Resources:
	    - Data:
	      - "40000000"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  - Name: guest-console-log
	    Resources:
	    - Data:
	      - "60000000"
	      Name: memory
	      NumaNodeIds:
	      - 0
	  Name: virt-launcher-kubevirttesttest2-8dnfj
	  Namespace: default
 >

nfd-topology-updater did not pick up the CPUs of the virt-launcher pod's compute container.

What you expected to happen:
nfd-topology-updater should report the exclusive CPUs of all containers in the pod.

How to reproduce it (as minimally and precisely as possible):
For example, create a pod Pod-a with 2 containers:

containers:
- name: ctr-a
  resources:
    limits:
      cpu: 5   
    requests:
      cpu: 5
- name: ctr-b
  resources:
    limits:
      cpu: 500m
    requests:
      cpu: 500m

The pod is still a Guaranteed pod, and the kubelet's cpu-manager allocates exclusive CPUs to ctr-a, but nfd-topology-updater will not report its CPUs in the topology CR.

Anything else we need to know?:
I think this bug is caused by the hasExclusiveCPUs func:

func hasExclusiveCPUs(pod *corev1.Pod) bool {
    var totalCPU int64
    var cpuQuantity resource.Quantity
    for _, container := range pod.Spec.InitContainers {
        var ok bool
        if cpuQuantity, ok = container.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        totalCPU += cpuQuantity.Value()
        isInitContainerGuaranteed := hasIntegralCPUs(pod, &container)
        if !isInitContainerGuaranteed {
            return false
        }
    }
    for _, container := range pod.Spec.Containers {
        var ok bool
        if cpuQuantity, ok = container.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        totalCPU += cpuQuantity.Value()
        isAppContainerGuaranteed := hasIntegralCPUs(pod, &container)
        if !isAppContainerGuaranteed {
            return false
        }
    }
    // No CPUs requested in all the containers in the pod
    return totalCPU != 0
}

If any container in the pod does not request an integral number of CPUs, hasExclusiveCPUs returns false for the whole pod.
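
A rough unit-test sketch of this case (just an illustration; it assumes the test sits next to hasExclusiveCPUs in the resourcemonitor package, and the test name and resource values are made up):

package resourcemonitor

import (
    "testing"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/api/resource"
)

// Mirrors the two-container example from the reproduction section:
// ctr-a requests 5 integral CPUs, ctr-b requests 500m. The pod is
// Guaranteed and ctr-a gets exclusive CPUs from the static cpu-manager,
// but the current hasExclusiveCPUs returns false for the whole pod.
func TestHasExclusiveCPUsMixedPod(t *testing.T) {
    guaranteedResources := func(cpu, mem string) corev1.ResourceRequirements {
        rl := corev1.ResourceList{
            corev1.ResourceCPU:    resource.MustParse(cpu),
            corev1.ResourceMemory: resource.MustParse(mem),
        }
        return corev1.ResourceRequirements{Requests: rl, Limits: rl}
    }
    pod := &corev1.Pod{
        Spec: corev1.PodSpec{
            Containers: []corev1.Container{
                {Name: "ctr-a", Resources: guaranteedResources("5", "1Gi")},      // integral -> exclusive CPUs
                {Name: "ctr-b", Resources: guaranteedResources("500m", "100Mi")}, // fractional -> shared pool
            },
        },
    }
    // Expected (per this issue): true, because ctr-a holds exclusive CPUs.
    // Today this fails, since ctr-b makes hasExclusiveCPUs bail out early.
    if !hasExclusiveCPUs(pod) {
        t.Errorf("expected hasExclusiveCPUs to return true for a pod with an integral-CPU container")
    }
}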

That pod-level result (isIntegralGuaranteed below) is then used to gate the per-container CPU reporting:

if isIntegralGuaranteed {
    cpuIDs := container.GetCpuIds()
    if len(cpuIDs) > 0 {
        var resCPUs []string
        for _, cpuID := range container.GetCpuIds() {
            resCPUs = append(resCPUs, strconv.FormatInt(cpuID, 10))
        }
        contRes.Resources = []ResourceInfo{
            {
                Name: corev1.ResourceCPU,
                Data: resCPUs,
            },
        }
    }
}

So when hasExclusiveCPUs returns false, nfd-topology-updater skips CPU reporting for every container in the pod, including the container that actually holds exclusive CPUs.
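
One possible direction for a fix (only a rough sketch of one option, not necessarily what the linked PR #1979 does) is to treat a fractional-CPU container as simply having no exclusive CPUs, instead of disqualifying the whole pod:

// Sketch only: return true as soon as any container requests an integral
// number of CPUs, so pods that mix exclusive and shared-pool containers
// are still scanned. hasIntegralCPUs is the existing helper; the reporting
// loop would also need a per-container check so that shared-pool CPUs of
// fractional containers are not reported as exclusive.
func hasExclusiveCPUs(pod *corev1.Pod) bool {
    containers := make([]corev1.Container, 0, len(pod.Spec.InitContainers)+len(pod.Spec.Containers))
    containers = append(containers, pod.Spec.InitContainers...)
    containers = append(containers, pod.Spec.Containers...)
    for i := range containers {
        c := &containers[i]
        if _, ok := c.Resources.Requests[corev1.ResourceCPU]; !ok {
            continue
        }
        if hasIntegralCPUs(pod, c) {
            return true
        }
    }
    return false
}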

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g: cat /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Network plugin and version (if this is a network-related bug):
  • Others:
AllenXu93 added the kind/bug label Dec 13, 2024
AllenXu93 linked a pull request Dec 13, 2024 that will close this issue
@marquiz
Contributor

marquiz commented Dec 13, 2024

@AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you open up your usage scenario(s) a bit for counting also cpus in the shared pool? Maybe we could have a config option/cmdline flag to enable this.

@PiotrProkop what are your thoughts on this?

EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html

@AllenXu93
Author

> @AllenXu93 it is a deliberate decision/feature to only count exclusively allocated CPUs (of Guaranteed pods). Could you open up your usage scenario(s) a bit for counting also cpus in the shared pool? Maybe we could have a config option/cmdline flag to enable this.
>
> @PiotrProkop what are your thoughts on this?
>
> EDIT: @AllenXu93 see https://kubernetes-sigs.github.io/node-feature-discovery/stable/usage/nfd-topology-updater.html

This PR still uses exclusively allocated CPUs; that doesn't change. But "exclusively allocated CPUs" does not mean that every container in the pod has exclusive CPUs.
For example, if a pod has two containers, one with 2-core CPU requests and limits and one with 500m CPU requests and limits, the pod is still a Guaranteed pod, and the first container gets exclusive CPUs as expected.
But currently the topology updater skips CPU reporting for this pod entirely.

@marquiz
Contributor

marquiz commented Dec 16, 2024

Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.

I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).

WDYT @PiotrProkop @ffromani, something we're missing here?

@ffromani
Contributor

> Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
>
> I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
>
> WDYT @PiotrProkop @ffromani, something we're missing here?

I think there's a good point here. Need to review the logic and count all the exclusively allocated CPUs

@AllenXu93
Author

> Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
>
> I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
>
> WDYT @PiotrProkop @ffromani, something we're missing here?

In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU.
But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.

@ffromani
Contributor

> > Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
> > I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
> > WDYT @PiotrProkop @ffromani, something we're missing here?
>
> In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU. But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.

I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported

@AllenXu93
Author

> > > Ah yes, @AllenXu93 I read the description too hastily, not paying attention to this detail.
> > > I think what you describe makes sense (i.e. nfd-topology-updater should report/count exclusively allocated cpus for pods, even if some of the containers within the pod use shared cpus).
> > > WDYT @PiotrProkop @ffromani, something we're missing here?
> >
> > In fact, I use KubeVirt to manage VMs on Kubernetes, and all KubeVirt pods are created this way: the VM (compute) container requests integral CPUs, while many other containers and init containers only request 200m CPU. But with single-numa-node enabled in the kubelet, the VM container's CPUs are allocated from a single NUMA node, so when nfd-topology-updater misses them, other pods get scheduled incorrectly.
>
> I'm a bit unsure about the QoS here and I need to review (again) the rules, but I totally agree that all the containers which have exclusive CPUs allocated in guaranteed QoS pods should be reported

Of course. I learned it from the docs: https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/#create-a-pod-that-gets-assigned-a-qos-class-of-guaranteed

In my opinion, a Guaranteed pod means: every Container in the Pod has both CPU and memory limits and requests, and each limit equals the corresponding request.

@ffromani
Contributor

@AllenXu93 you made good points, there could be an actual bug in the area you identified. We can use this issue to add more unit tests. I'll try to have a look ASAP but the Europe holiday season is incoming. Others are welcome to chime in, I'll surely review.
