
curl metrics issue #859

Open
kunshen1 opened this issue Jul 4, 2022 · 0 comments

kunshen1 commented Jul 4, 2022

Describe the bug

Using curl --silent http://localhost:8891/metrics to fetch cri-rm balloon information fails to return information about newly created balloons.
Expected behavior

[root@PUE01 Mon Jul 04 16:31:19 ~]# curl --silent http://localhost:8891/metrics
# HELP balloons CPUs
# TYPE balloons gauge
balloons{balloon="default[0]",balloon_type="default",containers="",cpu_class="",cpus="0,35",cpus_max="0",cpus_min="0",mems="0",tot_req_millicpu="0"} 2
balloons{balloon="reserved[0]",balloon_type="reserved",containers="",cpu_class="",cpus="0,35",cpus_max="0",cpus_min="0",mems="0",tot_req_millicpu="0"} 2

(base) [root@PUE02 ~]# curl --silent http://localhost:8891/metric
<a href="/ui/index.html">Found.

CRI-RM logs

Jul 04 16:35:33 PUE02 cri-resmgr[3236]: D: [ cpu ] enforcing cpu frequency limits {2000000, 3600000} from class "turbo" on [1 2 81 82]
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: D: [ cpu ] no uncore frequency limits for cpu package/die 0/0
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: D: [ cpu ] no uncore frequency limits for cpu package/die 1/0
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: D: [ cpu ] cpu controller configured
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] syncing controllers with configuration...
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller blockio: running
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller cpu: running
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller cri: running
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller memory: running
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller page-migration: running
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: E: [resource-control] controller rdt: failed to start: rdt: failed to initialize RDT controls: failed to detect resctrl mount point: resctrl not found in /proc/mounts
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: W: [resource-control] disabling rdt, failed to start: rdt: failed to initialize RDT controls: failed to detect resctrl mount point: resctrl not found in /proc/mounts
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller blockio is now running, mode relaxed
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller cpu is now running, mode relaxed
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller cri is now running, mode relaxed
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller memory is now running, mode relaxed
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller page-migration is now running, mode relaxed
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: I: [resource-control] controller rdt is now inactive, mode disabled
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: W: [resource-manager] setConfig: skipping container descheduler-27615396-t8hxj:descheduler (in state 2)
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: W: [resource-manager] setConfig: skipping container descheduler-27615400-9l7mr:descheduler (in state 2)
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb151fac1_eefd_4488_8803_bef7f880de74.slice/cri-containerd-7b56c16ff937d7e9aa6afbdd39da855b9ebb092c7911806857b274132dff2fd8.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb151fac1_eefd_4488_8803_bef7f880de74.slice/cri-containerd-7b56c16ff937d7e9aa6afbdd39da855b9ebb092c7911806857b274132dff2fd8.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: W: [resource-manager] setConfig: skipping container cilium-2txnp:clean-cilium-state (in state 2)
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0d6cb56f_62ff_416d_bd8f_264ee31b43b3.slice/cri-containerd-466b5fe2b3b231fd49ef4cdb9b73dcee1a65315be515352e1cd809e47659faeb.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-pod0d6cb56f_62ff_416d_bd8f_264ee31b43b3.slice/cri-containerd-466b5fe2b3b231fd49ef4cdb9b73dcee1a65315be515352e1cd809e47659faeb.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:33 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb35bb0de_1b87_4848_b828_5205709b28ee.slice/cri-containerd-7381d7031e6efa14e07a41148af4ddf51fe88945251dc231056862ee438a2bdd.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-podb35bb0de_1b87_4848_b828_5205709b28ee.slice/cri-containerd-7381d7031e6efa14e07a41148af4ddf51fe88945251dc231056862ee438a2bdd.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-pod63569257_5943_4c00_a08e_8460508d71ce.slice/cri-containerd-73a365a577b5f0682322ee20929e1ece24c3c2ce6ffacb358d6ccd2472d5dc3b.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod63569257_5943_4c00_a08e_8460508d71ce.slice/cri-containerd-73a365a577b5f0682322ee20929e1ece24c3c2ce6ffacb358d6ccd2472d5dc3b.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-pod7ba817d8_dd24_4259_b254_fab4f051e2af.slice/cri-containerd-d2c8311c6f71db2080cca21065fe4a1a8ee7cfc1f8c17b027353460c20879490.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod7ba817d8_dd24_4259_b254_fab4f051e2af.slice/cri-containerd-d2c8311c6f71db2080cca21065fe4a1a8ee7cfc1f8c17b027353460c20879490.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: E: [resource-control] memory post-update hook failed: cgroup memory/kubepods.slice/kubepods-pod11612481_fac7_4509_b9fb_7da27148c15a.slice/cri-containerd-d0955b9925756dfc9b82dc4092dc3abffb1c1acd1f551936224a9cff15085a1a.scope: "memory.toptier_soft_limit_in_bytes": failed to open: open /sys/fs/cgroup/memory/kubepods.slice/kubepods-pod11612481_fac7_4509_b9fb_7da27148c15a.slice/cri-containerd-d0955b9925756dfc9b82dc4092dc3abffb1c1acd1f551936224a9cff15085a1a.scope/memory.toptier_soft_limit_in_bytes: no such file or directory
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: W: [resource-manager] setConfig: skipping container cilium-2txnp:ebpf-mount (in state 2)
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: W: [resource-manager] setConfig: skipping container descheduler-27615398-86c2r:descheduler (in state 2)
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: I: [resource-manager] successfully switched to new configuration
Jul 04 16:35:34 PUE02 cri-resmgr[3236]: E: [ metrics ] failed to poll raw metrics: metrics: failed to poll raw metrics: 56 error(s) occurred:
Jul 04 16:35:35 PUE02 cri-resmgr[3236]: E: [ metrics ] failed to poll raw metrics: metrics: failed to poll raw metrics: 56 error(s) occurred:
Jul 04 16:35:36 PUE02 cri-resmgr[3236]: E: [ metrics ] failed to poll raw metrics: metrics: failed to poll raw metrics: 56 error(s) occurred:

  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"c32f58e7185a67428a4d0ae2892c162234deaf29afa98814ce86a56b6c796f9b" > label:<name:"size" value:"1GB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"c32f58e7185a67428a4d0ae2892c162234deaf29afa98814ce86a56b6c796f9b" > label:<name:"size" value:"1GB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"c32f58e7185a67428a4d0ae2892c162234deaf29afa98814ce86a56b6c796f9b" > label:<name:"size" value:"2MB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"c32f58e7185a67428a4d0ae2892c162234deaf29afa98814ce86a56b6c796f9b" > label:<name:"size" value:"2MB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"75784655320cadde44c8e742c77ce5d94f1778806878f962e327500981a7d8ce" > label:<name:"size" value:"1GB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"75784655320cadde44c8e742c77ce5d94f1778806878f962e327500981a7d8ce" > label:<name:"size" value:"1GB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"75784655320cadde44c8e742c77ce5d94f1778806878f962e327500981a7d8ce" > label:<name:"size" value:"2MB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"75784655320cadde44c8e742c77ce5d94f1778806878f962e327500981a7d8ce" > label:<name:"size" value:"2MB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"ecc8709a53f0f8f842d81f225ef76adc6d68c9cb0aceb0361d973f144915632b" > label:<name:"size" value:"1GB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"ecc8709a53f0f8f842d81f225ef76adc6d68c9cb0aceb0361d973f144915632b" > label:<name:"size" value:"1GB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"ecc8709a53f0f8f842d81f225ef76adc6d68c9cb0aceb0361d973f144915632b" > label:<name:"size" value:"2MB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"ecc8709a53f0f8f842d81f225ef76adc6d68c9cb0aceb0361d973f144915632b" > label:<name:"size" value:"2MB" > label:<name:"type" value:"MaxBytes" > gauge:<value:0 > } was collected before with the same name and label values
  • collected metric "cgroup_hugetlb_usage" { label:<name:"container_id" value:"697d88754ea44570670ecba71517b7e73d7a02f3fd1f759ea4b0c96be6fa471e" > label:<name:"size" value:"1GB" > label:<name:"type" value:"Bytes" > gauge:<value:0 > } was collected before with the same name and label values
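
The duplicate-sample errors above come from the Prometheus client library: when a collector emits the same metric name with identical label values more than once during a single scrape, the whole Gather() call fails. Below is a minimal reproduction sketch, assuming the metrics are exported with prometheus/client_golang (which the exact error text suggests); the label values are illustrative only, not taken from this node, and this is not cri-resmgr code.

// Sketch: emitting the same name and label values twice from one
// Collector makes Gather() fail with "was collected before with the
// same name and label values".
package main

import (
	"fmt"

	"github.com/prometheus/client_golang/prometheus"
)

type dupCollector struct {
	desc *prometheus.Desc
}

func (c *dupCollector) Describe(ch chan<- *prometheus.Desc) { ch <- c.desc }

func (c *dupCollector) Collect(ch chan<- prometheus.Metric) {
	// Two samples sent with identical label values -> duplicate metric.
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 0, "c32f58e7", "1GB", "Bytes")
	ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, 0, "c32f58e7", "1GB", "Bytes")
}

func main() {
	reg := prometheus.NewRegistry()
	reg.MustRegister(&dupCollector{
		desc: prometheus.NewDesc("cgroup_hugetlb_usage", "hugetlb usage",
			[]string{"container_id", "size", "type"}, nil),
	})
	if _, err := reg.Gather(); err != nil {
		fmt.Println(err) // reports the duplicate-metric error
	}
}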

To Reproduce

Environment

OS: CentOS 8.4
CR: containerd
Kubernetes version: 1.23.5 & 1.24.2

Additional context

askervin added a commit to askervin/cri-resource-manager that referenced this issue Dec 23, 2022
- Hugetlb statistics collector matched both usage and (later added)
  reservation accounting files (*.rsvd.*). When both were present, the
  collector tried to create duplicate metrics entries: usage and
  reservation values got the same name and labels.
- This patch ignores *.rsvd.* files when collecting statistics, so the
  collector keeps reporting exactly what it reported before the
  reservation files were introduced.
- Fixes the "cgroup_hugetlb_usage... collected before with the same
  name and label values" errors in cri-resmgr output.
- Fixes issue intel#859.
marquiz pushed a commit to marquiz/cri-resource-manager that referenced this issue Feb 21, 2023

(cherry picked from commit b9e381c)
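
For reference, a minimal sketch of the kind of filtering the fix above describes: when globbing the hugetlb accounting files, skip the *.rsvd.* reservation counterparts so that usage and reservation values no longer map to the same metric name and labels. The directory path and helper name below are illustrative, not the actual cri-resource-manager collector code.

// Sketch only: selects hugetlb accounting files while ignoring the
// *.rsvd.* reservation files that would otherwise duplicate the
// usage metrics' name/label combinations.
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// collectHugetlbFiles returns the hugetlb accounting files to sample,
// dropping the reservation (.rsvd.) counterparts.
func collectHugetlbFiles(cgroupDir string) ([]string, error) {
	files, err := filepath.Glob(filepath.Join(cgroupDir, "hugetlb.*"))
	if err != nil {
		return nil, err
	}
	var keep []string
	for _, f := range files {
		if strings.Contains(filepath.Base(f), ".rsvd.") {
			continue // reservation accounting, would duplicate usage labels
		}
		keep = append(keep, f)
	}
	return keep, nil
}

func main() {
	// Hypothetical cgroup path, for illustration only.
	files, err := collectHugetlbFiles("/sys/fs/cgroup/hugetlb/kubepods.slice")
	if err != nil {
		fmt.Println("glob failed:", err)
		return
	}
	for _, f := range files {
		fmt.Println(f)
	}
}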