Cannot create container when systemd-resolved is not running #11810

Open
vaclavskala opened this issue Dec 18, 2024 · 0 comments · May be fixed by #11813
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments


vaclavskala commented Dec 18, 2024

What happened?

The cluster.yml playbook crashes at the "Kubeadm | Create kubeadm config" task when systemd-resolved is not running, because the /run/systemd/resolve/resolv.conf file is missing.

What did you expect to happen?

Kubespray should configure kubelet to use /etc/resolv.conf instead of the missing /run/systemd/resolve/resolv.conf.
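
For context, kubelet reads the file named by the resolvConf field of its KubeletConfiguration when it builds pod sandbox DNS settings. A minimal sketch of the relevant field (illustrative values, not the exact Kubespray template output):

# KubeletConfiguration sketch -- resolvConf is the field at issue
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# With systemd-resolved running, Kubespray points kubelet here:
resolvConf: /run/systemd/resolve/resolv.conf
# With systemd-resolved masked this path does not exist, so the
# expected value would be the plain file instead:
# resolvConf: /etc/resolv.conf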

How can we reproduce it (as minimally and precisely as possible)?

Run cluster.yml on kube nodes running Ubuntu 24.04 with systemd-resolved masked.
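
For reference, masking the service ahead of the run can be done with a task like this (a minimal sketch; it assumes /etc/resolv.conf remains a regular file with valid nameservers, since the default symlink into /run/systemd/resolve dangles once the service stops):

- name: Stop, disable and mask systemd-resolved
  ansible.builtin.systemd:
    name: systemd-resolved
    state: stopped
    enabled: false
    masked: true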

OS

Linux 6.1.113-zfs226 x86_64
PRETTY_NAME="Ubuntu 24.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04.1 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

Version of Ansible

ansible [core 2.16.14]
  config file = /var/home/kubespray-2.26.0/ansible.cfg
  configured module search path = ['/var/home/kubespray-2.26.0/library']
  ansible python module location = /var/home/kubespray-2.26.0/venv/lib/python3.12/site-packages/ansible
  ansible collection location = /var/home/ansible/collections:/usr/share/ansible/collections
  executable location = /var/home/kubespray-2.26.0/venv/bin/ansible
  python version = 3.12.3 (main, Nov  6 2024, 18:32:19) [GCC 13.2.0] (/var/home/kubespray-2.26.0/venv/bin/python)
  jinja version = 3.1.4
  libyaml = True

Version of Python

Python 3.12.3

Version of Kubespray (commit)

kubespray-2.26.0

Network plugin used

calico

Full inventory with variables

Default kubespray-2.26.0 variables

Command used to invoke ansible

ansible-playbook -i inventory/cluster/inventory.ini cluster.yml

Output of ansible run

TASK [kubernetes/control-plane : Kubeadm | Create kubeadm config] **************************************************************************************************************************************************************************
changed: [XXX-prod-master1]
changed: [XXX-prod-master2]
changed: [XXX-prod-master3]
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.492)       0:07:22.794 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.043)       0:07:22.837 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.047)       0:07:22.885 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.041)       0:07:22.927 ****** 
Tuesday 17 December 2024  12:38:03 +0100 (0:00:00.048)       0:07:22.976 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.090)       0:07:23.067 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.100)       0:07:23.167 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.054)       0:07:23.221 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.044)       0:07:23.266 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.048)       0:07:23.315 ****** 
Tuesday 17 December 2024  12:38:04 +0100 (0:00:00.055)       0:07:23.370 ****** 
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (3 retries left).
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (2 retries left).
FAILED - RETRYING: [XXX-prod-master1]: Kubeadm | Initialize first master (1 retries left).

Anything else we need to know

The problem is that roles/kubernetes/preinstall/tasks/main.yml detects whether systemd-resolved is running, but the result is only used to decide whether to include 0060-resolvconf.yml or 0061-systemd-resolved.yml.
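
A hedged sketch of that detect-and-include logic (task and fact names are assumed from the description above, not copied from the Kubespray source):

# Detect whether systemd-resolved is active (sketch)
- name: Check if systemd-resolved is running
  ansible.builtin.command: systemctl is-active systemd-resolved
  register: systemd_resolved_enabled
  failed_when: false
  changed_when: false

# The result only picks which task file to include:
- name: Configure resolv.conf via resolvconf
  ansible.builtin.import_tasks: 0060-resolvconf.yml
  when: systemd_resolved_enabled.rc != 0

- name: Configure resolv.conf via systemd-resolved
  ansible.builtin.import_tasks: 0061-systemd-resolved.yml
  when: systemd_resolved_enabled.rc == 0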

However, roles/kubernetes/node/tasks/facts.yml includes an OS-specific vars file from roles/kubernetes/node/vars, and in those files the resolv.conf path is hardcoded to /run/systemd/resolve/resolv.conf for most distributions.
This causes kubelet to fail to create any container.
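
Roughly what the hardcoded value looks like, plus one possible shape of a fix that reuses the preinstall detection (the variable and fact names here are illustrative, not verified against the Kubespray source):

# roles/kubernetes/node/vars/<distro>.yml (sketch)
# Current, hardcoded:
#   kube_resolv_conf: /run/systemd/resolve/resolv.conf
# Possible fix: fall back to /etc/resolv.conf when systemd-resolved
# is not running (fact name from the detection sketch above):
kube_resolv_conf: >-
  {{ '/run/systemd/resolve/resolv.conf'
     if (systemd_resolved_enabled.rc | default(1)) == 0
     else '/etc/resolv.conf' }}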

On control-plane servers this causes kubelet to fail to create any container, with the error:
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267321   25126 dns.go:284] "Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267332   25126 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-controller-manager-XXX-prod-master1"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267342   25126 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-controller-manager-XXX-prod-master1"
Dec 17 14:01:15 XXX-prod-master1 kubelet[25126]: E1217 14:01:15.267361   25126 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-controller-manager-XXX-prod-master1_kube-system(bce3ce42e0aef110c5773ef4027de42c)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"kube-controller-manager-XXX-prod-master1_kube-system(bce3ce42e0aef110c5773ef4027de42c)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="kube-system/kube-controller-manager-XXX-prod-master1" podUID="bce3ce42e0aef110c5773ef4027de42c"

When systemd-resolved is not running on worker nodes, every container is stuck in the ContainerCreating state with the error:

Events:
  Type     Reason                  Age                   From               Message
  ----     ------                  ----                  ----               -------
  Normal   Scheduled               5m52s                 default-scheduler  Successfully assigned kube-system/kube-proxy-6hnnc to XXX-prod-worker2
  Warning  FailedCreatePodSandBox  44s (x26 over 5m52s)  kubelet            Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory
@vaclavskala vaclavskala added the kind/bug Categorizes issue or PR as related to a bug. label Dec 18, 2024
@VannTen VannTen linked a pull request Dec 18, 2024 that will close this issue