
fallback_ips.yml exits early when there is an unreachable host in the inventory #10993

Open
Rickkwa opened this issue Mar 11, 2024 · 6 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@Rickkwa
Contributor

Rickkwa commented Mar 11, 2024

What happened?

This is a continuation of #10313.

When roles/kubespray-defaults/tasks/fallback_ips.yml runs against an inventory that contains an unreachable host, the entire play exits after the setup task with NO MORE HOSTS LEFT.
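For reference, the failing task is the fact-gathering task at the top of fallback_ips.yml. The sketch below is reconstructed from the run output further down (task name, setup filter and gather_subset, and the delegated loop over all hosts); treat it as an approximation of the task's shape, not a verbatim copy of the role:

- name: Gather ansible_default_ipv4 from all hosts
  setup:
    gather_subset: '!all,network'
    filter: ansible_default_ipv4
  delegate_to: "{{ item }}"
  delegate_facts: true
  # One reachable host loops over every host in the play and collects facts on their behalf
  loop: "{{ (groups['k8s_cluster'] | default([]) + groups['etcd'] | default([]) + groups['calico_rr'] | default([])) | unique }}"
  run_once: true
  ignore_unreachable: true  # added by PR #10601, see "Anything else we need to know" below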

What did you expect to happen?

I expect the entire kubespray-defaults role to finish running, but the play exits after that single task.

How can we reproduce it (as minimally and precisely as possible)?

Minimal inventory

[k8s_cluster:children]
kube_control_plane
kube_node
calico_rr

[kube_control_plane]
k8s1.local  # reachable host

[etcd]
k8s1.local  # reachable host

[kube_node]
k8s3.local  # problematic unreachable host
k8s2.local  # reachable host

[calico_rr]

And then this minimal playbook

- name: Prepare nodes for upgrade
  hosts: k8s_cluster:etcd:calico_rr
  gather_facts: False
  any_errors_fatal: "{{ any_errors_fatal | default(true) }}"
  environment: "{{ proxy_disable_env }}"
  roles:
    - { role: kubespray-defaults }

Execute with ansible-playbook -i hosts.ini bug.yml

OS

Linux 6.5.11-8-pve x86_64
NAME="AlmaLinux"
VERSION="9.3 (Shamrock Pampas Cat)"
ID="almalinux"
ID_LIKE="rhel centos fedora"
VERSION_ID="9.3"
PLATFORM_ID="platform:el9"
PRETTY_NAME="AlmaLinux 9.3 (Shamrock Pampas Cat)"
ANSI_COLOR="0;34"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:almalinux:almalinux:9::baseos"
HOME_URL="https://almalinux.org/"
DOCUMENTATION_URL="https://wiki.almalinux.org/"
BUG_REPORT_URL="https://bugs.almalinux.org/"

ALMALINUX_MANTISBT_PROJECT="AlmaLinux-9"
ALMALINUX_MANTISBT_PROJECT_VERSION="9.3"
REDHAT_SUPPORT_PRODUCT="AlmaLinux"
REDHAT_SUPPORT_PRODUCT_VERSION="9.3"

Version of Ansible

Tried both:

ansible [core 2.15.9]
  config file = /root/kubespray-test/kubespray/ansible.cfg
  configured module search path = ['/root/kubespray-test/kubespray/library']
  ansible python module location = /root/kubespray-test/venv-latest/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /root/kubespray-test/venv-latest/bin/ansible
  python version = 3.9.18 (main, Jan  4 2024, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/root/kubespray-test/venv-latest/bin/python)
  jinja version = 3.1.3
  libyaml = True

and

ansible [core 2.14.14]
  config file = /root/kubespray-test/kubespray/ansible.cfg
  configured module search path = ['/root/kubespray-test/kubespray/library']
  ansible python module location = /root/kubespray-test/venv-2.14/lib64/python3.9/site-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /root/kubespray-test/venv-2.14/bin/ansible
  python version = 3.9.18 (main, Jan  4 2024, 00:00:00) [GCC 11.4.1 20230605 (Red Hat 11.4.1-2)] (/root/kubespray-test/venv-2.14/bin/python)
  jinja version = 3.1.3
  libyaml = True

Version of Python

Python 3.9.18

Version of Kubespray (commit)

66eaba3

Network plugin used

calico

Full inventory with variables

See "How can we reproduce it" section. Just that inventory, no variables.

Command used to invoke ansible

See "How can we reproduce it" section

Output of ansible run

PLAY [Prepare nodes for upgrade] ********************************************************************************************************************************************************************************

TASK [kubespray-defaults : Gather ansible_default_ipv4 from all hosts] ******************************************************************************************************************************************
ok: [k8s1.local] => (item=k8s1.local)
[WARNING]: Unhandled error in Python interpreter discovery for host k8s1.local: Failed to connect to the host via ssh: ssh: connect to host k8s3.local port 22: Connection timed out
failed: [k8s1.local -> k8s3.local] (item=k8s3.local) => {"ansible_loop_var": "item", "item": "k8s3.local", "msg": "Data could not be sent to remote host \"k8s3.local\". Make sure this host can be reached over ssh: ssh: connect to host k8s3.local port 22: Connection timed out\r\n", "unreachable": true}
ok: [k8s1.local -> k8s2.local] => (item=k8s2.local)
fatal: [k8s1.local -> {{ item }}]: UNREACHABLE! => {"changed": false, "msg": "All items completed", "results": [{"ansible_facts": {"ansible_default_ipv4": {"address": "10.88.111.29", "alias": "eth0", "broadcast": "10.88.111.255", "gateway": "10.88.111.254", "interface": "eth0", "macaddress": "bc:24:11:41:88:12", "mtu": 1500, "netmask": "255.255.252.0", "network": "10.88.108.0", "prefix": "22", "type": "ether"}, "discovered_interpreter_python": "/usr/bin/python3"}, "ansible_loop_var": "item", "changed": false, "failed": false, "invocation": {"module_args": {"fact_path": "/etc/ansible/facts.d", "filter": ["ansible_default_ipv4"], "gather_subset": ["!all", "network"], "gather_timeout": 10}}, "item": "k8s1.local"}, {"ansible_loop_var": "item", "item": "k8s3.local", "msg": "Data could not be sent to remote host \"k8s3.local\". Make sure this host can be reached over ssh: ssh: connect to host k8s3.local port 22: Connection timed out\r\n", "unreachable": true}, {"ansible_facts": {"ansible_default_ipv4": {"address": "10.88.111.30", "alias": "eth0", "broadcast": "10.88.111.255", "gateway": "10.88.111.254", "interface": "eth0", "macaddress": "bc:24:11:be:42:a6", "mtu": 1500, "netmask": "255.255.252.0", "network": "10.88.108.0", "prefix": "22", "type": "ether"}, "discovered_interpreter_python": "/usr/bin/python3"}, "ansible_loop_var": "item", "changed": false, "failed": false, "invocation": {"module_args": {"fact_path": "/etc/ansible/facts.d", "filter": ["ansible_default_ipv4"], "gather_subset": ["!all", "network"], "gather_timeout": 10}}, "item": "k8s2.local"}]}
...ignoring

NO MORE HOSTS LEFT **********************************************************************************************************************************************************************************************

PLAY RECAP ******************************************************************************************************************************************************************************************************
k8s1.local                 : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=1

Anything else we need to know

PR #10601 added ignore_unreachable: true to this task. That changed the Play Recap to show ignored=1 instead of unreachable=1, but it doesn't solve the underlying problem of the play exiting early.
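Since the play header templates any_errors_fatal with a default of true (see the minimal playbook above), one possible workaround, untested here, is to override that default at invocation so the reachable hosts keep running; this only sidesteps the early exit and does not fix the task itself:

ansible-playbook -i hosts.ini bug.yml -e any_errors_fatal=false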

@Rickkwa added the kind/bug label on Mar 11, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 9, 2024
@Rickkwa
Contributor Author

Rickkwa commented Jun 9, 2024

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Jun 9, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Sep 7, 2024
@Rickkwa
Contributor Author

Rickkwa commented Sep 16, 2024

/remove-lifecycle stale

@k8s-ci-robot removed the lifecycle/stale label on Sep 16, 2024
@VannTen
Contributor

VannTen commented Dec 10, 2024

Is this still relevant after #11598 ?

@Rickkwa
Contributor Author

Rickkwa commented Dec 13, 2024

Is this still relevant after #11598 ?

I'll take a look next week to see if this is still relevant
