
Kubeadm join not connected to specified apiservice #3079

Closed
limylily opened this issue Jun 26, 2024 · 6 comments
Labels
kind/support Categorizes issue or PR as a support question.

Comments

@limylily

Versions

kubeadm version (use kubeadm version):

kubeadm version: &version.Info{Major:"1", Minor:"29", GitVersion:"v1.29.5", GitCommit:"59755ff595fa4526236b0cc03aa2242d941a5171", GitTreeState:"clean", BuildDate:"2024-05-14T10:44:51Z", GoVersion:"go1.21.9", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • Kubernetes version (use kubectl version):
Client Version: v1.29.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.29.4
  • Cloud provider or hardware configuration:
    Bare Metal Server
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
  • Kernel (e.g. uname -a):
Linux changsha-master02 5.4.0-186-generic #206-Ubuntu SMP Fri Apr 26 12:31:10 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
  • Container runtime (CRI) (e.g. containerd, cri-o):
    docker, cri-docker
  • Container networking plugin (CNI) (e.g. Calico, Cilium):
    calico
  • Others:
    Highly available virtual IP provided through kube vip

What happened?

When I used the kubeadm join command to join a node as a control plane, it did not send the request for the cluster configuration to the virtual IP; it mistakenly sent the request to the node being joined.
Virtual IP: 10.10.2.243; node: 10.10.2.192.
I originally ran a single control plane without a virtual IP, and later added a virtual IP in order to add more control planes.

kubeadm join 10.10.2.243:6443 --token 8fpq7x.pak0z6qw5woh156r --discovery-token-ca-cert-hash sha256:8fc5d90922c8b6b5d9851a280c5ee50a07b284ee6d6e7cb481f2c6ee874d7042 --apiserver-advertise-address 10.10.2.192 --apiserver-bind-port 6443 --control-plane --node-name changsha-master01 --cri-socket unix:///var/run/cri-dockerd.sock --certificate-key a231bbfceaef39a7f6f5cbfaa9e0a45b28d0146d1869f374eaa07814f901e602
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://10.10.2.192:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.10.2.192:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher

Here are the details (with -v 5):

kubeadm join 10.10.2.243:6443 --token 9s9wem.rtrd6h045qnswtfh --discovery-token-ca-cert-hash sha256:8fc5d90922c8b6b5d9851a280c5ee50a07b284ee6d6e7cb481f2c6ee874d7042 --apiserver-advertise-address 10.10.2.192 --apiserver-bind-port 6443 --control-plane --node-name changsha-master01 --cri-socket unix:///var/run/cri-dockerd.sock -v 5
[preflight] Running pre-flight checks
I0626 11:59:46.027967    4077 preflight.go:93] [preflight] Running general checks
I0626 11:59:46.030949    4077 checks.go:280] validating the existence of file /etc/kubernetes/kubelet.conf
I0626 11:59:46.030991    4077 checks.go:280] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0626 11:59:46.031011    4077 checks.go:104] validating the container runtime
I0626 11:59:46.066893    4077 checks.go:639] validating whether swap is enabled or not
I0626 11:59:46.067032    4077 checks.go:370] validating the presence of executable crictl
I0626 11:59:46.067087    4077 checks.go:370] validating the presence of executable conntrack
I0626 11:59:46.067117    4077 checks.go:370] validating the presence of executable ip
I0626 11:59:46.067141    4077 checks.go:370] validating the presence of executable iptables
I0626 11:59:46.067166    4077 checks.go:370] validating the presence of executable mount
I0626 11:59:46.067189    4077 checks.go:370] validating the presence of executable nsenter
I0626 11:59:46.067212    4077 checks.go:370] validating the presence of executable ebtables
I0626 11:59:46.067239    4077 checks.go:370] validating the presence of executable ethtool
I0626 11:59:46.067261    4077 checks.go:370] validating the presence of executable socat
I0626 11:59:46.067284    4077 checks.go:370] validating the presence of executable tc
I0626 11:59:46.067305    4077 checks.go:370] validating the presence of executable touch
I0626 11:59:46.067328    4077 checks.go:516] running all checks
I0626 11:59:46.081491    4077 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
I0626 11:59:46.081903    4077 checks.go:605] validating kubelet version
I0626 11:59:46.148757    4077 checks.go:130] validating if the "kubelet" service is enabled and active
I0626 11:59:46.164437    4077 checks.go:203] validating availability of port 10250
I0626 11:59:46.164638    4077 checks.go:430] validating if the connectivity type is via proxy or direct
I0626 11:59:46.164682    4077 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0626 11:59:46.164741    4077 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0626 11:59:46.164782    4077 join.go:532] [preflight] Discovering cluster-info
I0626 11:59:46.164822    4077 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.10.2.243:6443"
I0626 11:59:46.175968    4077 token.go:118] [discovery] Requesting info from "10.10.2.243:6443" again to validate TLS against the pinned public key
I0626 11:59:46.185048    4077 token.go:135] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.10.2.243:6443"
I0626 11:59:46.185102    4077 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0626 11:59:46.185119    4077 join.go:546] [preflight] Fetching init configuration
I0626 11:59:46.185137    4077 join.go:592] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
Get "https://10.10.2.192:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp 10.10.2.192:6443: connect: connection refused
failed to get config map
k8s.io/kubernetes/cmd/kubeadm/app/util/config.getInitConfigurationFromCluster
        k8s.io/kubernetes/cmd/kubeadm/app/util/config/cluster.go:75
k8s.io/kubernetes/cmd/kubeadm/app/util/config.FetchInitConfigurationFromCluster
        k8s.io/kubernetes/cmd/kubeadm/app/util/config/cluster.go:56
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfiguration
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:623
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfigurationFromJoinConfiguration
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:593
k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:547
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runPreflight
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/preflight.go:98
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:180
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        runtime/proc.go:267
runtime.goexit
        runtime/asm_amd64.s:1650
unable to fetch the kubeadm-config ConfigMap
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfiguration
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:625
k8s.io/kubernetes/cmd/kubeadm/app/cmd.fetchInitConfigurationFromJoinConfiguration
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:593
k8s.io/kubernetes/cmd/kubeadm/app/cmd.(*joinData).InitCfg
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:547
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join.runPreflight
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/join/preflight.go:98
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:259
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:180
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        runtime/proc.go:267
runtime.goexit
        runtime/asm_amd64.s:1650
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:260
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:446
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:232
k8s.io/kubernetes/cmd/kubeadm/app/cmd.newCmdJoin.func1
        k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:180
github.com/spf13/cobra.(*Command).execute
        github.com/spf13/[email protected]/command.go:940
github.com/spf13/cobra.(*Command).ExecuteC
        github.com/spf13/[email protected]/command.go:1068
github.com/spf13/cobra.(*Command).Execute
        github.com/spf13/[email protected]/command.go:992
k8s.io/kubernetes/cmd/kubeadm/app.Run
        k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        runtime/proc.go:267
runtime.goexit
        runtime/asm_amd64.s:1650

What you expected to happen?

I expected it to obtain the configuration through the virtual IP, rather than through the node that has not yet joined.

How to reproduce it (as minimally and precisely as possible)?

Deploy a single-control-plane Kubernetes cluster without a virtual IP first, then upgrade it to a cluster with multiple control planes.
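For reference, the usual way to make a cluster HA-ready from the start is to point controlPlaneEndpoint at the stable virtual IP when running kubeadm init, so that cluster-info and the generated kubeconfigs never embed a single node's address. A minimal sketch (the values are illustrative and must match your own environment):

```yaml
# kubeadm-config.yaml -- illustrative sketch, not the reporter's actual config
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.5
# The stable endpoint (here the kube-vip virtual IP) that joining nodes
# and generated kubeconfigs should use instead of any single node's address:
controlPlaneEndpoint: "10.10.2.243:6443"
```

This would then be passed as `kubeadm init --config kubeadm-config.yaml --upload-certs`. When the endpoint is added only after init, the address baked into cluster-info still points at the original node, which matches the behavior described above.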

@neolit123
Member

the machines must have network connectivity between each other.
this seems like a network problem and not a kubeadm problem.

please check the support link the bot will share below.

/support


Hello, @limylily 🤖 👋

You seem to have troubles using Kubernetes and kubeadm.
Note that our issue trackers should not be used for providing support to users.
There are special channels for that purpose.

Please see:

@github-actions github-actions bot added the kind/support Categorizes issue or PR as a support question. label Jun 26, 2024
@blind3dd

Hello, I do not think this is a networking issue, but rather a security one; and since it fails with an error, it could even be considered a security feature in this scenario. If it is not, then I think it simply requires a bit of effort going over the cobra parameters, the overrides, and the overall kubeadm configuration logic that decides whether the cluster-info config, kubeconfig, or kubelet admin conf values are read from a file, from ConfigMaps served by the control-plane endpoints, or from elsewhere. Unless the presence of 127.0.0.1 in the apiServer certSANs in the initial kubeadm configuration has something to do with it, blocking the init phases even though they complete (I think that is a conversation for another time, though, or even a subject for another ticket).

Regarding the problem described by the author of this ticket, I have experienced almost exactly the same issue.

My setup: a Lima VM Kubernetes control plane successfully provisioned with kubeadm.k8s.io/v1beta4. For now I run only this one control plane, hosted on a Lima VM guest OS with the network type configured as user-v2.

That network configuration ensures connectivity among nodes, and indeed I can easily communicate (via ICMP and otherwise) with the other node that I am trying to join as a worker or as another control plane (mostly running the kubeadm join command and trying to get that worker connected to kube-apiserver at the proper IP when fetching the next ConfigMap data from the proper endpoint).

I run an Ubuntu 24.04 image powered by vmType: vz (Apple Virtualization.framework) rather than QEMU, which I usually use, but it seems to work very well with "vz". Everything runs with cgroup2fs based on non-tmpfs systemd entries; containerd is the CRI (and the client), even though a Docker daemon is also installed on the Lima VM. Flannel is the CNI and also provides KubeDNS.

I can access kube-api from my host OS without issues as well. I simply run kubeadm join with a token and sha256 hash, but it only gets to a certain point, failing on the second ConfigMap (roundtripper.go).

The problem I noticed is that after issuing the kubeadm join command, the init phases start successfully and TLS is validated against the pinned public key during the first requests for ConfigMap data. It gets that data, but after the first calls finish, it continues on and sets "server" (which is https://127.0.0.1:8443) based on the response from request.go and on that entry in the cluster-info ConfigMap in the kube-public namespace. It seems that after making the first calls with request.go it sets that config rather than updating it.

It starts out with the proper address when the worker node tries to finish the init phases, but after request.go the client starts fetching more data from another ConfigMap on the control-plane node, this time kubeadm-config in the kube-system namespace. The roundtripper looks for that ConfigMap again and again and eventually fails with the ugly stack trace and "failed to get config map"; it certainly won't be able to reach https://127.0.0.1:6443.

I found that the reason is the cluster-info ConfigMap in the kube-public namespace: it contains a key whose value includes server: https://127.0.0.1:6443, while I am trying to join the node with IP 192.168.104.6 as a worker into 192.168.104.5 (server https://192.168.104.5:6443). kubeadm picks up the server value from cluster-info first and "swaps out" the IP address originally provided on the kubeadm join command line.
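That value can be checked directly. A minimal sketch, with an illustrative payload standing in for the real ConfigMap contents:

```shell
# The kube-public/cluster-info ConfigMap embeds a kubeconfig whose "server"
# field kubeadm join uses for all follow-up requests. The payload below is
# illustrative of a cluster initialized without --control-plane-endpoint;
# on a real cluster you would fetch it with:
#   kubectl -n kube-public get cm cluster-info -o jsonpath='{.data.kubeconfig}'
kubeconfig='apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: <redacted>
    server: https://127.0.0.1:6443
kind: Config'

# Extract the endpoint a joining node would actually be told to use:
printf '%s\n' "$kubeconfig" | grep 'server:'
```

If the `server:` line shows a loopback or single-node address rather than the stable endpoint, joining nodes will be redirected there regardless of the address passed to `kubeadm join`.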

client.go and the roundtripper retry per the code logic, eventually hit the limit, and log the stack trace and a misleading message about the ConfigMap (this time kubeadm-config, which it looks up at the wrong IP address), even though the discovery phase reported:

I1126 11:30:02.341661 2179335 token.go:134] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "192.168.104.5:6443"
I1126 11:30:02.341673 2179335 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process

My Cluster

root@lima-vz-control:/Users/usualsuspectx/Development/virtualLima# k get no -A -owide
NAME              STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
lima-vz-control   Ready    control-plane   14h   v1.31.2   192.168.104.5   <none>        Ubuntu 20.04.6 LTS   5.4.0-200-generic   containerd://1.7.16

@neolit123 I would like to work on this issue, since I have already written code for it (mainly in kubeconfig.go for now) and have been debugging this for quite some time.

I can confirm the issue, and I would like to become a full part of the kubeadm maintainers.

@neolit123
Member

I found that a reason for it is that ConfigMap called cluster-info from kube-public namespace contains a key, value being server

that's how kubeadm works. you can find more about it by checking the source code under cmd/kubeadm/app/discovery.
but if the cluster-info has the wrong IP then it means it was given to kubeadm on "init".

to fix that you cannot manually edit the cluster-info CM, because it carries JWS signatures, one per token, over the kubeconfig in the CM, like this: jws-kubeconfig-20fp2p: eyJhbGciOiJIUzI1NiIsImtpZCI6IjIwZnAycCJ9..VsHtz4_pKIaI2oYNgqrMREzR5C58Y4qI2Th2VKAzXBk

you can do this:

sudo kubeadm init phase bootstraptoken --config <make-sure-you-pass-correct-config>

and then delete/create tokens with kubeadm token
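As a sketch of that workflow (the config path and tokens are placeholders):

```shell
# Re-sign the cluster-info kubeconfig using a config that carries the
# correct controlPlaneEndpoint:
sudo kubeadm init phase bootstraptoken --config <correct-config.yaml>

# Then rotate the bootstrap tokens so new joins use the re-signed data:
kubeadm token list
kubeadm token delete <old-token>
kubeadm token create --print-join-command
```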

@neolit123 I would like to work on this issue since I have already written code for it (mainly in kubeconfig.go for now) and I have been already involved in debugging this for quite a time now.

there is no bug to fix here. this is misconfiguration during "init".

also please note we don't provide support on github anymore:
https://github.com/kubernetes/kubeadm?tab=readme-ov-file#support

@blind3dd

blind3dd commented Nov 29, 2024

@neolit123 Thanks for your answer and a tip regarding support.

The thing is, I provided the proper IP address during the initial configuration, and yet it changes during the lookup for the second ConfigMap on the control plane when kubeadm join is issued, after reading the body of the first response (which contains that 127.0.0.1).

I have been in the Kubernetes code, working on the GetClusterFromKubeConfig method in kubeconfig.go (cmd/kubeadm/app/util/kubeconfig/kubeconfig.go), but it's stashed for now since there is something there I want to test more with local changes, a local setup, and different scenarios, so I am aiming to test my own build of kubeadm with some changes there.

Regarding the issue: it is possible that I initially created the control plane with the proper IP and it was running fine, but then I had to do a reset and init again with the proper args to get this control-plane node's kube-api working. Perhaps something between the initial start, the reset to get the data for the joiner, and another init got weird.

Anyway, I am going to dig deeper by recreating the environment with a clean setup, so I can also make sure the Lima template is properly written (I plan to create a kubeadm-related PR there for Debian, or update one if it is already committed; I work with the newest v1beta4 as mentioned in the previous comment).

Coming back to kubeadm: since I am working on the Lima template for k8s anyway, I will check this on other machines, but something tells me there may be a corner case here.

I suspect it may have something to do with the fact that I initially created a control plane on the node that is now supposed to join another control plane: back then I used kubeadm init with defaults, then reset and removed the manifests, the /var/lib data, and so on. So I plan to use another node without any prior config, for either a control-plane init or a worker join, to see what the minimal number of steps is, since I also plan to have clean lima-vm steps for that in the manifest (aiming at QEMU due to its generic nature, to include Windows usability in general, so perhaps I will need to create other VMs anyway).

After I get the results and have a clear comparison between setups with clean steps, I will join the next kubeadm office hours, which I plan to do anyway on Wednesday next week. Cheers!

@blind3dd

blind3dd commented Dec 4, 2024

@neolit123 you're right. No issue here; at least I could not reproduce it.
