Fixing the CoreDNS CrashLoopBackOff error in a Kubernetes (k8s) deployment
Problem description
Before starting the project, we needed to set up a cluster with k8s. As a complete beginner, I followed an online guide step by step (see the linked site for the deployment process).
When I checked the status of each pod in the cluster, I found that CoreDNS had not started successfully: both pods were stuck in the CrashLoopBackOff state, caught in an endless cycle of erroring and restarting.
[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bccdc95cf-9wd9n 0/1 CrashLoopBackOff 19 19h
coredns-bccdc95cf-qsf9f 0/1 CrashLoopBackOff 19 19h
etcd-k8s-master 1/1 Running 3 19h
kube-apiserver-k8s-master 1/1 Running 3 19h
kube-controller-manager-k8s-master 1/1 Running 11 19h
kube-flannel-ds-amd64-sgqsm 1/1 Running 1 16h
kube-flannel-ds-amd64-swqhf 1/1 Running 1 16h
kube-flannel-ds-amd64-tnbmc 1/1 Running 1 16h
kube-proxy-259l8 1/1 Running 0 16h
kube-proxy-qcnpt 1/1 Running 0 16h
kube-proxy-rp7qx 1/1 Running 3 19h
kube-scheduler-k8s-master 1/1 Running 11 19h
Solution
First, check the CoreDNS pod's log:
[root@k8s-master a1zMC2]# kubectl logs -f coredns-bccdc95cf-9wd9n -n kube-system
E0512 01:59:03.825489 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0512 01:59:03.825489 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-bccdc95cf-9wd9n.unknownuser.log.ERROR.20210512-015903.1: no such file or directory
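Both log lines point at network reachability rather than CoreDNS itself: the pod cannot open a TCP connection to 10.96.0.1:443, which is the default ClusterIP of the kubernetes Service fronting the apiserver. A minimal sketch of reproducing that check from a node (assumes curl is installed; the IP comes from the log above):

```shell
# Sketch: try the same connection CoreDNS was failing on. 10.96.0.1 is the
# default ClusterIP of the "kubernetes" Service that fronts the apiserver;
# the log above shows "no route to host" when dialing it.
if curl -ks --connect-timeout 3 https://10.96.0.1:443/healthz >/dev/null 2>&1; then
    APISERVER_STATE="reachable"
else
    APISERVER_STATE="unreachable"
fi
echo "apiserver service IP is $APISERVER_STATE"
```

If this prints "unreachable" from the node itself, the problem is in the node's networking (routes, iptables, firewall) rather than in the CoreDNS configuration.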
Then view the pod's details with the kubectl describe pod coredns-bccdc95cf-9wd9n -n kube-system command:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16h (x697 over 17h) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning Unhealthy 15h (x5 over 15h) kubelet, k8s-master Readiness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 15h (x5 over 15h) kubelet, k8s-master Liveness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
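The FailedScheduling warning mentions a node taint; on a fresh kubeadm cluster the control-plane node normally carries a NoSchedule taint until the CNI is ready, so that event often clears on its own. A quick way to inspect it (a sketch; the node name k8s-master comes from the output above, and the fallback message covers machines without cluster access):

```shell
# Sketch: list the taints on the master node mentioned in the events above.
# Falls back to a message when kubectl or the cluster is unavailable.
if command -v kubectl >/dev/null 2>&1; then
    TAINT_INFO=$(kubectl describe node k8s-master 2>/dev/null | grep -i taints)
fi
echo "taints: ${TAINT_INFO:-unknown (kubectl or cluster not reachable)}"
```

The probe failures against 10.244.0.2:8080/health are the more telling symptom here: the kubelet cannot reach the pod over the pod network either.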
This suggested a connectivity problem with the host, so I inspected the configuration file with cat /etc/resolv.conf and found that the nameserver entry was not the address of the master host.
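The check described above can be sketched as follows (this only prints the nameserver entries for comparison; the file itself was edited by hand):

```shell
# Sketch: print the nameserver entries in /etc/resolv.conf so they can be
# compared with the master node's IP address.
NAMESERVERS=$(awk '/^nameserver/ {print $2}' /etc/resolv.conf 2>/dev/null)
echo "nameservers: ${NAMESERVERS:-none found}"
```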
On a hunch, I changed it to the IP address of the master node, and then restarted Docker and the kubelet:
[root@k8s-master a1zMC2]# systemctl stop kubelet
[root@k8s-master a1zMC2]# systemctl stop docker
[root@k8s-master a1zMC2]# iptables --flush
[root@k8s-master a1zMC2]# iptables -t nat --flush
[root@k8s-master a1zMC2]# systemctl start kubelet
[root@k8s-master a1zMC2]# systemctl start docker
Checking the status again, all pods are now working normally:
[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-bccdc95cf-9wd9n 1/1 Running 21 20h
coredns-bccdc95cf-qsf9f 1/1 Running 21 20h
etcd-k8s-master 1/1 Running 4 19h
kube-apiserver-k8s-master 1/1 Running 4 19h
kube-controller-manager-k8s-master 1/1 Running 12 19h
kube-flannel-ds-amd64-sgqsm 1/1 Running 1 17h
kube-flannel-ds-amd64-swqhf 1/1 Running 1 17h
kube-flannel-ds-amd64-tnbmc 1/1 Running 2 17h
kube-proxy-259l8 1/1 Running 0 17h
kube-proxy-qcnpt 1/1 Running 0 17h
kube-proxy-rp7qx 1/1 Running 4 20h
kube-scheduler-k8s-master 1/1 Running 12 19h
Since I haven't studied cloud computing systematically yet, there may be mistakes in this post; please point them out in the comments.