Fixing the CrashLoopBackOff error of coredns in a k8s deployment

Problem description

Before starting the project, we needed to build a cluster with k8s. I am a complete beginner, so I followed an online tutorial step by step (see the linked website for the deployment process).
When I checked the status of each pod in the cluster, I found that coredns had not started successfully. It was stuck in the CrashLoopBackOff state, failing and restarting in an endless loop.

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS             RESTARTS   AGE
coredns-bccdc95cf-9wd9n              0/1     CrashLoopBackOff   19         19h
coredns-bccdc95cf-qsf9f              0/1     CrashLoopBackOff   19         19h
etcd-k8s-master                      1/1     Running            3          19h
kube-apiserver-k8s-master            1/1     Running            3          19h
kube-controller-manager-k8s-master   1/1     Running            11         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running            1          16h
kube-flannel-ds-amd64-swqhf          1/1     Running            1          16h
kube-flannel-ds-amd64-tnbmc          1/1     Running            1          16h
kube-proxy-259l8                     1/1     Running            0          16h
kube-proxy-qcnpt                     1/1     Running            0          16h
kube-proxy-rp7qx                     1/1     Running            3          19h
kube-scheduler-k8s-master            1/1     Running            11         19h
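
With a longer pod list it can be easier to filter the output than to read it by eye. The following is a minimal sketch, assuming the column layout shown above (STATUS is the third column); the function name not_running is made up for this example.

```shell
# Print the name and status of every pod whose STATUS column is not "Running".
# Reads `kubectl get pods` style output on stdin and skips the header row.
# The function name is hypothetical, not part of kubectl.
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}

# Typical use on a live cluster:
#   kubectl get pods -n kube-system | not_running
```

For the listing above, this would print only the two coredns pods together with their CrashLoopBackOff status.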

Solutions

Check the coredns log. The content is as follows:

[root@k8s-master a1zMC2]# kubectl logs -f coredns-bccdc95cf-9wd9n -n kube-system
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-bccdc95cf-9wd9n.unknownuser.log.ERROR.20210512-015903.1: no such file or directory

Then view the details with the kubectl describe pod coredns-bccdc95cf-9wd9n -n kube-system command:

Events:
  Type     Reason            Age                  From                 Message
  ----     ------            ----                 ----                 -------
  Warning  FailedScheduling  16h (x697 over 17h)  default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Readiness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Liveness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I suspected a problem connecting to the host, so I ran cat /etc/resolv.conf to view the configuration file. I found that the nameserver entry was not the address of the master host.
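
To look only at the nameserver entries rather than the whole file, something like the following could be used. It is a small sketch; the function name list_nameservers is an assumption made for this example.

```shell
# Print the address of each "nameserver" entry in a resolv.conf style file.
# Defaults to /etc/resolv.conf when no path is given.
# The function name is hypothetical.
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Typical use:
#   list_nameservers
#   list_nameservers /path/to/copy-of-resolv.conf
```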

On a hunch, I changed it to the IP address of the master node, then restarted docker and kubelet:

[root@k8s-master a1zMC2]# systemctl stop kubelet
[root@k8s-master a1zMC2]# systemctl stop docker
[root@k8s-master a1zMC2]# iptables --flush
[root@k8s-master a1zMC2]# iptables -tnat --flush
[root@k8s-master a1zMC2]# systemctl start kubelet
[root@k8s-master a1zMC2]# systemctl start docker
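
The nameserver change made before the restart was done by hand in an editor. As a rough sketch, the same edit could be scripted with a backup kept alongside the file. The function name set_nameserver and the address 192.168.1.10 are placeholders, not values from my cluster, and on some systems /etc/resolv.conf is regenerated by the network manager, so a manual edit may not persist.

```shell
# Replace all nameserver entries in FILE with a single entry for IP,
# keeping a backup copy at FILE.bak.
# The function name and its arguments are hypothetical.
set_nameserver() {
  file=$1
  ip=$2
  cp "$file" "$file.bak" || return 1
  # Keep every line that is not a nameserver entry, then append the new one.
  {
    awk '$1 != "nameserver"' "$file.bak"
    echo "nameserver $ip"
  } > "$file"
}

# Typical use (run as root; 192.168.1.10 stands in for the master node IP):
#   set_nameserver /etc/resolv.conf 192.168.1.10
```

If the change does not help, copying the .bak file back restores the original configuration.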

Check the status again: all pods are now running normally!

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-bccdc95cf-9wd9n              1/1     Running   21         20h
coredns-bccdc95cf-qsf9f              1/1     Running   21         20h
etcd-k8s-master                      1/1     Running   4          19h
kube-apiserver-k8s-master            1/1     Running   4          19h
kube-controller-manager-k8s-master   1/1     Running   12         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running   1          17h
kube-flannel-ds-amd64-swqhf          1/1     Running   1          17h
kube-flannel-ds-amd64-tnbmc          1/1     Running   2          17h
kube-proxy-259l8                     1/1     Running   0          17h
kube-proxy-qcnpt                     1/1     Running   0          17h
kube-proxy-rp7qx                     1/1     Running   4          20h
kube-scheduler-k8s-master            1/1     Running   12         19h

Because I have not studied cloud computing yet, there may be some mistakes in this post. Please point them out in the comments.
