Solving the CrashLoopBackOff error of CoreDNS in a k8s deployment
Problem description
Before starting the project, we needed to build a cluster with k8s. I'm a complete beginner, so I followed an online tutorial step by step (refer to the linked website for the deployment process).
When I checked the status of each pod in the cluster, I found that coredns had not started successfully. It was stuck in the CrashLoopBackOff state, failing and restarting in an endless loop.
[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS             RESTARTS   AGE
coredns-bccdc95cf-9wd9n              0/1     CrashLoopBackOff   19         19h
coredns-bccdc95cf-qsf9f              0/1     CrashLoopBackOff   19         19h
etcd-k8s-master                      1/1     Running            3          19h
kube-apiserver-k8s-master            1/1     Running            3          19h
kube-controller-manager-k8s-master   1/1     Running            11         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running            1          16h
kube-flannel-ds-amd64-swqhf          1/1     Running            1          16h
kube-flannel-ds-amd64-tnbmc          1/1     Running            1          16h
kube-proxy-259l8                     1/1     Running            0          16h
kube-proxy-qcnpt                     1/1     Running            0          16h
kube-proxy-rp7qx                     1/1     Running            3          19h
kube-scheduler-k8s-master            1/1     Running            11         19h
Solution
Check the coredns log. Its content is as follows:
[root@k8s-master a1zMC2]# kubectl logs -f coredns-bccdc95cf-9wd9n -n kube-system
E0512 01:59:03.825489 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0512 01:59:03.825489 1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-bccdc95cf-9wd9n.unknownuser.log.ERROR.20210512-015903.1: no such file or directory
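The key detail in this log is the address that could not be reached. As a minimal sketch, the helper below pulls those addresses out of the log text. The function name unreachable_targets is hypothetical, and it assumes the error lines keep the "dial tcp <address>: connect: no route to host" wording shown above.

```shell
# Hypothetical helper: read CoreDNS log text from stdin and print each unique
# address that failed with "no route to host".
unreachable_targets() {
  grep -o 'dial tcp [^ ]*: connect: no route to host' \
    | awk '{ sub(/:$/, "", $3); print $3 }' \
    | sort -u
}
```

It could be fed directly from the pod, for example with kubectl logs coredns-bccdc95cf-9wd9n -n kube-system 2>&1 | unreachable_targets.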
Then view the details with the kubectl describe pod coredns-bccdc95cf-9wd9n -n kube-system command:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16h (x697 over 17h) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
Warning Unhealthy 15h (x5 over 15h) kubelet, k8s-master Readiness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 15h (x5 over 15h) kubelet, k8s-master Liveness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
I suspected a problem connecting to the host, so I ran cat /etc/resolv.conf
to view the configuration file. I found that the nameserver entry was not the address of the master host.
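To make the nameserver check repeatable, a small helper can print only the nameserver entries. This is a sketch: list_nameservers is a hypothetical name, and it assumes the standard resolv.conf layout in which each entry is a line of the form "nameserver <address>".

```shell
# Hypothetical helper: print the address of every nameserver entry in a
# resolv.conf-style file (defaults to /etc/resolv.conf when no path is given).
list_nameservers() {
  awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}
```

Running list_nameservers with no argument on the master node shows which resolvers the host is currently using, so they can be compared with the master's IP address.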
Just to try it, I changed the entry to the IP address of the master node, then stopped kubelet and docker, flushed the iptables rules, and started both services again:
[root@k8s-master a1zMC2]# systemctl stop kubelet
[root@k8s-master a1zMC2]# systemctl stop docker
[root@k8s-master a1zMC2]# iptables --flush
[root@k8s-master a1zMC2]# iptables -t nat --flush
[root@k8s-master a1zMC2]# systemctl start kubelet
[root@k8s-master a1zMC2]# systemctl start docker
I checked the status again and found that all pods were working normally!
[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-bccdc95cf-9wd9n              1/1     Running   21         20h
coredns-bccdc95cf-qsf9f              1/1     Running   21         20h
etcd-k8s-master                      1/1     Running   4          19h
kube-apiserver-k8s-master            1/1     Running   4          19h
kube-controller-manager-k8s-master   1/1     Running   12         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running   1          17h
kube-flannel-ds-amd64-swqhf          1/1     Running   1          17h
kube-flannel-ds-amd64-tnbmc          1/1     Running   2          17h
kube-proxy-259l8                     1/1     Running   0          17h
kube-proxy-qcnpt                     1/1     Running   0          17h
kube-proxy-rp7qx                     1/1     Running   4          20h
kube-scheduler-k8s-master            1/1     Running   12         19h
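Scanning this listing by eye works for a small cluster, but the check can also be scripted. The sketch below uses a hypothetical helper named not_running and assumes the default kubectl get pods column order shown above (NAME, READY, STATUS, RESTARTS, AGE).

```shell
# Hypothetical helper: read `kubectl get pods` output from stdin, skip the
# header row, and print the name and status of each pod that is not "Running".
not_running() {
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}
```

For example, kubectl get pods -n kube-system | not_running prints nothing once every pod is in the Running state. Pods that finished normally with a Completed status would also be listed, so the filter may need adjusting for namespaces that run jobs.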
Because I haven't studied cloud computing yet, this post may contain mistakes. Please point them out in the comments.