[kubernetes] a calico-node pod instance keeps reporting errors and restarting

[background]

Today we tested adding a new worker node to our k8s cluster. The expansion itself went smoothly, but we later found that the calico-node pod instance on the newly added node (k8s-node04) kept reporting errors and restarting.

[phenomenon]

The pod status listing below shows one instance (calico-node-xl9bc) stuck at 0/1 READY and restarting constantly:

[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system            calico-kube-controllers-78d6f96c7b-tv2g6               1/1     Running     0          75m
kube-system            calico-node-6dk7g                                      1/1     Running     0          75m
kube-system            calico-node-dlf26                                      1/1     Running     0          75m
kube-system            calico-node-s5phd                                      1/1     Running     0          75m
kube-system            calico-node-xl9bc                                      0/1     Running     30          3m28s
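
Since calico-node runs as a DaemonSet with one pod per node, it helps to confirm that the failing pod really is the one scheduled on the new node. A quick check (the -o wide flag adds a NODE column):

# Confirm which node the failing pod landed on; the NODE column
# should point at the newly added k8s-node04.
kubectl get pods -n kube-system -o wide | grep calico-node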

[troubleshooting]

Query the pod's logs:

[root@k8s-master01 ~]# kubectl logs calico-node-xl9bc -n kube-system -f

2021-09-04 12:32:45.011 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:46.025 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:47.038 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
2021-09-04 12:32:48.050 [ERROR][69] felix/health.go 246: Health endpoint failed, trying to restart it... error=listen tcp: lookup localhost on 114.114.114.114:53: no such host
...(the same error repeats about once per second until the container is restarted)...
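
The error itself points at the root cause: felix tries to bind its health endpoint to localhost, the resolver cannot find localhost locally, and so the lookup falls through to the DNS server configured in /etc/resolv.conf (114.114.114.114), which naturally has no record for it. Assuming you can SSH to k8s-node04, a few commands make this easy to confirm:

# Are the localhost entries present at all?
grep localhost /etc/hosts

# getent follows the same NSS lookup order (files, then dns) that
# most programs use; on a healthy node this prints 127.0.0.1.
getent hosts localhost

# If both come back empty, "localhost" goes to the nameserver
# configured here, which matches the felix error message.
cat /etc/resolv.conf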

I searched the Internet for this error for quite a while and found no targeted solution.

Eventually I checked /etc/hosts on the node itself and found that the two standard localhost entries, one IPv4 and one IPv6, were missing. The /etc/hosts file on my node was simply wrong; apparently I did some odd manual edit when I installed the virtual machine last night, though I no longer remember what.

### /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
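
For reference, a small sketch that appends the two standard entries only if the IPv4 one is missing (run as root on k8s-node04; adjust the hostnames to taste):

# Append the standard IPv4/IPv6 localhost entries unless already present.
grep -q '^127\.0\.0\.1[[:space:]].*localhost' /etc/hosts || cat >> /etc/hosts <<'EOF'
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
EOF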

After adding these two lines to /etc/hosts on k8s-node04 and restarting the network, the pod instance went into CrashLoopBackOff:

[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system            calico-kube-controllers-78d6f96c7b-tv2g6               1/1     Running            0          80m
kube-system            calico-node-6dk7g                                      1/1     Running            0          80m
kube-system            calico-node-dlf26                                      1/1     Running            0          80m
kube-system            calico-node-s5phd                                      1/1     Running            0          80m
kube-system            calico-node-xl9bc                                      0/1     CrashLoopBackOff   7          8m24s
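
CrashLoopBackOff just means the kubelet is now backing off between restarts after the container failed repeatedly; it says nothing about whether the underlying problem is already fixed. Two commands that usually tell the story for a pod in this state:

# Events at the bottom show the probe failures and back-off timing.
kubectl describe pod calico-node-xl9bc -n kube-system

# Logs from the previous (crashed) container instance.
kubectl logs calico-node-xl9bc -n kube-system --previous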

After deleting the pod instance, the recreated pod (calico-node-mz58r) finally returned to a normal Running state:

[root@k8s-master01 ~]# kubectl delete pod calico-node-xl9bc -n kube-system
pod "calico-node-xl9bc" deleted
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system            calico-kube-controllers-78d6f96c7b-tv2g6               1/1     Running     0          81m
kube-system            calico-node-6dk7g                                      1/1     Running     0          81m
kube-system            calico-node-dlf26                                      1/1     Running     0          81m
kube-system            calico-node-mz58r                                      0/1     Running     0          5s
kube-system            calico-node-s5phd                                      1/1     Running     0          81m
[root@k8s-master01 ~]# kubectl get pods -A| grep calico
kube-system            calico-kube-controllers-78d6f96c7b-tv2g6               1/1     Running     0          81m
kube-system            calico-node-6dk7g                                      1/1     Running     0          81m
kube-system            calico-node-dlf26                                      1/1     Running     0          81m
kube-system            calico-node-mz58r                                      1/1     Running     0          11s
kube-system            calico-node-s5phd                                      1/1     Running     0          81m
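
Deleting the pod is safe here because calico-node is managed by a DaemonSet: the controller immediately recreates it (as calico-node-mz58r above), and the new container starts with the corrected /etc/hosts. Two optional follow-up checks (the second assumes calicoctl is installed on the node):

# The DaemonSet should report the same number DESIRED and READY.
kubectl get daemonset calico-node -n kube-system

# Verify BGP peering from the node itself.
calicoctl node status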
