Tag Archives: kubernetes

ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

According to the documentation on the official Rancher 2.4 website, only one Linux host is required: a single-node Rancher server can be deployed quickly. Of course, this is only suitable for testing. Deployment is very convenient: just start Docker on the host and then start one container:

sudo docker run -d --restart=unless-stopped -p 80:80 -p 443:443 rancher/rancher

In practice, however, the container keeps restarting after startup and the UI cannot be reached (the browser reports a network error).

CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS                         PORTS               NAMES
bfd5c52a1ce7        rancher/rancher     "entrypoint.sh"     12 hours ago        Restarting (1) 3 seconds ago                       elated_heisenberg

View the logs:

[root@k8s-node02 ~]# docker logs --tail 3 bfd5c52a1ce7
ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes
ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes
ERROR: Rancher must be ran with the --privileged flag when running outside of Kubernetes

The error says the container must be given elevated privileges with the --privileged flag. Remove the failing container and start it again with --privileged:

docker run -d --restart=unless-stopped --privileged  -p 80:80  -p 443:443 rancher/rancher

Problem solved: the container now starts and stays up.
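To confirm the fix, a quick check (a minimal sketch; the container ID will differ on your host):

docker ps --filter ancestor=rancher/rancher    # STATUS should show "Up ...", not "Restarting"
docker logs --tail 3 <container-id>            # the --privileged error should be gone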

Kubernetes Error: Error in configuration: unable to read client-cert* unable to read client-key*

System environment:

Ubuntu 20.04 LTS
Docker 20.10.8
Kubernetes 1.22.1
Node: node

Execute command:

$ kubectl version

Errors are reported as follows:

Error in configuration: 
* unable to read client-cert /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory
* unable to read client-key /var/lib/kubelet/pki/kubelet-client-current.pem for default-auth due to open /var/lib/kubelet/pki/kubelet-client-current.pem: no such file or directory

Cause:

The token on the node has expired.

Solution:

$ kubeadm token create --print-join-command

kubeadm join 192.168.50.51:6443 --token rt43m1.b9py8ba6uxbfv7sr --discovery-token-ca-cert-hash sha256:f57a09633cf0e18cd905d41159a790704502410fd841acd63cffc8e493ad3cb2 

This regenerates the token on the master node and prints the new join command.

$ kubeadm join 192.168.50.51:6443 --token rt43m1.b9py8ba6uxbfv7sr --discovery-token-ca-cert-hash sha256:f57a09633cf0e18cd905d41159a790704502410fd841acd63cffc8e493ad3cb2 
$ kubectl version

Run the join command (and then kubectl version) again on the node.
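To confirm the node is healthy again, a quick check on the master (a sketch; node names will differ in your cluster):

kubectl get nodes    # the rejoined node should appear and eventually report Ready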

Error from server (BadRequest): a container name must be specified for pod

Error

Previously, I used kubectl logs -f <POD-name> -n <nameSpace> to view the logs of a pod. One day when I used this command to check a pod that was running, I got an error.

Error from server (BadRequest): a container name must be specified for pod xxx, choose one of: [xxx xxx]


Cause and fix

Reason: originally the pod ran a single container, so kubectl logs printed that container's log directly.
But at some point the architecture of the pod was changed to run multiple containers. From then on, you must specify which container's log you want with -c <container_name>; the valid container names are listed in the "choose one of" part of the error message:

 kubectl logs -f  <POD-name> -n <nameSpace> -c  <container_name> 
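If you don't want to copy the names out of the error message, you can also list a pod's containers directly (a sketch using the same placeholder pod/namespace names):

# list the containers defined in the pod spec
kubectl get pod <POD-name> -n <nameSpace> -o jsonpath='{.spec.containers[*].name}'
# or stream the logs of every container at once
kubectl logs -f <POD-name> -n <nameSpace> --all-containers=true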


[Solved] K8s cluster build error: error: kubectl get csr No resources found.


problem

kubectl get csr
No resources found.

reason

After the restart the original SSL certificates are no longer valid; if they are not deleted and reissued, the kubelet cannot communicate with the master even after restarting.

Solution:

cd /opt/kubernetes/ssl
ls
kubelet-client-2021-04-14-08-41-36.pem  kubelet-client-current.pem  kubelet.crt  kubelet.key
# Delete all certificates 
rm -rf *
# stop the kubelet
systemctl stop kubelet

master01

kubectl delete clusterrolebinding kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io "kubelet-bootstrap" deleted

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created

node 

# start the kubelet again via the deployment script
#node01
bash kubelet.sh 192.168.238.82
#node02
bash kubelet.sh 192.168.238.83

Test successful

master01

kubectl get csr
NAME                                                   AGE   REQUESTOR           CONDITION
node-csr-mJwuqA7DAf4UmB1InN_WEYhFWbQKOqUVXg9Bvc7Intk   4s    kubelet-bootstrap   Pending
node-csr-ydhzi9EG9M_Ozmbvep0ledwhTCanppStZoq7vuooTq8   11s   kubelet-bootstrap   Pending
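The new CSRs are still Pending; they typically need to be approved on master01 before the kubelets receive their certificates (a sketch using the names from the output above):

kubectl certificate approve node-csr-mJwuqA7DAf4UmB1InN_WEYhFWbQKOqUVXg9Bvc7Intk
kubectl certificate approve node-csr-ydhzi9EG9M_Ozmbvep0ledwhTCanppStZoq7vuooTq8
kubectl get csr    # CONDITION should change to Approved,Issued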

Done!!!

Errors during Kubernetes installation and their resolution

A. Errors during cluster initialization
1. Error:
[WARNING Hostname]: hostname "master1" could not be reached
[WARNING Hostname]: hostname "master1": lookup master1 on 114.114.114.114:53: no such host
Details:

[root@k8smaster ~]# kubeadm init --config kubeadm.yaml
W1124 09:40:03.139811   68129 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta2", Kind:"ClusterConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "scheduler"
W1124 09:40:03.333487   68129 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
        [WARNING Hostname]: hostname "master1" could not be reached
        [WARNING Hostname]: hostname "master1": lookup master1 on 114.114.114.114:53: no such host
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Port-6443]: Port 6443 is in use
        [ERROR Port-10259]: Port 10259 is in use
        [ERROR Port-10257]: Port 10257 is in use
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
        [ERROR Port-10250]: Port 10250 is in use
        [ERROR Port-2379]: Port 2379 is in use
        [ERROR Port-2380]: Port 2380 is in use
        [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution:
Delete /etc/kubernetes/manifests, change the name in kubeadm.yaml to the host's actual hostname, and then run the initialization again.

[root@k8smaster ~]# ls /etc/kubernetes/
admin.conf  controller-manager.conf  kubelet.conf  manifests  pki  scheduler.conf
[root@k8smaster ~]# ls
anaconda-ks.cfg  kubeadm.yaml
[root@k8smaster ~]# rm -rf  /etc/kubernetes/manifests
[root@k8smaster ~]# ls /etc/kubernetes/
admin.conf  controller-manager.conf  kubelet.conf  pki  scheduler.conf
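Alternatively, if you want to keep the name master1 in kubeadm.yaml, the hostname warning can be avoided by making the name resolvable locally instead of via the public DNS server 114.114.114.114 (a sketch; the IP address is an assumption, use this host's real address):

# assumption: 192.168.68.127 is this host's address
echo "192.168.68.127 master1" >> /etc/hosts
ping -c 1 master1    # should now resolve via /etc/hosts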

2. Error:
WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
Details:

[root@k8smaster ~]# kubeadm init --config kubeadm.yaml
W1124 09:47:13.677697   70122 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta2", Kind:"ClusterConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "scheduler"
W1124 09:47:13.876821   70122 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Port-10250]: Port 10250 is in use
        [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

Solution:
Add --ignore-preflight-errors=all to the init command, that is:

kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all

3. Error:
[kubelet-check] Initial timeout of 40s passed.
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
Details:

[root@k8smaster ~]# kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all
W1124 09:54:07.361765   71406 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta2", Kind:"ClusterConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "scheduler"
W1124 09:54:07.480565   71406 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
        [WARNING Port-10250]: Port 10250 is in use
        [WARNING DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 13.510116 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-check] Initial timeout of 40s passed.
error execution phase upload-config/kubelet: Error writing Crisocket information for the control-plane node: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher

Solution:
execute:

swapoff -a && kubeadm reset  && systemctl daemon-reload && systemctl restart kubelet  && iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
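Note that swapoff -a only disables swap until the next reboot. A small sketch of the usual companion step, commenting swap out of /etc/fstab so it stays off (the pattern is an assumption about the fstab layout):

# keep swap disabled across reboots by commenting out the swap entry
sed -i '/ swap / s/^/#/' /etc/fstab
grep swap /etc/fstab    # the swap line should now start with #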

After running the reset, initializing again succeeds:

[root@k8smaster ~]# kubeadm init --config kubeadm.yaml --ignore-preflight-errors=all                                                                                                  
W1124 10:00:18.648091   74450 strict.go:54] error unmarshaling configuration schema.GroupVersionKind{Group:"kubeadm.k8s.io", Version:"v1beta2", Kind:"ClusterConfiguration"}: error unmarshaling JSON: while decoding JSON: json: unknown field "scheduler"
W1124 10:00:18.760000   74450 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.19.3
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8smaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.68.127]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8smaster localhost] and IPs [192.168.68.127 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8smaster localhost] and IPs [192.168.68.127 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 10.517451 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.19" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8smaster as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8smaster as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.68.127:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:31020d84f523a2af6fc4fea38e514af8e5e1943a26312f0515e65075da314b29
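As the final message says, a pod network add-on still needs to be deployed before the node reports Ready. A minimal sketch using flannel (the manifest URL is the commonly used upstream one; any CNI that fits your environment works):

kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
kubectl get nodes    # the control-plane node should eventually turn Ready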

Initializing the Kubernetes master node ERROR: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0

Running kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log fails with the error: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0.
The full error message is below:

kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log

[init] Using Kubernetes version: v1.21.2
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Hostname]: hostname "node" could not be reached
        [WARNING Hostname]: hostname "node": lookup node on 127.0.0.53:53: server misbehaving
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The message says that pulling the image registry.aliyuncs.com/google_containers/coredns:v1.8.0 failed.
Use kubeadm config images list --config kubeadm.yml to list the images that need to be downloaded:

kubeadm config images list --config kubeadm.yml

registry.aliyuncs.com/google_containers/kube-apiserver:v1.21.2
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.21.2
registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.2
registry.aliyuncs.com/google_containers/kube-proxy:v1.21.2
registry.aliyuncs.com/google_containers/pause:3.4.1
registry.aliyuncs.com/google_containers/etcd:3.4.13-0
registry.aliyuncs.com/google_containers/coredns:v1.8.0

Use the docker images command to see which images have already been pulled:

docker images

registry.aliyuncs.com/google_containers/kube-apiserver            v1.21.2    106ff58d4308   3 weeks ago     126MB
registry.aliyuncs.com/google_containers/kube-controller-manager   v1.21.2    ae24db9aa2cc   3 weeks ago     120MB
registry.aliyuncs.com/google_containers/kube-scheduler            v1.21.2    f917b8c8f55b   3 weeks ago     50.6MB
registry.aliyuncs.com/google_containers/kube-proxy                v1.21.2    a6ebd1c1ad98   3 weeks ago     131MB
registry.aliyuncs.com/google_containers/pause                     3.4.1      0f8457a4c2ec   6 months ago    683kB
registry.aliyuncs.com/google_containers/etcd                      3.4.13-0   0369cf4303ff   10 months ago   253MB

The downloaded images do not include registry.aliyuncs.com/google_containers/coredns:v1.8.0, so pull it manually with docker (note that the mirror publishes the tag as 1.8.0, without the v prefix):

docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0

Kubernetes expects the image to be named registry.aliyuncs.com/google_containers/coredns:v1.8.0, so rename it with the docker tag command:

# rename
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
# remove the old 1.8.0 tag
docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0
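A quick check that the retagged image is now present under the name kubeadm expects:

docker images | grep coredns
# registry.aliyuncs.com/google_containers/coredns   v1.8.0   ...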

Run the initialization command again

kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log

This time it reports success:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Follow the above prompts to configure kubectl

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

# when not running as root, fix ownership of the copied kubeconfig
chown $(id -u):$(id -g) $HOME/.kube/config

Verify success

kubectl get node

# The ability to print out node information indicates success
NAME   STATUS     ROLES                  AGE   VERSION
node   NotReady   control-plane,master   31m   v1.21.2

[Solved] Ubuntu 20.04 LTS Install k8s Error: The connection to the server localhost:8080 was refused

On Ubuntu 20.04 LTS, after the node has successfully joined the cluster, the following message is shown:

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

But when verifying the join with the kubectl get nodes command, the following error appears:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

The same error also occurred on the master node, and the message printed after a successful installation already gives the answer:

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

I use the root account, so I need to execute the following command:

export KUBECONFIG=/etc/kubernetes/admin.conf

After the command is executed, verify it:

root@k8s-master-03:/etc/kubernetes# kubectl get nodes
W0706 10:27:55.181115   22817 loader.go:221] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?

The key part of the error is that the config file cannot be found:

Config not found: /etc/kubernetes/admin.conf

Look in the /etc/kubernetes/ directory:

root@k8s-master-03:/etc/kubernetes# ls -l
total 12
-rw------- 1 root root 1910 Jul  6 09:52 kubelet.conf
drwxr-xr-x 2 root root 4096 Jul  6 09:41 manifests
drwxr-xr-x 2 root root 4096 Jul  6 09:52 pki

The listing shows only kubelet.conf (only the master node has admin.conf), so on this node use kubelet.conf instead:

export KUBECONFIG=/etc/kubernetes/kubelet.conf
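Using kubelet.conf works for a quick check, but its credentials are the kubelet's and have limited permissions. A common alternative is to copy admin.conf over from the master (a sketch; the hostname k8s-master and root SSH access are assumptions about this environment):

# assumption: the master is reachable over SSH as root at k8s-master
scp root@k8s-master:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
export KUBECONFIG=/etc/kubernetes/admin.conf
# persist the setting for new shells
echo 'export KUBECONFIG=/etc/kubernetes/admin.conf' >> ~/.bashrc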

The solution to the CrashLoopBackOff error of coredns in k8s deployment


Problem description

Before starting the project I needed to build a cluster with k8s. As a beginner, I followed an online guide step by step (see the linked site for the deployment process).
When checking the status of each pod in the cluster, I found that coredns had not started successfully and was stuck in the CrashLoopBackOff state, endlessly crashing and restarting:

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS             RESTARTS   AGE
coredns-bccdc95cf-9wd9n              0/1     CrashLoopBackOff   19         19h
coredns-bccdc95cf-qsf9f              0/1     CrashLoopBackOff   19         19h
etcd-k8s-master                      1/1     Running            3          19h
kube-apiserver-k8s-master            1/1     Running            3          19h
kube-controller-manager-k8s-master   1/1     Running            11         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running            1          16h
kube-flannel-ds-amd64-swqhf          1/1     Running            1          16h
kube-flannel-ds-amd64-tnbmc          1/1     Running            1          16h
kube-proxy-259l8                     1/1     Running            0          16h
kube-proxy-qcnpt                     1/1     Running            0          16h
kube-proxy-rp7qx                     1/1     Running            3          19h
kube-scheduler-k8s-master            1/1     Running            11         19h

Solutions

Check the coredns log; the content is as follows:

[root@k8s-master a1zMC2]# kubectl logs -f coredns-bccdc95cf-9wd9n -n kube-system
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
E0512 01:59:03.825489       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:317: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: no route to host
log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-bccdc95cf-9wd9n.unknownuser.log.ERROR.20210512-015903.1: no such file or directory

Then look at the details with kubectl describe pod coredns-bccdc95cf-9wd9n -n kube-system:

Events:
  Type     Reason            Age                  From                 Message
  ----     ------            ----                 ----                 -------
  Warning  FailedScheduling  16h (x697 over 17h)  default-scheduler    0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Readiness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
  Warning  Unhealthy         15h (x5 over 15h)    kubelet, k8s-master  Liveness probe failed: Get http://10.244.0.2:8080/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

This looked like a connectivity problem with the host, so I checked the DNS configuration with cat /etc/resolv.conf and found that the nameserver entry was not the address of the master host.
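For reference, the check looks like this (the nameserver value is a placeholder):

cat /etc/resolv.conf
# nameserver <some-other-address>   <- should be the master node's IP in this setup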

As an experiment, I changed it to the IP address of the master node and then restarted docker and kubelet (flushing iptables in between):

[root@k8s-master a1zMC2]# systemctl stop kubelet
[root@k8s-master a1zMC2]# systemctl stop docker
[root@k8s-master a1zMC2]# iptables --flush
[root@k8s-master a1zMC2]# iptables -tnat --flush
[root@k8s-master a1zMC2]# systemctl start kubelet
[root@k8s-master a1zMC2]# systemctl start docker

Checking the status again, all pods now work normally!

[root@k8s-master a1zMC2]# kubectl get pods -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-bccdc95cf-9wd9n              1/1     Running   21         20h
coredns-bccdc95cf-qsf9f              1/1     Running   21         20h
etcd-k8s-master                      1/1     Running   4          19h
kube-apiserver-k8s-master            1/1     Running   4          19h
kube-controller-manager-k8s-master   1/1     Running   12         19h
kube-flannel-ds-amd64-sgqsm          1/1     Running   1          17h
kube-flannel-ds-amd64-swqhf          1/1     Running   1          17h
kube-flannel-ds-amd64-tnbmc          1/1     Running   2          17h
kube-proxy-259l8                     1/1     Running   0          17h
kube-proxy-qcnpt                     1/1     Running   0          17h
kube-proxy-rp7qx                     1/1     Running   4          20h
kube-scheduler-k8s-master            1/1     Running   12         19h

Since I have not yet studied cloud computing properly, there may be mistakes in this post; please point them out in the comments.

Common errors in k8s initialization: [WARNING NumCPU], [WARNING SystemVerification], [WARNING Hostname]


[init] Using Kubernetes version: v1.20.5
[preflight] Running pre-flight checks
    1    [WARNING NumCPU]: the number of available CPUs 1 is less than the required 2
    2    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    3    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.5. Latest validated version: 19.03
    4    [WARNING Hostname]: hostname "k8s" could not be reached
    5    [WARNING Hostname]: hostname "k8s": lookup k8s on 192.168.0.1:53: no such host

Solutions:
1. Increase the number of CPU cores to at least 2 (the warning means only 1 is available).
2. Configure the container runtime and kubelet to use systemd as the cgroup driver (see https://blog.whsir.com/post-5312.html and the sketch below).
3. Reinstall Docker with a validated version if desired; the latest validated version for this release is 19.03.
4. Add the hostname to /etc/hosts so that it resolves.
5. Same as 4.
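A sketch of warning 2, switching Docker to the systemd cgroup driver via /etc/docker/daemon.json (merge with any options you already have there):

cat <<'EOF' > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl daemon-reload
systemctl restart docker
docker info | grep -i cgroup    # should now show: Cgroup Driver: systemd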