Tag Archives: kubernetes

Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to

Calico was installed using the tigera-operator method and reported errors after startup; all Calico-related pods showed CrashLoopBackOff.

Running kubectl -n calico-system describe pod calico-node-2t8w6 shows the following error:

Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: no such file or directory.

Cause of the problem:

This issue occurred during a Kubernetes cluster deployment. By default Calico autodetects node IP addresses using the first-found method, and on hosts with multiple interfaces it can pick the wrong address, so the detection method needs to be specified manually.
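A quick way to confirm which address Calico picked is to compare the node's interfaces with what calico-node logged at startup (a rough check; the pod name is the one from your cluster and the exact log wording may differ between Calico versions):

# On the affected node, list the IPv4 addresses of all interfaces
ip -4 addr show
# In the calico-node log, look for the autodetected address
kubectl -n calico-system logs calico-node-2t8w6 | grep -i autodetected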

1. Remove all the Calico resources

kubectl -n tigera-operator get deployments.apps -o yaml > a.yaml
kubectl -n calico-system get daemonsets.apps calico-node -o yaml > b.yaml
kubectl -n calico-system get deployments.apps calico-kube-controllers -o yaml > c.yaml
kubectl -n calico-system get deployments.apps calico-typha -o yaml > d.yaml
kubectl -n calico-apiserver get deployments.apps calico-apiserver -o yaml > e.yaml
kubectl delete -f a.yaml
kubectl delete -f b.yaml
kubectl delete -f c.yaml
kubectl delete -f d.yaml
kubectl delete -f e.yaml
2. Remove tigera-operator.yaml and custom-resources.yaml
kubectl delete -f tigera-operator.yaml
kubectl delete -f custom-resources.yaml

3. Remove vxlan.calico
ip link delete vxlan.calico
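You can verify the interface is gone (or check whether it existed at all) with:

ip -d link show vxlan.calico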

4. Modify the custom-resources.yaml file and add nodeAddressAutodetectionV4:
# This section includes base Calico installation configuration.
# For more information, see: https://projectcalico.docs.tigera.io/v3.23/reference/installation/api#operator.tigera.io/v1.Installation
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
  name: default
spec:
  # Configures Calico networking.
  calicoNetwork:
    # Note: The ipPools section cannot be modified post-install.
    #bgp: Enabled
    #hostPorts: Enabled
    ipPools:
    - blockSize: 26
      cidr: 10.244.0.0/16
      encapsulation: VXLANCrossSubnet
      natOutgoing: Enabled
      nodeSelector: all()
    #linuxDataplane: Iptables
    #multiInterfaceMode: None
    nodeAddressAutodetectionV4:
      interface: ens.*

---

# This section configures the Calico API server.
# For more information, see: https://projectcalico.docs.tigera.io/v3.23/reference/installation/api#operator.tigera.io/v1.APIServer
apiVersion: operator.tigera.io/v1
kind: APIServer
metadata:
  name: default
spec: {}
5. Re-create
kubectl create -f tigera-operator.yaml
kubectl create -f custom-resources.yaml
Check that the autodetection method was applied:
kubectl -n calico-system get daemonsets.apps calico-node  -o yaml|grep -A2 IP_AUTODETECTION_METHOD

[Solved] failed to set bridge addr: "cni0" already has an IP address different from xxxx

Recently, while debugging Kubernetes, a node was deleted and re-added, and deploying a Pod on that node then failed with a bridge address error. The troubleshooting steps for this exception are as follows:

Error:

(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "745720ffb20646054a167560299b19bb9ae046fe6c677b5d26312b89a26554e1": failed to set bridge addr: "cni0" already has an IP address different from 172.20.2.1/24

 

Solution:

  1. If the node server was not restarted when the node was deleted, restart it (this problem is usually caused by stale network state cached on the server, and a reboot clears it).
  2. Whether or not the server has been restarted, delete the bad NIC on the node and wait for the system to recreate it automatically; the procedure is as follows:
sudo ifconfig cni0 down    
sudo ip link delete cni0
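To confirm the mismatch before deleting the bridge, you can compare the bridge address with the podCIDR that Kubernetes assigned to the node (a quick check; replace <node-name> with the actual node):

# Address currently configured on the bridge
ip addr show cni0
# podCIDR assigned to this node by Kubernetes
kubectl get node <node-name> -o jsonpath='{.spec.podCIDR}'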

[ERROR Swap]: running with swap on is not supported. Please disable swap

kubeadm init failed and reported the following error:

[root@k8s1 yum.repos.d]# kubeadm init   --apiserver-advertise-address=192.168.12.10   --image-repository registry.aliyuncs.com/google_containers   --kubernetes-version v1.18.0   --service-cidr=10.96.0.0/12   --pod-network-cidr=10.244.0.0/16
W0928 15:17:23.161858    1999 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.0
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Swap]: running with swap on is not supported. Please disable swap
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

How to Solve:

Swap needs to be turned off on Linux:

# Turn off swap, run both commands to solve the problem
swapoff -a # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab # permanent
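To verify that swap is really off (optional check; both should report no swap in use):

swapon --show
free -h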

 

[Solved] kubectl top pod error: error: Metrics API not available

k8s version: v1.24.4

kubectl top pod error: error: Metrics API not available
Error: Readiness probe failed: HTTP probe failed with statuscode: 500
vim custom-resources.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.1
        #image: k8s.gcr.io/metrics-server/metrics-server:v0.6.1
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /livez
            port: https
            scheme: HTTPS
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /readyz
            port: https
            scheme: HTTPS
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100

#execute
kubectl apply -f custom-resources.yaml
#view pod
kubectl get pod -A |grep me
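Once the metrics-server pod is Running, a quick way to confirm the Metrics API is being served (it may take a minute or two after startup):

# The APIService should report Available=True
kubectl get apiservices v1beta1.metrics.k8s.io
# These should now return numbers instead of "Metrics API not available"
kubectl top nodes
kubectl top pod -A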

[Solved] k8s Error: Back-off restarting failed container

1. Cause

When running Ubuntu on k8s, the following script is executed in the container:

#!/bin/bash
service ssh start
echo root:$1|chpasswd

After the script finishes, there is no resident foreground process inside the container, so the container exits as soon as it has started successfully, and Kubernetes keeps restarting it.
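To confirm that the container is exiting cleanly rather than crashing inside the script, the previous container's logs and the pod events can be checked (the pod name below is a placeholder):

# Logs of the container instance that just exited
kubectl logs <pod-name> --previous
# Events should show the container completing and then "Back-off restarting failed container"
kubectl describe pod <pod-name>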

2. Solution

At startup, run a task that never completes, for example:

command: ["/bin/bash", "-ce", "tail -f /dev/null"]

Or add the following line to the end of the script above:

#!/bin/bash
service ssh start
echo root:$1|chpasswd
tail -f /dev/null

With this change the script runs successfully and the container stays running.

How to Solve K8S Error getting node

During the installation or operation of a k8s cluster, you may encounter "Error getting node" problems, such as:

"Error getting node" err="node \"master\" not found"
dial tcp 10.8.126.46:6443: connect: connection refused"
"Error getting node" err="node \"master\" not found"
"Error getting node" err="node \"master\" not found"

The way to troubleshoot such problems is to execute the following command and look for the specific cause:

journalctl -xeu kubelet

Find the earliest error and handle it according to what it says.
Based on the problems I have encountered, the main possibilities are as follows (quick checks for the first two are sketched after the list):

  1. Swap memory has not been disabled
  2. There is a problem with the hostname or hosts file settings (reported by other bloggers)
  3. The container runtime and k8s versions are not compatible (reported by other bloggers)
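Quick checks for the first two causes (a rough sketch; adapt to your hosts):

# 1) Swap should be off: no output here means no active swap
swapon --show
# 2) The hostname should be set consistently and resolvable via /etc/hosts or DNS
hostnamectl
grep "$(hostname)" /etc/hosts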

[Solved] Canal Error: Could not find first log file name in binary log index file

Check /home/admin/canal-server/logs/example/example.log and find the following error:

2022-07-20 00:00:08.473 [destination = example , address = mall-mysql/192.168.38.131:3306 , EventParser] ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:e
xample[java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = Could not find first log file name in binary log index file                             
        at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:102)                                                                    
        at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:238)                                                                              
        at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$1.run(AbstractEventParser.java:262)                                                                           
        at java.lang.Thread.run(Thread.java:748) 

Reason:

The binlog file set in the configuration file was not found on the MySQL server.

Solution:

Because the instance.properties configuration file is packaged into the Docker image, it can only be modified inside the running instance.

First check the binlog log file name and position in the database

Query in the mall-mysql database of this example:

mysql> show master status;

Output: File: mysql-binlog.000233, Position: 652645
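If the configured journal name no longer exists on the server (for example because old binlogs were purged), you can list the surviving binlog files on the same MySQL instance for comparison:

mysql> SHOW BINARY LOGS;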

Enter the instance:

kubectl exec -ti mall-canal-84f6f7d7cc-xbghn bash -n nsName
xxx> vi /home/admin/canal-server/conf/example/instance.properties

Modify the position Info section:

canal.instance.master.address=mall-mysql:3306                                                                                                                                      
canal.instance.master.journal.name=mysql-binlog.000233                                                                                                                             
canal.instance.master.position=652645                                                                                                                                              
canal.instance.master.timestamp=                                                                                                                                                   
canal.instance.master.gtid=

Restart service:

xxx> cd /home/admin/canal-server
xxx> ./restart.sh

Check the log after the restart to confirm that the error is resolved.

[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs"

This error is reported when installing a Kubernetes cluster.

 

Solution:

Method 1: ignore the error

Add the --ignore-preflight-errors=SystemVerification option to kubeadm init to ignore the error. It is hard to tell whether other problems will occur later when using this option.

Method 2: Upgrade kernel version

I installed the kubernetes cluster using kernel version 4.19.12, and the problem did not occur after upgrading the kernel to 5.13.7. I am not sure if it is a kernel version problem.

Method 3:

Manually compile the config kernel module
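Before compiling anything, it is worth checking whether the module or an exposed kernel config is simply missing (a rough check; the verification reads the kernel config from locations such as these):

# Try to load the module that exposes /proc/config.gz
modprobe configs
# Check for an available kernel config
ls /proc/config.gz /boot/config-$(uname -r) 2>/dev/null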

 

Failed to Initialize Error: error execution phase preflight: [preflight] Some fatal errors occurred: [ERROR Port-6443]

[root@k8s-master01 ~]# kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs | tee kubeadm-init.log
Flag --experimental-upload-certs has been deprecated, use --upload-certs instead
[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.11. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10251]: Port 10251 is in use
    [ERROR Port-10252]: Port 10252 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Reason:
kubeadm init was run again after modifying the kubeadm-config.yaml file without resetting the previous attempt, so the ports and manifest files from the previous run are still in use.
Solution:
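A common way to clear the leftover state before re-running init is kubeadm reset (shown here as a sketch; it wipes /etc/kubernetes and the local etcd data directory, so only run it on a node you intend to reinitialize):

kubeadm reset -f
# optionally clean up leftover CNI configuration as well
rm -rf /etc/cni/net.d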
Result test:
The k8s cluster was initialized successfully.
[root@master1 ~]# kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=SystemVerification

 

[Solved] Kubernetes Error: failed to list *core.Secret: unable to transform key

While installing a Kubernetes local cluster, I happened to encounter the following problem:

E0514 07:30:58.627632 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-nk77g": invalid padding on input; reinitializing...
W0514 07:30:59.631509 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-nk77g": invalid padding on input
E0514 07:30:59.631563 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-nk77g": invalid padding on input; reinitializing...
W0514 07:31:00.633540 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-nk77g": invalid padding on input
E0514 07:31:00.633575 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key "/registry/secrets/default/default-token-nk77g": invalid padding on input; reinitializing...

 

Reason:

We know that after the cluster control plane is running, we need to create the TLS Bootstrap Secret so that node certificates can be signed automatically.

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-${TOKEN_ID}
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: "${TOKEN_ID}"
  token-secret: "${TOKEN_SECRET}"
  usage-bootstrap-authentication: "true"
  usage-bootstrap-signing: "true"
  auth-extra-groups: system:bootstrappers:default-node-token
EOF

secret "bootstrap-token-65a3a9" created

where BOOTSTRAP_TOKEN=${TOKEN_ID}.${TOKEN_SECRET} can be found in bootstrap-kubelet.conf.

One of the reasons for the problem shown in the title is that this command may have been run multiple times, so multiple secrets exist; for example, the node side was found to be not working properly and a new bootstrap-kubelet.conf was regenerated for it.
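To see whether more than one bootstrap token secret exists (a quick check; the exact names will differ in your cluster):

kubectl -n kube-system get secrets | grep bootstrap-token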

When installing the Kubernetes cluster manually, the information found online is inevitably somewhat out of date, so the manifests generated by kubeadm were used for comparison and verification, and in the process I accidentally added the following:

spec:
  hostNetwork: true
  priorityClassName: system-cluster-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault

With spec.securityContext.seccompProfile.type=RuntimeDefault set, the cluster automatically generates its own secret at runtime, which conflicts with the manually generated one and leads to the problem in the title.

 

Solution:

1) First clear the cluster cache: delete all files under /var/lib/etcd/ and /var/lib/kubelet/, keeping the config.xml file in the latter.
2) Delete the spec.securityContext.seccompProfile.type=RuntimeDefault setting from kube-apiserver.yml, kube-controller-manager.yml and kube-scheduler.yml under /etc/kubernetes/manifests.
3) Re-run the kubelet (systemctl start kubelet) and you are done.

How to Solve kubelet starts error (k8s Cluster Restarted)

After the k8s cluster was restarted, kubelet failed to start.

1. k8s version: 1.23.0, Docker CE version: 20.10.14

2. Starting kubelet reports the following error:

May 16 09:47:13 k8s-master kubelet: E0516 09:47:13.512956    7403 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
May 16 09:47:13 k8s-master systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
May 16 09:47:13 k8s-master systemd: Unit kubelet.service entered failed state.
May 16 09:47:13 k8s-master systemd: kubelet.service failed

3. Problem analysis: according to the error, kubelet's cgroup driver is inconsistent with Docker's.

4. Fix the problem by modifying the Docker configuration to use the systemd cgroup driver:

cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
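After restarting Docker in the next step, the two sides can be compared directly (a quick check; the kubelet config path below is the kubeadm default and may differ in your setup):

# Docker side
docker info 2>/dev/null | grep -i "cgroup driver"
# kubelet side
grep -i cgroupDriver /var/lib/kubelet/config.yaml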

5. Restart Docker and kubelet to solve the problem

[root@k8s-master ~]# systemctl restart docker
[root@k8s-master ~]# systemctl restart kubelet
[root@k8s-master ~]# 
[root@k8s-master ~]# systemctl status  kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2022-05-16 09:48:06 CST; 3s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 8226 (kubelet)
    Tasks: 23
   Memory: 56.9M
   CGroup: /system.slice/kubelet.service
           ├─8226 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...
           └─8745 /opt/cni/bin/calico

[Solved] pod Error: back off restarting failed container

Solution:

1. Find the corresponding Deployment.
2. Add command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
as follows:

kind: Deployment
apiVersion: apps/v1beta2
metadata:
  labels:
    app: jenkins-master
  name: jenkins-master-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins-master
  template:
    metadata:
      labels:
        app: jenkins-master
    spec:
      containers:
      - name: jenkins-master
        image: drud/jenkins-master:v0.29.0
        imagePullPolicy: IfNotPresent
        command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
        volumeMounts:
        - mountPath: /var/jenkins_home/
          name: masterjkshome
        ports:
        - containerPort: 8080
      volumes:
      - name: masterjkshome
        persistentVolumeClaim:
          claimName: pvcjkshome
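After applying the change, the pod should stop crash-looping. A quick check (the manifest filename here is assumed; the label matches the example above):

kubectl apply -f jenkins-master-deploy.yaml
kubectl get pods -l app=jenkins-master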