Tag Archives: cubernetes

How to Solve K8S Error getting node

While installing or operating a k8s cluster, you may encounter "Error getting node" errors such as:

"Error getting node" err="node \"master\" not found"
dial tcp 10.8.126.46:6443: connect: connection refused"
"Error getting node" err="node \"master\" not found"
"Error getting node" err="node \"master\" not found"

To troubleshoot such problems, run the following command and check the kubelet log for the specific cause:

journalctl -xeu kubelet

Find the first error in the log and handle it according to its cause. In my experience, the main possibilities are:

  1. Swap is not disabled
  2. The hostname or the hosts file is set incorrectly (a cause listed by other bloggers)
  3. The container runtime and k8s versions are not compatible (a cause listed by other bloggers)
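
The first two causes can be checked quickly before reading through the whole kubelet log. The following is a minimal sketch and not part of the original troubleshooting steps: the function names swap_is_active and hostname_in_hosts are hypothetical helpers, and the default paths assume a typical Linux host.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for checking two common causes of "Error getting node".

# Succeeds when a /proc/swaps-style file lists at least one active swap area.
swap_is_active() {
  local swaps_file="${1:-/proc/swaps}"
  # Line 1 is the column header; any non-empty line after it is a swap area.
  [ "$(tail -n +2 "$swaps_file" | grep -c .)" -gt 0 ]
}

# Succeeds when NAME appears as a hostname or alias in a hosts-style file.
hostname_in_hosts() {
  local name="$1" hosts_file="${2:-/etc/hosts}"
  awk -v n="$name" '
    /^[[:space:]]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
    END { exit !found }
  ' "$hosts_file"
}

# Example usage on a node (commented out so that sourcing this file has no side effects):
# swap_is_active && echo "swap is on: disable it with swapoff -a"
# hostname_in_hosts "$(hostname)" || echo "$(hostname) is missing from /etc/hosts"
```

Both functions accept an alternative file path, so they can be tried against a copy of the file before being run on the node itself.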

[Solved] Canal Error: Could not find first log file name in binary log index file

Check /home/admin/canal-server/logs/example/example.log and find the following error:

2022-07-20 00:00:08.473 [destination = example , address = mall-mysql/192.168.38.131:3306 , EventParser] ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:example[java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = Could not find first log file name in binary log index file
        at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:102)
        at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:238)
        at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$1.run(AbstractEventParser.java:262)
        at java.lang.Thread.run(Thread.java:748)

Reason:

The binlog file named in the configuration file does not exist on the MySQL server.

Solution:

Because instance.properties is packaged into the Docker image, it can only be modified inside the running container.

First check the current binlog file name and position in the database.

Query the mall-mysql database of this example:

mysql> show master status;

Output: File: mysql-binlog.000233, Position: 652645

Enter the container:

kubectl exec -ti mall-canal-84f6f7d7cc-xbghn -n nsName -- bash
xxx> vi /home/admin/canal-server/conf/example/instance.properties

Modify the position info section:

canal.instance.master.address=mall-mysql:3306
canal.instance.master.journal.name=mysql-binlog.000233
canal.instance.master.position=652645
canal.instance.master.timestamp=
canal.instance.master.gtid=
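
If vi is inconvenient inside the container, the same two keys can be rewritten with sed. This is only a sketch under assumptions: set_binlog_position is a hypothetical helper name, and it relies on GNU sed's in-place flag being available in the image.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set_binlog_position FILE BINLOG_NAME POSITION
# Rewrites the journal name and position keys in an instance.properties file.
# Assumes GNU sed (-i without a suffix argument), as found on most Linux images.
set_binlog_position() {
  local props="$1" journal="$2" position="$3"
  sed -i \
    -e "s|^canal\.instance\.master\.journal\.name=.*|canal.instance.master.journal.name=${journal}|" \
    -e "s|^canal\.instance\.master\.position=.*|canal.instance.master.position=${position}|" \
    "$props"
}

# Example usage inside the container (commented out):
# set_binlog_position /home/admin/canal-server/conf/example/instance.properties mysql-binlog.000233 652645
```

Only lines that already start with the two keys are changed; every other line in the file is left as it is.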

Restart service:

xxx> cd /home/admin/canal-server
xxx> ./restart.sh

Check the log after the restart; the error should no longer appear.

[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: “configs“


When installing a Kubernetes cluster, kubeadm reports the above error.

 

Solution:

Method 1: ignore the error

Add the --ignore-preflight-errors=SystemVerification option to ignore the error. Whether other problems will appear later with this option is unknown.

Method 2: Upgrade kernel version

I hit the error while installing the Kubernetes cluster on kernel version 4.19.12; after upgrading the kernel to 5.13.7 it no longer occurred. I am not sure whether the kernel version was the actual cause.

Method 3: Manually compile the configs kernel module
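
Before choosing a method, it can help to check whether a kernel config file is readable at all, since the preflight check only falls back to loading the configs module when it cannot find one. The sketch below is an assumption-laden illustration: find_kernel_config is a hypothetical helper, and the candidate paths are a subset of common locations rather than the authoritative list used by kubeadm.

```shell
#!/usr/bin/env bash
# Hypothetical helper: find_kernel_config [RELEASE] [ROOT_PREFIX]
# Prints the first readable kernel config path among a few common locations.
# The list of locations is an assumption, not kubeadm's authoritative list.
find_kernel_config() {
  local release="${1:-$(uname -r)}" root="${2:-}"
  local candidate
  for candidate in \
    "/proc/config.gz" \
    "/boot/config-${release}" \
    "/usr/src/linux-${release}/.config" \
    "/usr/src/linux/.config"; do
    if [ -r "${root}${candidate}" ]; then
      printf '%s\n' "${candidate}"
      return 0
    fi
  done
  return 1
}

# Example usage (commented out):
# find_kernel_config || echo "no kernel config found; try: modprobe configs"
```

The optional second argument is a directory prefix, which makes it possible to try the function against a scratch directory instead of the real root.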

 

Failed to Initialize Error: error execution phase preflight: [preflight] Some fatal errors occurred: [ERROR Port-6443]

# kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs | tee kubeadm-init.log
Flag --experimental-upload-certs has been deprecated, use --upload-certs instead
[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.11. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10251]: Port 10251 is in use
    [ERROR Port-10252]: Port 10252 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Reason:
kubeadm was not reset after the kubeadm-config.yaml file was modified, so the ports and files from the previous startup are still in use.
Solution:
Run kubeadm reset to clean up the previous run, then run kubeadm init again.
Result test:
# kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=SystemVerific
The k8s cluster was initialized successfully.
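
To see at a glance which ports the preflight check flagged, the saved kubeadm-init.log can be filtered. This is a small sketch, not something from the original post; ports_in_use is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kubeadm preflight output on stdin and prints
# the port numbers reported as "in use", one per line.
ports_in_use() {
  sed -n 's/.*\[ERROR Port-\([0-9][0-9]*\)\]: Port [0-9][0-9]* is in use.*/\1/p'
}

# Example usage (commented out): list the flagged ports, then show the
# listening process for each one with ss.
# ports_in_use < kubeadm-init.log | while read -r port; do
#   ss -ltnp "sport = :${port}"
# done
```

If the listed processes are the control-plane components from an earlier kubeadm init, that confirms the previous run was never cleaned up.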

 

[Solved] Kubernetes Error: failed to list *core.Secret: unable to transform key

While installing a Kubernetes local cluster, I happened to encounter the following problem:

E0514 07:30:58.627632 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key “/registry/secrets/default/default-token-nk77g”: invalid padding on input; reinitializing…
W0514 07:30:59.631509 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key “/registry/secrets/default/default-token-nk77g”: invalid padding on input
E0514 07:30:59.631563 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key “/registry/secrets/default/default-token-nk77g”: invalid padding on input; reinitializing…
W0514 07:31:00.633540 1 reflector.go:324] storage/cacher.go:/secrets: failed to list *core.Secret: unable to transform key “/registry/secrets/default/default-token-nk77g”: invalid padding on input
E0514 07:31:00.633575 1 cacher.go:424] cacher (*core.Secret): unexpected ListAndWatch error: failed to list *core.Secret: unable to transform key “/registry/secrets/default/default-token-nk77g”: invalid padding on input; reinitializing…

 

Reason:

After the cluster master is running, a TLS bootstrap Secret must be created so that node certificates can be signed automatically:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Secret
metadata:
  name: bootstrap-token-${TOKEN_ID}
  namespace: kube-system
type: bootstrap.kubernetes.io/token
stringData:
  token-id: "${TOKEN_ID}"
  token-secret: "${TOKEN_SECRET}"
  usage-bootstrap-authentication: "true"
  usage-bootstrap-signing: "true"
  auth-extra-groups: system:bootstrappers:default-node-token
EOF

secret "bootstrap-token-65a3a9" created

where BOOTSTRAP_TOKEN=${TOKEN_ID}.${TOKEN_SECRET} can be found in bootstrap-kubelet.conf.
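
Since the Secret needs the two halves of the token separately, a small helper can validate the token and split it. This is a sketch: split_bootstrap_token is a hypothetical name, and the token secret in the usage line is a made-up placeholder, not a value from the original cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_bootstrap_token TOKEN
# Validates the <token-id>.<token-secret> format (6 and 16 characters of
# [a-z0-9]) and prints TOKEN_ID=... and TOKEN_SECRET=... lines.
split_bootstrap_token() {
  local token="$1"
  if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
    echo "not a valid bootstrap token: expected [a-z0-9]{6}.[a-z0-9]{16}" >&2
    return 1
  fi
  printf 'TOKEN_ID=%s\nTOKEN_SECRET=%s\n' "${token%%.*}" "${token#*.}"
}

# Example usage (commented out; the secret part is a placeholder):
# eval "$(split_bootstrap_token 65a3a9.0123456789abcdef)"
# echo "$TOKEN_ID"
```

Rejecting malformed tokens early avoids creating a Secret that the API server would not accept as a bootstrap token.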

One cause of the problem in the title is that the command was run several times and multiple secrets now exist, for example because a node was not working properly and bootstrap-kubelet.conf was regenerated for it.

Another cause: when installing a Kubernetes cluster manually, online material is often out of date, so I compared my manifests against those produced by a kubeadm installation, and in doing so I accidentally added the following:

spec:
  hostNetwork: true
  priorityClassName: system-cluster-critical
  securityContext:
    seccompProfile:
      type: RuntimeDefault

With spec.securityContext.seccompProfile.type=RuntimeDefault set, a self-signed secret is generated automatically while the cluster is running. It conflicts with the manually generated one and leads to the problem in the title.

 

Solution:

1) First clear the cluster cache: delete all files under /var/lib/etcd/ and /var/lib/kubelet/, but keep the config.yaml file in the latter.
2) Delete the spec.securityContext.seccompProfile.type=RuntimeDefault setting from kube-apiserver.yml, kube-controller-manager.yml and kube-scheduler.yml under /etc/kubernetes/manifests.
3) Start the kubelet again with systemctl start kubelet and you are done.

How to Solve kubelet starts error (k8s Cluster Restarted)

After the k8s cluster is restarted, kubelet fails to start.

1. Environment: k8s version 1.23.0, Docker CE version 20.10.14

2. Starting kubelet reports the following error:

May 16 09:47:13 k8s-master kubelet: E0516 09:47:13.512956    7403 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
May 16 09:47:13 k8s-master systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
May 16 09:47:13 k8s-master systemd: Unit kubelet.service entered failed state.
May 16 09:47:13 k8s-master systemd: kubelet.service failed

3. Problem analysis: according to the error, kubelet's cgroup driver (systemd) differs from Docker's (cgroupfs)

4. Fix: modify the Docker configuration so that Docker also uses the systemd cgroup driver

cat > /etc/docker/daemon.json <<EOF
{"exec-opts": ["native.cgroupdriver=systemd"]}
EOF
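
To confirm the mismatch, or to verify the fix afterwards, the two driver names can be compared directly. This is a sketch under assumptions: kubelet_cgroup_driver and drivers_match are hypothetical helpers, and /var/lib/kubelet/config.yaml is assumed to be the kubelet config file, which is the kubeadm default but may differ on other setups.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the cgroupDriver value from a kubelet config
# file. The default path is an assumption (kubeadm's usual location).
kubelet_cgroup_driver() {
  local config="${1:-/var/lib/kubelet/config.yaml}"
  awk '$1 == "cgroupDriver:" { gsub(/"/, "", $2); print $2; exit }' "$config"
}

# Succeeds when both driver names are non-empty and identical.
drivers_match() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example usage on the node (commented out):
# docker_driver="$(docker info --format '{{.CgroupDriver}}')"
# kubelet_driver="$(kubelet_cgroup_driver)"
# drivers_match "$docker_driver" "$kubelet_driver" \
#   || echo "mismatch: docker=${docker_driver} kubelet=${kubelet_driver}"
```

The comparison treats a missing cgroupDriver key as a mismatch, so an empty result is reported rather than silently accepted.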

5. Restart Docker and kubelet; the problem is solved

# systemctl restart docker
# systemctl restart kubelet
# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2022-05-16 09:48:06 CST; 3s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 8226 (kubelet)
    Tasks: 23
   Memory: 56.9M
   CGroup: /system.slice/kubelet.service
           ├─8226 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config...
           └─8745 /opt/cni/bin/calico

[Solved] pod Error: back off restarting failed container

The pod keeps restarting and reports the error: back off restarting failed container

 

Solution:

1. Find the corresponding Deployment
2. Add command: [ "/bin/bash", "-ce", "tail -f /dev/null" ] to the container so that it stays running instead of exiting, as follows:

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: jenkins-master
  name: jenkins-master-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jenkins-master
  template:
    metadata:
      labels:
        app: jenkins-master
    spec:
      containers:
      - name: jenkins-master
        image: drud/jenkins-master:v0.29.0
        imagePullPolicy: IfNotPresent
        command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
        volumeMounts:
        - mountPath: /var/jenkins_home/
          name: masterjkshome
        ports:
        - containerPort: 8080
      volumes:
      - name: masterjkshome
        persistentVolumeClaim:
          claimName: pvcjkshome

[Solved] Error from server (InternalError): error when creating “ingress.yaml”: Internal error occurred: fail

When exposing a service through an ingress, kubectl apply -f ingress.yaml reports the following error:

Error from server (InternalError): error when creating “ingress.yaml”: Internal error occurred: failed calling webhook “validate.nginx.ingress.kubernetes.io”: failed to call webhook: Post “https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s”: x509: certificate has expired or is not yet valid: current time 2022-03-26T14:45:34Z is before 2022-03-26T20:16:32Z
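
The x509 message covers two different cases, and the words "is before" or "is after" in it tell them apart: "before" means the certificate is not yet valid at the reported current time, "after" means it has expired. The sketch below classifies an error string accordingly; classify_x509_time_error is a hypothetical helper and not part of the original fix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_x509_time_error MESSAGE
# Prints "not-yet-valid" or "expired" depending on the wording of the
# x509 validity error, or "unknown" (and fails) when neither form matches.
classify_x509_time_error() {
  case "$1" in
    *"current time "*" is before "*) echo "not-yet-valid" ;;
    *"current time "*" is after "*)  echo "expired" ;;
    *) echo "unknown"; return 1 ;;
  esac
}

# Example usage (commented out):
# classify_x509_time_error "x509: certificate has expired or is not yet valid: current time 2022-03-26T14:45:34Z is before 2022-03-26T20:16:32Z"
```

For the message in this post the result is not-yet-valid, so it may also be worth comparing the node clocks with the time at which the admission certificate was issued.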

 

Solution:
List the validating webhook configurations:

kubectl get validatingwebhookconfigurations

Delete ingress-nginx-admission

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission

Then execute

kubectl apply -f ingress.yaml 

Kubernetes create secret Error: Error from server (InternalError): Internal error occurred…

Creating a secret fails:
# kubectl create secret generic thanos-objectstorage --from-file=objstore.yaml -n monitoring
Error message:

Error from server (InternalError): Internal error occurred: failed calling webhook “rancher.cattle.io”: Post https://rancher-webhook.cattle-system.svc:443/v1/webhook/mutation?timeout=10s: service “rancher-webhook” not found

 

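According to the error, the rancher.cattle.io admission webhook points to a rancher-webhook service that no longer exists, so the secret cannot be created. List the webhook configurations: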

# kubectl get mutatingwebhookconfigurations
NAME                             WEBHOOKS   AGE
mutating-webhook-configuration   7          156d
rancher.cattle.io                2          156d

# kubectl get validatingwebhookconfigurations
NAME                               WEBHOOKS   AGE
rancher.cattle.io                  2          156d
validating-webhook-configuration   7          156d

Each listing shows two webhook configurations, all of them left over from previously installed components.

Delete them:

# kubectl delete mutatingwebhookconfigurations mutating-webhook-configuration
mutatingwebhookconfiguration.admissionregistration.k8s.io "mutating-webhook-configuration" deleted

# kubectl delete mutatingwebhookconfigurations rancher.cattle.io
mutatingwebhookconfiguration.admissionregistration.k8s.io "rancher.cattle.io" deleted

# kubectl delete validatingwebhookconfigurations rancher.cattle.io
validatingwebhookconfiguration.admissionregistration.k8s.io "rancher.cattle.io" deleted

# kubectl delete validatingwebhookconfigurations validating-webhook-configuration
validatingwebhookconfiguration.admissionregistration.k8s.io "validating-webhook-configuration" deleted

[Solved] kubelet Startup Error: cannot find network namespace for the terminated container

1. Error:

Use the journalctl -xefu kubelet command to view the kubelet log. The following error is found:

cannot find network namespace for the terminated container

2. Solution:

# docker system prune

# systemctl restart kubelet

Instructions for using docker system:

# docker system -h
Flag shorthand -h has been deprecated, please use --help

Usage: docker system COMMAND

Manage Docker

Commands:
  df          Show docker disk usage
  events      Get real time events from the server
  info        Display system-wide information
  prune       Remove unused data

What each command does:

df: checks how much disk space Docker is using.
events: shows live events from the server.
info: shows system information.
prune: removes stopped containers, together with networks, dangling images and build cache that no container uses.

[Solved] GRPC-Server Error: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String; CLjava

The grpc server reports the error java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;CLjava/lang/Object;)V


Problem background

While building a grpc server, I could not get it to start. The error is printed below, but it does not make the cause obvious. grpc worked when I tested it on its own; it stopped working as the project grew more complex and more POM dependencies were introduced, so I started looking for the cause among the dependencies.

2022-01-25 11:01:39.896 ERROR [id-mapping-AsyncThread-1] o.s.a.i.SimpleAsyncUncaughtExceptionHandler.handleUncaughtException(SimpleAsyncUncaughtExceptionHandler.java:39): Unexpected exception occurred invoking async method: public void grpc.server.GrpcServer.start() throws java.io.IOException
java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;CLjava/lang/Object;)V
	at io.grpc.Metadata$Key.validateName(Metadata.java:629)
	at io.grpc.Metadata$Key.<init>(Metadata.java:637)
	at io.grpc.Metadata$Key.<init>(Metadata.java:567)
	at io.grpc.Metadata$AsciiKey.<init>(Metadata.java:742)
	at io.grpc.Metadata$AsciiKey.<init>(Metadata.java:737)
	at io.grpc.Metadata$Key.of(Metadata.java:593)
	at io.grpc.Metadata$Key.of(Metadata.java:589)
	at io.grpc.internal.GrpcUtil.<clinit>(GrpcUtil.java:86)
	at io.grpc.internal.AbstractServerImplBuilder.<clinit>(AbstractServerImplBuilder.java:60)
	at io.grpc.netty.shaded.io.grpc.netty.NettyServerProvider.builderForPort(NettyServerProvider.java:39)
	at io.grpc.netty.shaded.io.grpc.netty.NettyServerProvider.builderForPort(NettyServerProvider.java:24)
	at io.grpc.ServerBuilder.forPort(ServerBuilder.java:41)
	at server.Server.start(GrpcServer.java:30)
	at grpc.server.GrpcServer$$FastClassBySpringCGLIB$$be87d0e.invoke(<generated>)
	at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:771)
	at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
	at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:749)
	at org.springframework.aop.interceptor.AsyncExecutionInterceptor.lambda$invoke$0(AsyncExecutionInterceptor.java:115)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

 

Solution:

1. Analyze the imported jar dependencies: use File → Settings to install the Maven Helper plugin for dependency management

2. After installation, open the POM file and click Dependency Analyzer

3. Select Conflicts and click Refresh UI. guava version 18.0 appears in the list, which means this dependency is introduced more than once and the versions conflict. The difficulty is deciding which of the duplicate guava versions to exclude. My approach is to exclude the version displayed first and recompile; if that does not work, look for other guava versions to exclude

4. Right-clicking guava under Conflicts offers no exclude option, so select Jump To Left Tree to see the dependency tree more clearly

5. Exclude version 18.0 and re-import the project

6. Click Conflicts again; no conflict is listed

7. The grpc server now starts, and the problem is solved
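
The same check can be done from the command line without the IDE plugin. The missing checkArgument overload in the stack trace was added in Guava 20.0, so any resolved Guava version older than that would explain the NoSuchMethodError. The sketch below extracts the Guava versions from Maven's dependency tree output; guava_versions is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `mvn dependency:tree` output on stdin and
# prints each distinct Guava version it mentions, one per line.
guava_versions() {
  grep -o 'com\.google\.guava:guava:jar:[^:) ]*' | cut -d: -f4 | sort -u
}

# Example usage in the project directory (commented out):
# mvn dependency:tree -Dincludes=com.google.guava:guava | guava_versions
```

If the output shows a version such as 18.0, the dependency that pulls it in is the one to exclude.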