Tag Archives: k8s

[How to Solve] etcd-server-8-12: ERROR (spawn error)

My problem was here:

 vi etcd-server-startup.sh

# Wrong: the startup script contained the supervisord program configuration instead of the startup commands

[program:etcd-server-7-12]
command=/opt/etcd/etcd-server-startup.sh              ; the program (relative uses PATH, can take args)
numprocs=1                                            ; number of processes copies to start (def 1)
directory=/opt/etcd                                   ; directory to cwd to before exec (def no cwd)
autostart=true                                        ; start at supervisord start (default: true)
autorestart=true                                      ; retstart at unexpected quit (default: true)
startsecs=30                                          ; number of secs prog must stay running (def. 1)
startretries=3                                        ; max # of serial start failures (default 3)
exitcodes=0,2                                         ; 'expected' exit codes for process (default 0,2)
stopsignal=QUIT                                       ; signal used to kill process (default TERM)
stopwaitsecs=10                                       ; max num secs to wait b4 SIGKILL (default 10)
user=etcd                                             ; setuid to this UNIX account to run the program
redirect_stderr=true                                  ; redirect proc stderr to stdout (default false)
stdout_logfile=/data/logs/etcd-server/etcd.stdout.log ; stdout log path, NONE for none; default AUTO
stdout_logfile_maxbytes=64MB                          ; max # logfile bytes b4 rotation (default 50MB)
stdout_logfile_backups=5                              ; # of stdout logfile backups (default 10)
stdout_capture_maxbytes=1MB                           ; number of bytes in 'capturemode' (default 0)
stdout_events_enabled=false                           ; emit events on stdout writes (default false)


# Right: the startup script should contain the etcd launch command
```bash
#!/bin/sh
./etcd --name etcd-server-8-12 \
    --data-dir /data/etcd/etcd-server \
    --listen-peer-urls https://192.168.118.12:2380 \
    --listen-client-urls https://192.168.118.12:2379,http://127.0.0.1:2379 \
    --quota-backend-bytes 8000000000 \
    --initial-advertise-peer-urls https://192.168.118.12:2380 \
    --advertise-client-urls https://192.168.118.12:2379,http://127.0.0.1:2379 \
    --initial-cluster  etcd-server-8-12=https://192.168.118.12:2380,etcd-server-8-21=https://192.168.118.21:2380,etcd-server-8-22=https://192.168.118.22:2380 \
    --ca-file ./certs/ca.pem \
    --cert-file ./certs/etcd-peer.pem \
    --key-file ./certs/etcd-peer-key.pem \
    --client-cert-auth  \
    --trusted-ca-file ./certs/ca.pem \
    --peer-ca-file ./certs/ca.pem \
    --peer-cert-file ./certs/etcd-peer.pem \
    --peer-key-file ./certs/etcd-peer-key.pem \
    --peer-client-cert-auth \
    --peer-trusted-ca-file ./certs/ca.pem \
    --log-output stdout
```

etcd start/stop commands

 ~]# supervisorctl start etcd-server-7-12
 ~]# supervisorctl stop etcd-server-7-12
 ~]# supervisorctl restart etcd-server-7-12
 ~]# supervisorctl status etcd-server-7-12
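
If the spawn error persists after fixing the script, the usual remaining causes are a missing execute bit on the script or supervisord not having reloaded the program definition. A minimal follow-up sketch, assuming the paths and program name from the configuration above (the supervisord log path is also an assumption and may differ in your supervisord.conf):

```bash
# make the startup script executable (path taken from the supervisord config above)
chmod +x /opt/etcd/etcd-server-startup.sh

# reload supervisord's program definitions and check the result
supervisorctl update
supervisorctl status etcd-server-7-12

# if spawning still fails, the details are in the supervisord log (assumed path)
tail -n 50 /var/log/supervisord.log
```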

[Solved] K8s Error: ERROR: Unable to access datastore to query node configuration

Starting the calico service on K8s reports the following error:

Skipping datastore connection test
ERROR: Unable to access datastore to query node configuration
Terminating
Calico node failed to start

Solution:

The main cause of this problem is that the firewall of the master node (that is, the server where etcd is installed) is turned on. Just turn off the firewall on that node:

systemctl stop firewalld

Other possible causes (see the quick checks below):

1. The etcd address in the calico configuration file is written incorrectly
2. The etcd service is not started
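
A hedged set of quick checks: run the curl against the etcd address configured in calico, which verifies both that the address is right and that etcd is up. The ETCD_IP value and certificate paths are placeholders for your own environment:

```bash
# confirm the firewall on the master node really is off
systemctl status firewalld

# confirm etcd is running and reachable (replace the address and cert paths with your own)
ETCD_IP=192.168.1.10
curl --cacert /path/to/ca.pem \
     --cert /path/to/client.pem \
     --key /path/to/client-key.pem \
     https://${ETCD_IP}:2379/health
```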

K8s initializing the master & worker node error [How to Solve]

Error 1: coredns:1.8.0
Error 2: bridge-nf-call-iptables does not exist
Error 3: ip_forward contents are not set to 1

CentOS 7.8

Error 1:

[config/images] Pulled registry.aliyuncs.com/k8sxio/pause:3.2
[config/images] Pulled registry.aliyuncs.com/k8sxio/etcd:3.4.13-0
failed to pull image "swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0":
output: time="2021-04-30T13:26:14+08:00" level=fatal msg="pulling
image failed: rpc error: code = NotFound desc = failed to pull and
unpack image \"swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0\":
failed to resolve reference
\"swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0\":
swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0: not found", error: exit
status 1 To see the stack trace of this error execute with --v=5 or
higher

coredns:1.8.0
failed to resolve reference \"swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0\"
Solution:

curl -sSL https://kuboard.cn/install-script/v1.20.x/init_master.sh | sh -s 1.20.6 /coredns
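
The kuboard script appears to fix this by pulling coredns from the coredns/ sub-path on the mirror (as the Error 2 output below shows). If you would rather not pipe a remote script into sh, a hedged manual alternative is to pull the image from the path that does exist on that mirror and retag it to the name kubeadm was asking for:

```bash
# pull coredns from the sub-path that actually exists on the mirror
docker pull swr.cn-east-2.myhuaweicloud.com/coredns/coredns:1.8.0

# retag it to the name kubeadm tried to pull
docker tag swr.cn-east-2.myhuaweicloud.com/coredns/coredns:1.8.0 \
           swr.cn-east-2.myhuaweicloud.com/coredns:1.8.0
```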

Error 2:

[config/images] Pulled
swr.cn-east-2.myhuaweicloud.com/coredns/coredns:1.8.0
Initialize Master Node [init] Using Kubernetes version: v1.20.6 [preflight]
Running pre-flight checks error execution phase preflight: [preflight]
Some fatal errors occurred: [ERROR
FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]:
/proc/sys/net/bridge/bridge-nf-call-iptables does not exist [ERROR
FileContent--proc-sys-net-ipv4-ip_forward]:
/proc/sys/net/ipv4/ip_forward contents are not set to 1

bridge-nf-call-iptables does not exist
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
Solution:
Execute as root:

modprobe br_netfilter

ip_forward contents are not set to 1
[ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
Solution:
The contents of ip_forward are not set to 1. Check the current value:

cat /proc/sys/net/ipv4/ip_forward
0

0 means forwarding is disabled
1 means forwarding is enabled
Modify it:

echo "1" > /proc/sys/net/ipv4/ip_forward

[Solved] Upstream connect error or disconnect occurs after the k8s istio virtual machine is restarted

Error (common problem after virtual machine reboot):
upstream connect error or disconnect/reset before headers. reset reason: connection failure, transport failure reason: TLS error: 268436501:SSL routines:OPENSSL_internal:SSLV3_ALERT_CERTIFICATE_EXPIRED

Solution:
The pods under istio-system are in an abnormal state after the reboot; you need to delete them with kubectl delete pods xxx -n istio-system so that they are recreated.
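
If every pod in the namespace is affected, which is common after a full VM reboot, a hedged shortcut is to delete them all at once and let their controllers recreate them:

```bash
# delete all pods in istio-system; their Deployments/DaemonSets will recreate them
kubectl delete pods --all -n istio-system

# watch them come back up
kubectl get pods -n istio-system -w
```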

 

Apiserver Error: OpenAPI spec does not exists [How to Solve]

kubectl suddenly failed to obtain resources in an environment that had been deployed only a few days earlier. The apiserver log showed the "OpenAPI spec does not exists" error from the title, and then the controller-manager component also reported an error:

E0916 08:35:55.495444       1 leaderelection.go:306] error retrieving resource lock kube-system/kube-controller-manager: Get https://192.168.1.119:8443/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s: EOF

This environment uses a three-master deployment, and the VIP had drifted to master 3. Both haproxy and keepalived here are deployed as containers, so I restarted haproxy first:

docker restart xxx
iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 8443 -j DNAT
 --to-destination 172.18.0.6:8443 ! -i docker0 iptables: No chain/target/match by that name

haproxy would not come back up, so I guessed there was something wrong with the iptables rules on the master 3 host. I then restarted keepalived; at this point the VIP drifted to master 1, and kubectl could obtain resources again.

Finally, I restarted the Docker component of master 3:

systemctl restart docker

Then I manually moved the VIP back to master 3, and at that point everything was normal.
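
When debugging this kind of VIP drift, it helps to confirm where the VIP currently sits and whether the haproxy/keepalived containers are actually running on each master. A hedged sketch; 192.168.1.119 is taken from the controller-manager error above and stands in for your own VIP:

```bash
# are the haproxy / keepalived containers up on this master?
docker ps | grep -E 'haproxy|keepalived'

# does this master currently hold the VIP?
ip addr | grep 192.168.1.119
```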

[Solved] Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

Phenomenon

This error occurs when executing any k8s command

[root@master helm]# kubectl get pod
Error from server (InternalError): an error on the server ("") has prevented the request from succeeding

Reason

To contact the apiserver, you need to access https://master:6443

The command is executed directly on the control plane, so it is necessary to make sure that the hostname master resolves correctly

# Since the http proxy was previously configured on the local machine
export http_proxy='http://10.0.0.1:8889'
export https_proxy='http://10.0.0.1:8889'

# and
[root@master helm]# hostname
master

Therefore, although master resolves to the local machine, requests to it are sent through the proxy at http://10.0.0.1:8889

As a result, the apiserver cannot be accessed normally

Solution:

Method 1: unset the proxy

unset http_proxy
unset https_proxy

Method 2: do not proxy this machine

export no_proxy='master,localhost,127.0.0.1,localaddress,.localdomain.com'

Note: if the problem persists after these changes, try restarting the kubelet service on the node.
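
Note also that unset and export only affect the current shell. If the proxy variables were set in a profile file, a hedged way to make the exclusion stick across sessions (the profile path is an assumption; use whichever file originally exported http_proxy):

```bash
# keep the proxy but permanently exclude the control-plane hostname from it
echo "export no_proxy='master,localhost,127.0.0.1,localaddress,.localdomain.com'" >> /etc/profile
source /etc/profile
```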

[Solved] Error response from daemon: Get "*": x509: certificate signed by unknown authority

Environment description

The Harbor registry I built requires a domain name and HTTPS access

Error 1: reported when another Docker host logs in to Harbor

[root@k8s0001 ~]# docker login www.harbor.wuhan.cn
Username: admin
Password: 
Error response from daemon: Get "https://www.harbor.wuhan.cn/v2/": x509: certificate signed by unknown authority

Error 2: reported when another Docker host pulls an image from the self-built Harbor registry

[root@k8s0001 ~]# docker pull www.harbor.wuhan.cn/22202/helloworld@sha256:0d9ce49958ea82a48c40a397ccc785674ec3ce1dfd4f749c3c7c7a63790a54cd
Error response from daemon: Get "https://www.harbor.wuhan.cn/v2/": x509: certificate signed by unknown authority

For both errors, you need to copy the generated certificates from the Harbor host to the Docker certificate directory (/etc/docker/certs.d/) of each client machine. The operations are as follows:

Configure HTTPS trust on the clients

##harbor
[root@harbor opt]# cd /etc/docker/
[root@harbor docker]# ls
certs.d  key.json
[root@harbor docker]# cd certs.d/
[root@harbor certs.d]# ls
www.harbor.wuhan.cn
[root@harbor certs.d]# cd www.harbor.wuhan.cn/
[root@harbor www.harbor.wuhan.cn]# ls
ca.crt  www.harbor.wuhan.cn.cert  www.harbor.wuhan.cn.key
[root@harbor certs.d]# cd ..
[root@harbor certs.d]# scp -r www.harbor.wuhan.cn [email protected]:/etc/docker/certs.d/ 
[email protected]'s password: 
www.harbor.wuhan.cn.cert                                                                          100% 2126   914.9KB/s   00:00    
www.harbor.wuhan.cn.key                                                                           100% 3243     1.5MB/s   00:00    
ca.crt                                                                                             100% 2033   839.2KB/s   00:00    
[root@harbor certs.d]# 
[root@harbor certs.d]# scp -r www.harbor.wuhan.cn [email protected]:/etc/docker/certs.d/
[email protected]'s password: 
www.harbor.wuhan.cn.cert                                                                          100% 2126   845.3KB/s   00:00    
www.harbor.wuhan.cn.key                                                                           100% 3243     1.9MB/s   00:00    
ca.crt                                                                                             100% 2033     1.8MB/s   00:00    
[root@harbor certs.d]# 
[root@harbor certs.d]# 
[root@harbor certs.d]# scp -r www.harbor.wuhan.cn [email protected]:/etc/docker/certs.d/
[email protected]'s password: 
www.harbor.wuhan.cn.cert                                                                          100% 2126   227.8KB/s   00:00    
www.harbor.wuhan.cn.key                                                                           100% 3243     2.5MB/s   00:00    
ca.crt                                                                                             100% 2033     1.2MB/s   00:00 

Then restart Docker on each client:

[root@k8s0001 opt]# systemctl restart docker
[root@k8s0002 opt]# systemctl restart docker
[root@k8s0003 opt]# systemctl restart docker
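
A quick hedged verification from one of the clients after the restart, reusing the registry domain and image reference from the errors above:

```bash
# the x509 error should no longer appear
docker login www.harbor.wuhan.cn
docker pull www.harbor.wuhan.cn/22202/helloworld@sha256:0d9ce49958ea82a48c40a397ccc785674ec3ce1dfd4f749c3c7c7a63790a54cd
```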

Docker Build Error: Failed to get D-Bus connection: Operation not permitted [Solved]

After creating a container from the CentOS 7 image, you may find that starting a service with systemctl inside it fails with this error. Let's analyze why.

# docker run -itd --name centos7 centos:7

# docker attach centos7

# yum install vsftpd

# systemctl start vsftpd

Failed to get D-Bus connection: Operation not permitted

The reasons are as follows:

Docker's design philosophy is that no background services run inside a container: the container itself is an independent main process on the host (which can also be understood as the application process that provides the service), and the container's life cycle exists around that main process. The correct way to use a container is therefore to run the service in the foreground.

As for systemd, it has become the default service manager of mainstream Linux distributions (such as CentOS 7 and Ubuntu 15.04 and later), replacing the traditional SysV-style service management. systemd maintains system services and needs privileged access to the Linux kernel, but a container is not a complete operating system; it only has a file system, and by default it starts with ordinary, unprivileged access to the kernel, so systemd naturally cannot work there.

So the preferred fix is to follow the container design principles and run a single foreground service per container. If you really do need systemctl, the workaround is to run the container in privileged mode.

 

The solution is as follows:

Create container:

# docker run -d --name centos7 --privileged=true centos:7 /usr/sbin/init

Enter container:

# docker exec -it centos7 /bin/bash

This allows you to start the service using systemctl
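
If you would rather follow the foreground-service principle described above instead of using privileged mode, a hedged sketch for the vsftpd example is shown below; the -obackground=NO override is intended to keep vsftpd in the foreground, but check it against your vsftpd version before relying on it:

```bash
# run vsftpd as the container's foreground main process instead of via systemd
docker run -d --name vsftpd-fg -p 21:21 centos:7 \
    /bin/bash -c "yum install -y vsftpd && /usr/sbin/vsftpd -obackground=NO"
```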

Initializing the Kubernetes master node ERROR: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0

Running the command kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log fails with: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0
The full error message is as below:

kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log

[init] Using Kubernetes version: v1.21.2
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Hostname]: hostname "node" could not be reached
        [WARNING Hostname]: hostname "node": lookup node on 127.0.0.53:53: server misbehaving
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR ImagePull]: failed to pull image registry.aliyuncs.com/google_containers/coredns:v1.8.0: output: Error response from daemon: manifest for registry.aliyuncs.com/google_containers/coredns:v1.8.0 not found: manifest unknown: manifest unknown
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

The prompt says that pulling the image registry.aliyuncs.com/google_containers/coredns:v1.8.0 failed.
Use kubeadm config images list --config kubeadm.yml to list the images that need to be downloaded:

kubeadm config images list --config kubeadm.yml

registry.aliyuncs.com/google_containers/kube-apiserver:v1.21.2
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.21.2
registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.2
registry.aliyuncs.com/google_containers/kube-proxy:v1.21.2
registry.aliyuncs.com/google_containers/pause:3.4.1
registry.aliyuncs.com/google_containers/etcd:3.4.13-0
registry.aliyuncs.com/google_containers/coredns:v1.8.0

Use the docker images command to check which images were actually downloaded:

docker images

registry.aliyuncs.com/google_containers/kube-apiserver            v1.21.2    106ff58d4308   3 weeks ago     126MB
registry.aliyuncs.com/google_containers/kube-controller-manager   v1.21.2    ae24db9aa2cc   3 weeks ago     120MB
registry.aliyuncs.com/google_containers/kube-scheduler            v1.21.2    f917b8c8f55b   3 weeks ago     50.6MB
registry.aliyuncs.com/google_containers/kube-proxy                v1.21.2    a6ebd1c1ad98   3 weeks ago     131MB
registry.aliyuncs.com/google_containers/pause                     3.4.1      0f8457a4c2ec   6 months ago    683kB
registry.aliyuncs.com/google_containers/etcd                      3.4.13-0   0369cf4303ff   10 months ago   253MB

The downloaded images do not include registry.aliyuncs.com/google_containers/coredns:v1.8.0,
so pull the image with docker (note that the tag available on this registry is 1.8.0, not v1.8.0):

docker pull registry.aliyuncs.com/google_containers/coredns:1.8.0

Kubernetes expects the image name registry.aliyuncs.com/google_containers/coredns:v1.8.0, so rename the image with the docker tag command:

# rename
docker tag registry.aliyuncs.com/google_containers/coredns:1.8.0 registry.aliyuncs.com/google_containers/coredns:v1.8.0
# delete the old tag
docker rmi registry.aliyuncs.com/google_containers/coredns:1.8.0

Run the initialization command again

kubeadm init --config=kubeadm.yml --upload-certs | tee kubeadm-init.log

Prompt success

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Follow the above prompts to configure kubectl

mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

# NOT ROOT 
chown $(id -u):$(id -g) $HOME/.kube/config

Verify success

kubectl get node

# The ability to print out node information indicates success
NAME   STATUS     ROLES                  AGE   VERSION
node   NotReady   control-plane,master   31m   v1.21.2
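
The NotReady status at this point is expected: no CNI network plugin has been installed yet. The next step is to install one (calico, flannel, etc.); a hedged sketch with the manifest location left as a placeholder for your chosen plugin:

```bash
# install a CNI plugin; replace the placeholder with your plugin's official manifest
kubectl apply -f <your-cni-manifest.yaml>

# the node should move to Ready once the CNI pods are running
kubectl get nodes -w
```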

[Solved] Kubeadm Reset error: etcdserver: re-configuration failed due to not enough started members

Error information:

[root@bogon log]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed?[y/N]: y
[preflight] Running pre-flight checks
[reset] Removing info for node "bogon" from the ConfigMap "kubeadm-config" in the "kube-system" Namespace
{"level":"warn","ts":"2021-07-03T08:19:14.041-0400","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-7295b53f-6c7d-4a5e-8795-ab4b33048049/192.168.28.128:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}
{"level":"warn","ts":"2021-07-03T08:19:14.096-0400","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-7295b53f-6c7d-4a5e-8795-ab4b33048049/192.168.28.128:2379","attempt":0,"error":"rpc error: code = Unknown desc = etcdserver: re-configuration failed due to not enough started members"}

Solution:

Execute the following two commands:

rm -rf /etc/kubernetes/*
rm -rf /root/.kube/

Then execute it again

kubeadm reset
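
kubeadm reset itself warns that it does not clean up CNI configuration, iptables rules, or IPVS tables. If you intend to re-initialize the node, a hedged extra cleanup (standard default paths assumed):

```bash
# remove leftover CNI configuration
rm -rf /etc/cni/net.d

# flush iptables rules left behind by kube-proxy and CNI plugins
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X

# if kube-proxy ran in IPVS mode, clear its tables as well
ipvsadm --clear
```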

[Solved] Ubuntu 20.04 LTS Install k8s Error: The connection to the server localhost:8080 was refused

On Ubuntu 20.04 LTS, after the node successfully joins the cluster, the following prompt appears:

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Verifying the join with the kubectl get nodes command produces the following error:

The connection to the server localhost:8080 was refused - did you specify the right host or port?

This error had also occurred on the master node; the answer is already given in the output printed after a successful installation:

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

I use the root account, so I need to execute the following command:

export KUBECONFIG=/etc/kubernetes/admin.conf

After the command is executed, verify it:

root@k8s-master-03:/etc/kubernetes# kubectl get nodes
W0706 10:27:55.181115   22817 loader.go:221] Config not found: /etc/kubernetes/admin.conf
The connection to the server localhost:8080 was refused - did you specify the right host or port?

The key information in the error is that the configuration file was not found:

Config not found: /etc/kubernetes/admin.conf

Let's look at the /etc/kubernetes/ directory:

root@k8s-master-03:/etc/kubernetes# ls -l
total 12
-rw------- 1 root root 1910 Jul  6 09:52 kubelet.conf
drwxr-xr-x 2 root root 4096 Jul  6 09:41 manifests
drwxr-xr-x 2 root root 4096 Jul  6 09:52 pki

The listing shows only kubelet.conf (only master nodes have admin.conf), so on this worker node we need to execute the following command:

export KUBECONFIG=/etc/kubernetes/kubelet.conf
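
This export only lasts for the current shell. A hedged way to make it persistent for the root account (assuming bash), keeping in mind that kubelet.conf carries only the node's own credentials, so kubectl from a worker is far more limited than admin.conf on the master:

```bash
# persist the KUBECONFIG setting for future root shells
echo 'export KUBECONFIG=/etc/kubernetes/kubelet.conf' >> ~/.bashrc
source ~/.bashrc
```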