Tag Archives: container

How to Solve K8S Error getting node

During the installation or operation of a k8s cluster, you may encounter "Error getting node" problems, such as:

"Error getting node" err="node \"master\" not found"
dial tcp 10.8.126.46:6443: connect: connection refused"
"Error getting node" err="node \"master\" not found"
"Error getting node" err="node \"master\" not found"

The way to troubleshoot such problems is to run the following command and check the specific cause:

journalctl -xeu kubelet

Find the earliest error and handle it according to its cause. Based on the problems I have encountered, the main possibilities are the following:

  1. Swap is not disabled (see the commands after this list)
  2. There is a problem with the hostname or hosts file settings (a cause listed by other bloggers)
  3. The container runtime and k8s versions are incompatible (a cause listed by other bloggers)
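
For cause 1, the standard fix is to disable swap and keep it disabled across reboots; a minimal sketch:

swapoff -a                              # turn swap off immediately
sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out swap entries so it stays off after reboot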

Failed to remove multipath map 320b508ca45022b80 [How to Solve]


1. Project scenario

Host OS: kylin-server-10-sp1-release-build02-20210518-arm64
Docker: docker-ce-18.09.7
Cloud: OpenStack Queens
Storage: acs5000 (SAN)
VM OS: kylin-server-10-sp1-release-build02-20210518-arm64


2. Problem description and cause analysis

2.1 problem description

A volume-backed virtual machine can be created normally, but an error occurs after the virtual machine is restarted. Checking the nova-compute logs shows ProcessExecutionError: unexpected error while running command. Command: multipath -f 320b508ca45022b80 failed, map in use, failed to remove multipath map 320b508ca45022b80.

I manually executed multipath -f 320b508ca45022b80 and it did report that the map was in use, so I suspected some process was using the volume. Through lvdisplay, vgdisplay and lsblk I found that a volume group with the same name had been activated: the virtual machine and the physical host used the same volume group name, and after the VM started, the host activated the VM's volume group. Reactivating all logical volumes in that volume group then failed, which caused multipath -f to fail. Therefore, configure LVM on the host to activate only the system's own logical volumes: check the system volumes with lsblk, then edit /etc/lvm/lvm.conf and modify the following content

devices {
        filter = [ "a/sda/", "r/.*/" ]
}
allocation {
       volume_list = ["klas"]
       auto_activation_volume_list = ["klas"]
}

Restart service:

systemctl restart lvm2-lvmetad.service lvm2-lvmetad.socket

Re-create the virtual machine and restart it. It is also recommended that the virtual machine use a different volume group name.

2.2 storage configuration

2.2.1 Driver

Use the same driver version, zeus-driver-3.1.2.000106: copy the driver into the cinder_volume container under /usr/lib/python2.7/site-packages/cinder/volume/drivers/ and into the cinder_backup container under /usr/lib/python2.7/site-packages/cinder/backup/drivers/, then restart the related services (see the sketch below).
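
Assuming a kolla deployment where the containers are named cinder_volume and cinder_backup and the driver files sit in a local zeus/ directory (both names are assumptions for illustration), the copy could look like this:

# copy the driver into both containers (target paths from the text above)
docker cp zeus/ cinder_volume:/usr/lib/python2.7/site-packages/cinder/volume/drivers/
docker cp zeus/ cinder_backup:/usr/lib/python2.7/site-packages/cinder/backup/drivers/
# restart the related services
docker restart cinder_volume cinder_backup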

2.2.2 configure cinder volume

vim /etc/kolla/cinder-volume/cinder.conf

[DEFAULT]
enabled_backends=toyou_ssd
[toyou_ssd]
volume_driver = cinder.volume.drivers.zeus.Acs5000_iscsi.Acs5000ISCSIDriver
san_ip = x.x.x.x
use_multipath_for_image_xfer = True
image_volume_cache_enabled = True
san_login = cliuser
san_password = ******
acs5000_volpool_name = toyou_ssd
acs5000_target = 0
volume_backend_name = toyou_ssd

Restart the cinder-volume service. For the other settings, refer to the "reference scheme".


3. Solutions

Check which system disk is in use with lsblk, then edit /etc/lvm/lvm.conf and modify the following contents:

devices {
        filter = [ "a/sda/", "r/.*/" ]
}
allocation {
       volume_list = ["klas"]
       auto_activation_volume_list = ["klas"]
}

Restart service:

systemctl restart lvm2-lvmetad.service lvm2-lvmetad.socket

Note that the key is the filter: the device name in the filter is determined by the system disk identified by lsblk, which may be sdb or an NVMe device, etc.
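
For example, if lsblk showed the system running from /dev/nvme0n1 rather than /dev/sda, the filter would change accordingly (a sketch; adjust to your own lsblk output):

devices {
        filter = [ "a|/dev/nvme0n1|", "r|.*|" ]
}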

[Solved] onlyoffice Error: error self signed certificate and download failed

When installing nextcloud + onlyoffice, onlyoffice failed to start and reported an error:


Enter the container to check the error information in out.log:

[root@localhost ~]# docker ps -a
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
CONTAINER ID  IMAGE                                       COMMAND     CREATED       STATUS           PORTS                                        NAMES
a7c97fb93556  docker.io/onlyoffice/documentserver:latest              30 hours ago  Up 30 hours ago  0.0.0.0:8080->80/tcp, 0.0.0.0:9000->443/tcp  onlyoffice
[root@localhost ~]# docker exec -it a7c97fb93556 /bin/bash
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
root@a7c97fb93556:/# cd /var/log/onlyoffice/documentserver/converter/
root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter# ls
err.log  out.log-20220729
root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter#

Disabling Document Server Access Authentication
Next, disable access authentication for Document Server, which by default rejects unauthenticated requests (i.e., self-signed HTTPS requests).

I am now running Document Server with Docker, using the docker exec command to log into the container.

There seems to be only the nano editor in the container, but that’s enough.

Open /etc/onlyoffice/documentserver/default.json, scroll down to find the rejectUnauthorized field, and change its value to false.

Restart the container.
Modify default.json

root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter# cd /etc/onlyoffice/
root@a7c97fb93556:/etc/onlyoffice# ls
documentserver  documentserver-example
root@a7c97fb93556:/etc/onlyoffice# cd documentserver
root@a7c97fb93556:/etc/onlyoffice/documentserver# ls
default.json              local.json  production-linux.json
development-linux.json    log4js      production-windows.json
development-mac.json      logrotate   supervisor
development-windows.json  nginx
root@a7c97fb93556:/etc/onlyoffice/documentserver# pwd
/etc/onlyoffice/documentserver
root@a7c97fb93556:/etc/onlyoffice/documentserver#

Modify as follows: "rejectUnauthorized": false

                     "requestDefaults": {
                                "headers": {
                                        "User-Agent": "Node.js/6.13",
                                        "Connection": "Keep-Alive"
                                },
                                "gzip": true,
                                "rejectUnauthorized": false
                        },
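
If you prefer a one-liner to editing with nano, a sed command like the following (a sketch; it assumes the field is currently set to true, and you should back up default.json first) flips the flag:

sed -i 's/"rejectUnauthorized": true/"rejectUnauthorized": false/' \
    /etc/onlyoffice/documentserver/default.json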

Restart container

root@a7c97fb93556:/etc/onlyoffice/documentserver# exit
exit
[root@localhost ~]# docker stop a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Error: given PIDs did not die within timeout
[root@localhost ~]# docker start a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Error: unable to start container "a7c97fb93556650c83dd763f9578705a82f34b2673f9759e8d0ce62afc63e77c": container a7c97fb93556650c83dd763f9578705a82f34b2673f9759e8d0ce62afc63e77c must be in Created or Stopped state to be started: container state improper
[root@localhost ~]# reboot

Restart nextcloud

login as: root
root@host's password:
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Fri Jul 29 15:59:59 2022 from 192.168.182.1
[root@localhost ~]# setenforce 0
[root@localhost ~]# systemctl start https
Failed to start https.service: Unit https.service not found.
[root@localhost ~]# systemctl start httpd
Enter TLS private key passphrase for localhost:443 (RSA) : ******
[root@localhost ~]# docker ps -a
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
CONTAINER ID  IMAGE                                       COMMAND     CREATED       STATUS      PORTS                                        NAMES
a7c97fb93556  docker.io/onlyoffice/documentserver:latest              31 hours ago  Created     0.0.0.0:8080->80/tcp, 0.0.0.0:9000->443/tcp  onlyoffice
[root@localhost ~]# docker start a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
a7c97fb93556
[root@localhost ~]#

Start onlyoffice

Run as prompted:

[root@localhost ~]# sudo docker exec a7c97fb93556 sudo supervisorctl start ds:example
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
sudo: unable to send audit message: Operation not permitted
ds:example: started

Successfully opened a Word document.
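
As a quick sanity check that Document Server is serving requests again (an extra step of mine, not from the original procedure: the container maps port 80 to 8080 above, and Document Server exposes a /healthcheck endpoint that returns true when healthy):

curl http://localhost:8080/healthcheck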

[Solved] docker skywalking error: no provider found for module storage

When I use docker to deploy skywalking, it always reports an error: no provider found for module storage

Details are as follows:

Conditions:

  1. skywalking 9.1
  2. elasticsearch 7

Execute command:

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch7 \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

Error reported:

no provider found for module storage

Solution:

Modify

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch7 \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

to

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

That is, modify SW_STORAGE=elasticsearch7 to SW_STORAGE=elasticsearch.
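
To confirm the fix, one way (a usage sketch) is to tail the OAP container logs and watch the storage module initialize without the error:

docker logs -f skywalking-oap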

Cause analysis:

  1. Before skywalking 8.8, the server could not automatically detect the storage version, so you had to manually specify whether it was es6 or es7;
  2. Since 8.8, the storage version is detected automatically; there is no need to manually specify es6 or es7, just write elasticsearch directly;


[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs"


When installing a kubernetes cluster, the above error is reported.


Solution:

Method 1: ignore the error

Add the --ignore-preflight-errors=SystemVerification option to ignore the error. Note that with this option it is impossible to tell whether other problems will occur later.
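
For example, with kubeadm (a sketch; keep whatever other init arguments you already use):

kubeadm init --ignore-preflight-errors=SystemVerification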

Method 2: Upgrade kernel version

I installed the kubernetes cluster on kernel 4.19.12, and the problem no longer occurred after upgrading the kernel to 5.13.7. I am not sure whether it was really a kernel version problem.

Method 3: Manually compile the configs kernel module
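
If the module is available for the running kernel, loading it and re-running the preflight check looks like this (a sketch; the configs module exposes the running kernel's configuration at /proc/config.gz when built with CONFIG_IKCONFIG_PROC):

modprobe configs     # load the kernel config module
ls /proc/config.gz   # present when the module loaded successfully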


[Solved] Error response from daemon: driver failed programming external connectivity on endpoint mysql


Docker command:

docker start container_name/id

Container start error:

Error response from daemon: driver failed programming external connectivity on endpoint mysql (cf1ba9f9e0613e14f42332d187a51429f8213aaf91d775f2ec3600614c78e6e1): (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 3306 -j DNAT --to-destination 172.17.0.2:3306 ! -i docker0: iptables: No chain/target/match by that name.
(exit status 1))
Error: failed to start containers: mysql


Solution: restart docker: systemctl restart docker. Restarting the daemon rebuilds the DOCKER iptables chains that the error says are missing (they are typically lost when something, such as a firewall restart, flushes iptables).

https://blog.csdn.net/qq_45652428/article/details/124870923

[Solved] Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook

When using ingress to expose a service, kubectl apply -f ingress.yaml reports the following error:

Error from server (InternalError): error when creating "ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": x509: certificate has expired or is not yet valid: current time 2022-03-26T14:45:34Z is before 2022-03-26T20:16:32Z


Solution:
List the validating webhook configurations:

kubectl get validatingwebhookconfigurations

Delete ingress-nginx-admission

kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission

Then execute

kubectl apply -f ingress.yaml 
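
Note that the x509 message above ("current time ... is before ...") indicates the node's clock was behind the certificate's validity window, so synchronizing the clock may also resolve the error without deleting the webhook (a sketch, assuming chrony is in use):

chronyc makestep              # step the clock immediately
kubectl apply -f ingress.yaml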

stream copy error: reading from a closed fifo [How to Solve]

The docker service on the Linux server cannot be started after running for a period of time

Record a docker service problem encountered at the customer’s site

Problem description

After the docker service had been running for a period of time, some services were killed and could not be restarted through docker-compose. Checking the docker service log showed the error: stream copy error: reading from a closed fifo

Troubleshooting process:

1. The initial suspicion was insufficient memory, but checking with free -g showed there was enough memory.

2. After searching online, some bloggers said restarting docker could solve the problem. After restarting docker, the error changed, and the services that would not start still would not start; the error became: failed to allocate network resources for node *****
3. The docker service network was the default. Ruling out docker network problems, changing the stack name and restarting still did not work.
4. Used docker service ps ID / docker service logs ID to check the logs of the services that failed to start; the error was still: stream copy error: reading from a closed fifo
5. Finally, before restarting the server, checked the disks with df -h and found that /dev/mapper/centos-root was full.

Solution:

Go to /var/log (cd /var/log) and delete some useless log files. If the current log files are small, run du -sh from the root directory to see which folders occupy the most space; generally the /var and /root folders are what fill up the root disk, so the contents of these two folders need to be cleaned out. A sketch of the hunt follows.
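
The file names below are hypothetical; always check what a file is before deleting or truncating it:

du -sh /* 2>/dev/null | sort -h         # which top-level folders are largest
du -sh /var/* 2>/dev/null | sort -h     # drill down into /var
find /var/log -size +100M               # list oversized log files
> /var/log/huge-app.log                 # truncate a (hypothetical) huge log in place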

[Solved] error during connect: This error may indicate that the docker daemon is not running

Because the shortcut key of my screenshot tool is Ctrl+Q, and the shortcut key to quit Docker Desktop is also Ctrl+Q, pressing Ctrl+Q quit Docker Desktop. Then, when I entered a docker command in the console, it produced this error:

error during connect: This error may indicate that the docker daemon is not running.: Get "http://%2F%2F.%2Fpipe%2Fdocker_engine/v1.24/containers/json": open //./pipe/docker_engine: The system cannot find the file specified.

Solution:
Reopen Docker Desktop.

When the color of the icon in the lower-left corner is the same as shown in the picture, Docker is running normally.
Then I went back to CMD and entered a docker command,

and saw that there was no error.
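
A quick way to confirm the daemon is reachable again (a sketch): docker info queries the daemon directly and fails immediately when it is not running.

docker info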

[Solved] docker Commands Execute Error: Segmentation fault

Executing any docker command reported an error: segmentation fault. There had been no similar errors when using docker before. After troubleshooting, it was found that the available memory was only 110 MB, so I suspected the memory was insufficient and executed the cache-cleaning command, changing the parameter to 1, 2 and 3 in turn, but it could not free the memory.

sync                                # flush dirty pages to disk first
echo 1 > /proc/sys/vm/drop_caches   # 1 = page cache; 2 = dentries/inodes; 3 = both

The solution was found on GitHub. First run:

sysctl vm.overcommit_memory

The output is 0. Then change the parameter:

sysctl vm.overcommit_memory=1

At this point, the application that was occupying a lot of memory restarted automatically. If not, execute the cleaning command above.
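
Note that a value set with sysctl this way does not survive a reboot. To make it persistent, one would also record it in /etc/sysctl.conf (a sketch):

echo "vm.overcommit_memory=1" >> /etc/sysctl.conf
sysctl -p    # reload so the setting also takes effect now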

[Solved] nvidia-docker runtime Error: (Unknown runtime specified nvidia)

1. An error is reported when running the docker command

root@host:~# docker run --runtime=nvidia -ti  -v $(pwd):/workspace -w /workspace -v /nfs:/nfs [email protected] --privileged -v /var/run/docker.sock:/var/run/docker.sock registry.test.cn/mla/cxx_toolchains:latest
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

According to the error prompt, check whether nvidia-docker is installed:

root@host:~# nvidia-docker
nvidia-docker: command not found
root@host:~#

Obviously, it is not installed.

2. Execute the script and install nvidia-docker

root@host:~# cat install-nvidia-docker.sh
sudo curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
sudo curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
root@host:~#

Check that nvidia-docker and nvidia-container-runtime are installed successfully:

root@host:~# which nvidia-docker
/usr/bin/nvidia-docker
root@host:~# which nvidia-container-runtime
/usr/bin/nvidia-container-runtime
root@host:~#

3. Edit /etc/docker/daemon.json as follows:

root@host:~# cat /etc/docker/daemon.json
{
  "insecure-registries": ["registry.test.cn"],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "metrics-addr": "0.0.0.0:9323",
  "default-runtime": "nvidia",
  "experimental": true,
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
root@host:~#

4. Restart docker

root@host:~# systemctl daemon-reload
root@host:~# systemctl restart docker

5. Verification

root@host:~# docker run --runtime=nvidia -ti  -v $(pwd):/workspace -w /workspace -v /nfs:/nfs [email protected] --privileged -v /var/run/docker.sock:/var/run/docker.sock registry.test.cn/mla/cxx_toolchains:latest
root@container:/workspace#
root@container:/workspace# ls
root@container:/workspace# pwd
/workspace
root@container:/workspace#
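
As a final check that containers can actually see the GPU, a common smoke test (a sketch; the CUDA image tag is an assumption and may differ in your environment) is to run nvidia-smi inside a CUDA base image:

docker run --rm --runtime=nvidia nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi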