Tag Archives: docker

[Solved] nvidia-docker runtime Error: (Unknown runtime specified nvidia)

1. An error is reported when running the docker command

[email protected]:~# docker run --runtime=nvidia -ti  -v $(pwd):/workspace -w /workspace -v /nfs:/nfs [email protected] --privileged -v /var/run/docker.sock:/var/run/docker.sock registry.test.cn/mla/cxx_toolchains:latest
docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

Following the error message, check whether nvidia-docker is installed:

[email protected]:~# nvidia-docker
nvidia-docker: command not found
[email protected]:~# 

It is not installed.

2. Run the following script to install nvidia-docker

[email protected]:~# cat install-nvidia-docker.sh
sudo curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
sudo curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
[email protected]:~#

Check that nvidia-docker and nvidia-container-runtime were installed successfully:

[email protected]:~# which nvidia-docker
/usr/bin/nvidia-docker
[email protected]:~# which nvidia-container-runtime
/usr/bin/nvidia-container-runtime
[email protected]:~#

3. Edit /etc/docker/daemon.json so that it registers the nvidia runtime, as follows

[email protected]:~# cat /etc/docker/daemon.json
{
  "insecure-registries": ["registry.test.cn"],
  "max-concurrent-downloads": 10,
  "log-driver": "json-file",
  "log-level": "warn",
  "log-opts": {
    "max-size": "10m",
    "max-file": "3"
  },
  "live-restore": true,
  "metrics-addr": "0.0.0.0:9323",
  "default-runtime": "nvidia",
  "experimental": true,
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
[email protected]:~#
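
Before restarting Docker, it can help to confirm that the file really registers the runtime. The following is a minimal sketch in Python, not part of the original fix; the function name runtime_registered is hypothetical and the sample dict only mirrors the relevant keys of the daemon.json above.

```python
import json


def runtime_registered(daemon_config: dict, name: str = "nvidia") -> bool:
    """Return True if `name` is listed under "runtimes" with a non-empty "path"."""
    entry = daemon_config.get("runtimes", {}).get(name)
    return isinstance(entry, dict) and bool(entry.get("path"))


# Sample mirroring the relevant keys of the daemon.json shown above.
sample = json.loads("""
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {"path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": []}
  }
}
""")

print(runtime_registered(sample))
# Naming a default runtime without registering it is not enough.
print(runtime_registered({"default-runtime": "nvidia"}))
```

To check a real host, load /etc/docker/daemon.json with json.load and pass the result to the function.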

4. Restart Docker

[email protected]:~# systemctl daemon-reload
[email protected]:~# systemctl restart docker

5. Verify that the original command now works

[email protected]:~# docker run --runtime=nvidia -ti  -v $(pwd):/workspace -w /workspace -v /nfs:/nfs [email protected] --privileged -v /var/run/docker.sock:/var/run/docker.sock registry.test.cn/mla/cxx_toolchains:latest
[email protected]:/workspace#
[email protected]:/workspace# ls
[email protected]:/workspace# pwd
/workspace
[email protected]:/workspace#

Rancher application service error: request entity too large

The error "request entity too large" occurs because the request body exceeds the 1 MB limit.

1. Raise the limit on the ingress in Rancher.

Annotation: nginx.ingress.kubernetes.io/proxy-body-size

2. For Spring Boot 2.0, add the following to the configuration file

spring.servlet.multipart.max-file-size=1024MB
spring.servlet.multipart.max-request-size=1024MB

[Errno 14] curl#6 – “Could not resolve host: yum.dockerproject.org; Unknown error“

1. Commands:

sudo yum install docker-ce docker-ce-cli containerd.io

2. Error Messages:

Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
base                | 3.6 kB  00:00
docker-ce-stable    | 3.5 kB  00:00:00
docker-ce-test      | 3.5 kB  00:00:00
https://yum.dockerproject.org/repo/main/centos/7/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: yum.dockerproject.org; Unknown error"
Trying other mirror.

One of the configured repositories failed (Docker Repository),
and yum doesn't have enough cached data to continue. At this point the only
safe thing yum can do is fail. There are a few ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working
   upstream. This is most often useful if you are using a newer
   distribution release than is supported by the repository (and the
   packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled
       yum --disablerepo=dockerrepo ...
4. Disable the repository permanently, so yum won't use it by default. Yum
   will then just ignore the repository until you permanently enable it
   again or use --enablerepo for temporary usage:
       yum-config-manager --disable dockerrepo
   or
       subscription-manager repos --disable=dockerrepo
5. Configure the failing repository to be skipped, if it is unavailable.
   Note that yum will try to contact the repo. when it runs most commands,
   so will have to try and fail each time (and thus. yum will be be much
   slower). If it is a very temporary problem though, this is often a nice
   compromise:
       yum-config-manager --save --setopt=dockerrepo.skip_if_unavailable=true
failure: repodata/repomd.xml from dockerrepo: [Errno 256] No more mirrors to try.
https://yum.dockerproject.org/repo/main/centos/7/repodata/repomd.xml: [Errno 14] curl#6 - "Could not resolve host: yum.dockerproject.org; Unknown error"

3. Solution:

Disable the failing dockerrepo repository, whose host yum.dockerproject.org cannot be resolved:

yum-config-manager --disable dockerrepo

[Solved] docker Error: System has not been booted with systemd as init system (PID 1). Can't operate. Failed to con

Environment: CentOS 7, 8

Running systemctl inside the Docker container reports an error:

[[email protected] yum.repos.d]# systemctl status firewalld
System has not been booted with systemd as init system (PID 1). Can't operate.
Failed to connect to bus: Host is down

Solution:

Add the --privileged flag when starting the container, and run /usr/sbin/init as its command so that systemd is PID 1:

[[email protected] ~]# docker run -itd --name c8 --privileged centos /usr/sbin/init
6a6a3c9f9fa9acc59d62a6e82ccb6a637db8aada004aa8a096c6061108c6b144
[[email protected] ~]# docker exec -it c8 /bin/bash

[Solved] K8s Initialize Error: failed with error: Get “http://localhost:10248/healthz“

Environment

Server: CentOS 7
Docker: 20.10.12
kubeadm: v1.23.1
Kubernetes: v1.23.1

Problem description

After installing Docker and the Kubernetes components, kubeadm init fails while initializing the master node.
The command executed:

kubeadm init \
--apiserver-advertise-address=Server_IP \
--image-repository registry.aliyuncs.com/google_containers \
--kubernetes-version=v1.23.1 \
--pod-network-cidr=10.244.0.0/16 \
--service-cidr=10.96.0.0/12 

Error output

[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.

As the hint printed after the error suggests, use journalctl -xeu kubelet or journalctl -xeu kubelet -l to view the detailed error information. If a line is cut off at the edge of the terminal, use the arrow keys to scroll horizontally.

The output:

[root@k8s-node01 ~]# journalctl -xeu kubelet
Dec 24 20:24:13 k8s-node01 kubelet[9127]: I1224 20:24:13.456712    9127 cni.go:240] "Unable to update cni config" err="no 
Dec 24 20:24:13 k8s-node01 kubelet[9127]: I1224 20:24:13.476156    9127 docker_service.go:264] "Docker Info" dockerInfo=&{
Dec 24 20:24:13 k8s-node01 kubelet[9127]: E1224 20:24:13.476236    9127 server.go:302] "Failed to run kubelet" err="failed
Dec 24 20:24:13 k8s-node01 systemd[1]: kubelet.service: main process exited, code=exited, status=1/FAILURE
Dec 24 20:24:13 k8s-node01 systemd[1]: Unit kubelet.service entered failed state.
Dec 24 20:24:13 k8s-node01 systemd[1]: kubelet.service failed.

Press the right arrow key to scroll and read the rest of the truncated "Docker Info" and "Failed to run kubelet" lines:

ID:ZYIL:OO24:BWLY:DTTB:TDKT:D3MZ:YGJ4:3ZOU:7DDY:YYPQ:DPWM:ERFV Containers:0 ContainersRunning:0 ContainersPaused:0 Contain
 to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\"

Cause

The error message shows that the kubelet and Docker use different cgroup drivers:
the kubelet uses systemd, while Docker uses cgroupfs.
To check Docker's driver, run:

docker info

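Look at the "Cgroup Driver" line of the output, which shows either systemd or cgroupfs. Docker defaults to cgroupfs, whereas kubeadm v1.22 and later configures the kubelet to use systemd by default.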

Solution:

Change Docker's cgroup driver to systemd.
Edit Docker's configuration file, creating it if it does not exist:

vi /etc/docker/daemon.json

Add the exec-opts entry, keeping any existing settings:

{
…
"exec-opts": ["native.cgroupdriver=systemd"]
…
}

Then restart Docker:

systemctl restart docker 

Then run kubeadm init again.
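
The daemon.json edit above can also be scripted so that existing settings are kept. This is a minimal sketch in Python under stated assumptions: the function name set_cgroup_driver is hypothetical, and the sample dict stands in for whatever the file already contains.

```python
import json


def set_cgroup_driver(config: dict, driver: str = "systemd") -> dict:
    """Return a copy of a daemon.json dict with native.cgroupdriver set to `driver`."""
    updated = dict(config)
    # Keep unrelated exec-opts, but drop any existing cgroup driver entry.
    opts = [
        opt for opt in updated.get("exec-opts", [])
        if not opt.startswith("native.cgroupdriver=")
    ]
    opts.append(f"native.cgroupdriver={driver}")
    updated["exec-opts"] = opts
    return updated


# Stand-in for the current contents of /etc/docker/daemon.json.
existing = {"log-driver": "json-file"}
print(json.dumps(set_cgroup_driver(existing), indent=2))
```

Writing the result back to /etc/docker/daemon.json and restarting Docker are left out here; both need root privileges.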

[Solved] docker Start jar package and Set JVM parameter Error

Error Messages:

Unrecognized option: -server -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -Xms512m -Xmx1024m -Xmn512m -Xss256k -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -Dtask=true
Error: Could not create the Java Virtual Machine.

Background:

Starting a jar from a Dockerfile with the JVM parameters held in an environment variable failed with the error above. The message reports all the options together as one unrecognized option, which suggests that the whole string reached java as a single argument.

Solution:

Use the shell form of ENTRYPOINT with exec, so that the shell expands $jvm_opts and splits it into separate arguments:

ENV jvm_opts="-server -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=256m -Xms512m -Xmx1024m -Xmn512m -Xss256k -XX:SurvivorRatio=8 -XX:+UseConcMarkSweepGC -Dtask=true"

ENTRYPOINT exec java -jar $jvm_opts trade-chat.jar $app_arg
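
The difference between one long argument and several separate ones can be illustrated outside Docker. The sketch below uses Python only to build the two argument lists; it does not start java, and the shortened option string is an assumption made for brevity.

```python
import shlex

# Shortened version of the option string, for illustration only.
jvm_opts = "-server -XX:MetaspaceSize=256m -Xms512m -Xmx1024m -Dtask=true"

# The whole string passed as ONE argument: java sees a single,
# unrecognizable option that contains spaces.
unsplit_argv = ["java", "-jar", jvm_opts, "trade-chat.jar"]

# The string split on whitespace into separate arguments. For this string,
# shlex.split gives the same words as the shell's splitting of an unquoted $jvm_opts.
split_argv = ["java", "-jar", *shlex.split(jvm_opts), "trade-chat.jar"]

print(len(unsplit_argv))
print(len(split_argv))
print(split_argv)
```

With the shell form of ENTRYPOINT, the shell does this splitting, and exec makes java replace the shell as the container's main process.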

[Solved] Gunicorn timeout error: worker timeout

I. Problem Description:

One morning a developer reported a failure: the container had restarted for no apparent reason. The application container's log contained the message "worker timeout".

II. Analysis of error reporting reasons:

The error message shows that a gunicorn worker process timed out, which caused the process to exit and restart. According to the gunicorn documentation, the default timeout is 30 seconds; a worker that exceeds it is killed and restarted.

III. Solution:

Add --timeout 600 to gunicorn's startup command to set the worker timeout to 600 seconds, and --graceful-timeout 600 to set the graceful timeout to 600 seconds.

After the change, the manifests were checked with kustomize and redeployed. The problem has not recurred.
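
The same two settings can also live in a gunicorn configuration file instead of on the command line. This is a minimal sketch, assuming a file named gunicorn.conf.py passed with gunicorn -c gunicorn.conf.py; the file name and location are assumptions, not taken from the original setup.

```python
# gunicorn.conf.py (assumed file name; load with: gunicorn -c gunicorn.conf.py <app>)

# Seconds a worker may stay silent before it is killed and restarted
# (the default is 30).
timeout = 600

# Seconds workers get to finish in-flight requests after a restart signal.
graceful_timeout = 600
```

A long timeout hides slow requests rather than fixing them, so it may be worth checking which request ran past 30 seconds in the first place.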

[Solved] Docker failed to delete image error: no such image: CentOS

A Docker image cannot be deleted: docker images clearly lists it, but removing it fails.

The delete command reports: Error: No such image: XXXXXXXX

Solution:

Change into the image database directory:

cd /var/lib/docker/image/overlay2/imagedb/content/sha256

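This directory holds one metadata file per image known to Docker, named by the full image ID.

To find the file to delete, match the IMAGE ID shown by docker images against the file names: the short ID is the leading part of the file name.

Once you have confirmed the file, delete it: rm -rf <file name>

After deletion, the image no longer appears in the docker images list.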
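
Matching the short ID against the file names by eye is error-prone, so the lookup can be scripted. The following is a minimal sketch in Python (3.9 or later) that only lists candidate files and deletes nothing; the function name find_image_files is hypothetical, and the directory is the one given above.

```python
from pathlib import Path

# Directory from the steps above; it differs for other storage drivers.
IMAGEDB = Path("/var/lib/docker/image/overlay2/imagedb/content/sha256")


def find_image_files(image_id: str, imagedb: Path = IMAGEDB) -> list[Path]:
    """Return the files in `imagedb` whose name starts with the given image ID."""
    prefix = image_id.removeprefix("sha256:")
    return sorted(p for p in imagedb.iterdir() if p.name.startswith(prefix))


if __name__ == "__main__" and IMAGEDB.is_dir():
    # Replace XXXXXXXX with the IMAGE ID column value from `docker images`.
    for path in find_image_files("XXXXXXXX"):
        print(path)
```

If more than one file is printed, the ID prefix is ambiguous and a longer ID (docker images --no-trunc) should be used before deleting anything.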