Tag Archives: docker

[Solved] Docker Error: driver failed programming external connectivity on endpoint

1. Error information

Cannot start service nacos: driver failed programming external
connectivity on endpoint yingxue_nacos_1
(3e83b70dcd6ba020d1ee4cf61ffeac58dbf9aea3bbbdad69c7ed44f5cf40ad1a):
(iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0
--dport 8848 -j DNAT --to-destination 172.18.0.2:8848 ! -i br-2e393ccf4803: iptables: No chain/target/match by that name.

2. Solutions

The user-defined chain DOCKER was cleared for some reason after the docker service started. Restart docker, and then restart nacos (here, 540 is the nacos container ID):

systemctl restart docker
docker restart 540
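
To confirm that the DOCKER chain was really missing (and is back after the restart), you can list it directly:

# Lists the NAT rules of the user-defined DOCKER chain; before the restart
# this fails with "No chain/target/match by that name"
iptables -t nat -L DOCKER -n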

[Solved] Docker Run Tomcat Error: Cannot find /usr/local/tomcat/bin/setclasspath.sh

Docker reports an error when running Tomcat

Cannot find /usr/local/tomcat/bin/setclasspath.sh
This file is needed to run this program

First attempt: add a command to the Dockerfile used to build the image

RUN unset CATALINA_HOME

This attempt did not work.

The error still says setclasspath.sh cannot be found, and the Tomcat container keeps restarting

So copy the container's bin directory to the current folder to inspect it (Docker_id is the container ID):

docker cp Docker_id:/usr/local/tomcat/bin ./

The copy shows that setclasspath.sh does exist, so where is the problem? Look at the code that reports the error.

In the catalina.sh script, the part causing the problem is as follows:


  if [ -r "$CATALINA_HOME"/bin/setclasspath.sh ]; then
    . "$CATALINA_HOME"/bin/setclasspath.sh
  else
    echo "Cannot find $CATALINA_HOME/bin/setclasspath.sh"
    echo "This file is needed to run this program"
  fi

The square brackets with the -r operator test whether the file is readable (not read-only); similarly, -x tests whether the file is executable.
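
A quick shell illustration of these tests (demo.sh is a hypothetical file):

touch demo.sh
[ -r demo.sh ] && echo readable          # prints "readable"
[ -x demo.sh ] || echo not executable    # prints "not executable"
chmod +x demo.sh
[ -x demo.sh ] && echo executable        # now prints "executable"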

On the problematic system, the -r test behaves abnormally inside the container.

To verify, start a temporary tomcat8 container:

docker run -it --rm --entrypoint=/bin/bash tomcat:8

Note: running docker run with the --rm option is equivalent to executing docker rm -v after the container exits.
Execute the command:

root@f338debf92f6:/usr/local/tomcat# [[ -r /bin/bash ]]
root@f338debf92f6:/usr/local/tomcat# echo $?
1

Executed on a normal system

root@0083a80a9ec2:/usr/local/tomcat# [[ -r /bin/bash ]]
root@0083a80a9ec2:/usr/local/tomcat# echo $?
0

The variable $? holds the exit status of the previous command (or the return value of a function).
An exit status of 0 means success; any other value indicates an error. Most commands return 0 on success and 1 on failure, but some commands return other values to distinguish different kinds of errors.
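
For example:

true;  echo $?                   # prints 0 (success)
false; echo $?                   # prints 1 (failure)
grep foo /no/such/file; echo $?  # prints 2: grep signals errors with 2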

How to solve it
This is related to the faccessat2 system call. Because of a bug in runc, the call fails inside containers on affected systems. One article suggests that upgrading the kernel to 5.8 or above may help, but that did not apply in my case: the kernel with the problem was already 5.10.
Method 1: update runc >= 1.0.0-rc93

Method 2: run the container with the --privileged switch

Specific implementation of method 1:

View the original runc version

runc -v

Download version 1.0.0-rc95 of runc.amd64

Download address: releases · opencontainers/runc · GitHub

Name change and Execution Authority

mv runc.amd64 runc && chmod +x runc

View and back up the original runc

whereis runc

Some systems have runc in /usr/bin/, others in /usr/local/bin/. The commands below assume /usr/local/bin/; change the directory to match your system.

mv /usr/local/bin/runc /usr/local/bin/runcbak

Overwrite the original runc

cp runc /usr/local/bin/runc

View the new version of runc

runc -v

Restart docker

systemctl restart docker

Method 2: concrete implementation

Docker run execution

docker run --privileged=true -p 8080:8080 tomcat:8

Add in docker-compose.yml

version: '2'
services:
  tomcat:
    container_name: tomcat
    image: tomcat:8
    ports:
      - '8080:8080'
    environment:
      - TZ=Asia/Shanghai
    privileged: true
    restart: always

[Solved] Fatal error in PMPI_Init: Other MPI error, error stack:MPIR_Init_thread(138)

1. Problems

The program was compiled with the oneAPI 2021.3 compilers. When the job is run through the Slurm scheduling system, it reports an error:

2. Solutions

In the Slurm sbatch script, add an environment variable pointing at libpmi2.so:

export I_MPI_PMI_LIBRARY=../slurm/lib/libpmi2.so

If the application image is built as a container (e.g., Docker):

1. Copy the libpmi2.so file into the container

2. Export the corresponding environment variable inside the container as well
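
A minimal sbatch sketch combining these pieces; the job parameters, library path, and program name are examples, so point I_MPI_PMI_LIBRARY at your cluster's actual libpmi2.so:

#!/bin/bash
#SBATCH --job-name=mpi_test
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4

# Example path; use the real location of libpmi2.so on your system
export I_MPI_PMI_LIBRARY=/usr/lib64/slurm/libpmi2.so

srun ./my_mpi_program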


[Solved] Compose error “HTTP request took too long to complete“

Recently, I noticed that the following error messages often appear in docker-compose:

ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).

Adding COMPOSE_HTTP_TIMEOUT seems to only delay the error. Is this a known issue? Or is there a workaround?
I tried increasing the COMPOSE_HTTP_TIMEOUT environment variable, but it didn't work:

# Increase timeout period to 120 seconds.
export COMPOSE_HTTP_TIMEOUT=120;
# Rebuild all containers using the new images.
docker-compose up -d;

# or use docker-compose --verbose up -d to check out the errors

My Solution:

sudo service docker restart
docker-compose up

Maybe my inode usage was high:

df -ih			#  -i, --inodes   list inode information instead of block usage
Filesystem     Inodes IUsed IFree IUse% Mounted on
udev             493K   390  492K    1% /dev
tmpfs            494K   537  494K    1% /run
/dev/xvda1       1.3M  1.2M   70K   95% /
tmpfs            494K     1  494K    1% /dev/shm
tmpfs            494K     3  494K    1% /run/lock
tmpfs            494K    16  494K    1% /sys/fs/cgroup
tmpfs            494K     4  494K    1% /run/user/1000
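
If high inode usage is the suspect, GNU du (8.22 or later) can roughly locate the directories holding the most inodes; a sketch:

# Sort directories by inode count; the largest offenders appear last
du --inodes -x / 2>/dev/null | sort -n | tail -20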

[Solved] error: password authentication failed for user “postgres”

password authentication failed for user “postgres” with docker

Steps:

1. Run the following command to create a docker container:

docker run --rm --name test-postgres -p 5432:5432 -e POSTGRES_PASSWORD=pw -d postgres

2. Run the following Node.js code to connect to the database:
import pg from 'pg'
const { Pool } = pg
const pool = new Pool({
   database: 'postgres',
   user: 'postgres',
   password: 'pw',
   port: 5432
})

The following error is thrown:
error: password authentication failed for user "postgres"


Cause analysis

Postgres is also installed locally and, by default, starts automatically with the system. When connecting to the database, the locally installed Postgres service answers first, so authentication against the container fails.
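
One way to confirm this on Windows is to check which process is listening on port 5432:

netstat -ano | findstr :5432

The PID in the last column can be matched against Task Manager; if it belongs to the locally installed Postgres rather than Docker, the local service is the one answering connections.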


Solution:

Open the Windows Task Manager; under Services you will see a postgres service running. Right-click it to stop the service, then open the Services window and double-click the postgres service to set its Startup Type to Manual.


Other troubleshooting methods

Open the Terminal of the container

  1. Enter the following two commands to see if there is a problem with local host permissions.
    cd /var/lib/postgresql/data
    cat pg_hba.conf
  2. Check whether the default user is postgres.
    psql -U postgres -x -c "select * from current_user;"
  3. Check the password expiration date. No value in rolvaliduntil means the password never expires.
    psql -h 127.0.0.1 -U postgres -d postgres
    SELECT * FROM pg_roles WHERE rolname='postgres';
  4. Try removing docker's volumes and containers, and restarting docker.

[Solved] Docker Download Mirror Error: Cannot connect to the Docker daemon at…

When you try to pull an image with docker, the following error is reported:

cannot connect to the docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Cause analysis: docker may not have exited cleanly last time, so the daemon did not start properly and the docker socket cannot be found under /var/run/.

Solution:
enter the command

systemctl start docker.service

Then the image can be pulled normally.
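
If the service still fails to come up, its status output usually shows why; enabling it also avoids the problem after reboots:

systemctl status docker.service   # shows why the daemon failed, if it did
systemctl enable docker.service   # start the daemon automatically at boot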

How to Solve K8S Error getting node

During installation or operation of a k8s cluster, you may encounter "Error getting node" problems, such as:

"Error getting node" err="node \"master\" not found"
dial tcp 10.8.126.46:6443: connect: connection refused"
"Error getting node" err="node \"master\" not found"
"Error getting node" err="node \"master\" not found"

The way to troubleshoot such problems is to execute the following commands to check the specific error causes:

journalctl -xeu kubelet

Find the earliest error and handle it according to its type.
According to the problems I have encountered, there are mainly the following possibilities:

  1. Swap memory is not disabled (see the commands after this list)
  2. There is a problem with the hostname or hosts settings (a reason listed by other bloggers)
  3. The container runtime and k8s versions are incompatible (a reason listed by other bloggers)
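
For the swap case, a common fix (assuming a standard /etc/fstab layout) is:

# Disable swap immediately, then comment out the swap entry in /etc/fstab
# so it stays disabled after a reboot
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab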

[Solved] Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to

Calico in a cluster deployed with kubeadm runs incorrectly

It reports errors:

1. Check calico:

kubectl get pods -n kube-system

Calico-node 0/1 Running

2. Check the log:

kubectl describe pod calico-node-gdv9r -n kube-system

Calico error:
Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory

Solution:

1. Remove the DOWN NIC/interface that causes the Docker and Calico errors:

ip link delete docker0

2. Delete the Calico pod (it will be rebuilt and run again by the k8s cluster after deletion):

kubectl delete pod calico-node-gdv9r -n kube-system

3. Check Calico again; it is now normal:

kubectl get pods -n kube-system

Failed to remove multipath map 320b508ca45022b80 [How to Solve]

Failed to remove multipath map 320b508ca45022b80

1. Project scenario

Host OS: kylin-server-10-sp1-release-build02-20210518-arm64
Docker: docker-ce-18.09.7
Cloud: OpenStack Queens
Storage: same acs5000
VM OS: kylin-server-10-sp1-release-build02-20210518-arm64


2. Problem description and cause analysis

2.1 problem description

A volume-backed virtual machine can be created normally, but an error occurs after the virtual machine is restarted. Checking the nova-compute logs shows: ProcessExecutionError: unexpected error while running command. Command: multipath -f 320b508ca45022b80 failed, map in use, failed to remove multipath map 320b508ca45022b80.

I manually executed multipath -f 320b508ca45022b80 and it did report that the map was in use, so I suspected some process was using the volume. Through lvdisplay, vgdisplay and lsblk I found that a volume group with the same name had been activated: the virtual machine and the physical host used the same volume group name, and the host activated the VM's volume group after the VM started. Re-activating all logical volumes in that volume group then failed, which made multipath -f fail.

Therefore, LVM needs to be configured to activate only the system's own logical volumes. Check the system volumes with lsblk, then edit /etc/lvm/lvm.conf and modify the following content:

devices {
        # accept the system disk (sda) and reject all other devices
        filter = [ "a/sda/", "r/.*/" ]
}
activation {
       # activate only logical volumes in the system volume group "klas"
       volume_list = ["klas"]
       auto_activation_volume_list = ["klas"]
}

Restart service:

systemctl restart lvm2-lvmetad.service lvm2-lvmetad.socket

Recreate the virtual machine and restart it. It is also recommended that the virtual machine use a different volume group name.

2.2 storage configuration

2.2.1 Driver

Use the same driver version, zeus-driver-3.1.2.000106. Copy the driver into the cinder_volume container's /usr/lib/python2.7/site-packages/cinder/volume/drivers/ directory and the cinder_backup container's /usr/lib/python2.7/site-packages/cinder/backup/drivers/ directory, then restart the related services.
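
For example, with the kolla-style container names used here (the local directory name zeus is an assumption based on the driver module path):

# Copy the driver into both containers and restart them
docker cp zeus cinder_volume:/usr/lib/python2.7/site-packages/cinder/volume/drivers/
docker cp zeus cinder_backup:/usr/lib/python2.7/site-packages/cinder/backup/drivers/
docker restart cinder_volume cinder_backup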

2.2.2 configure cinder volume

vim /etc/kolla/cinder-volume/cinder.conf

[DEFAULT]
enabled_backends=toyou_ssd
[toyou_ssd]
volume_driver = cinder.volume.drivers.zeus.Acs5000_iscsi.Acs5000ISCSIDriver
san_ip = x.x.x.x
use_multipath_for_image_xfer = True
image_volume_cache_enabled = True
san_login = cliuser
san_password = ******
acs5000_volpool_name = toyou_ssd
acs5000_target = 0
volume_backend_name = toyou_ssd

Restart the cinder-volume service. For other settings, refer to the "reference scheme".


3. Solutions

Check the system disk with lsblk, then edit /etc/lvm/lvm.conf and modify the following content:

devices {
        # accept the system disk (sda) and reject all other devices
        filter = [ "a/sda/", "r/.*/" ]
}
activation {
       # activate only logical volumes in the system volume group "klas"
       volume_list = ["klas"]
       auto_activation_volume_list = ["klas"]
}

Restart service:

systemctl restart lvm2-lvmetad.service lvm2-lvmetad.socket

Note that the key part is the filter. The device name in the filter is determined by the system disk shown by lsblk; it may be sdb or an nvme device, etc.
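
For instance, if lsblk reports the system disk as nvme0n1 rather than sda, the filter line would change accordingly (a sketch, not a drop-in config):

lsblk -o NAME,TYPE,MOUNTPOINT   # identify the system disk first

# then, in /etc/lvm/lvm.conf:
# filter = [ "a/nvme0n1/", "r/.*/" ]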

[Solved] onlyoffice Error: error self signed certificate and download failed

When installing nextcloud+onlyoffice, onlyoffice failed to start and reported an error:


Enter the container to check the error information in out.log:

[root@nextcloud ~]# docker ps -a
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
CONTAINER ID  IMAGE                                       COMMAND     CREATED       STATUS           PORTS                                        NAMES
a7c97fb93556  docker.io/onlyoffice/documentserver:latest              30 hours ago  Up 30 hours ago  0.0.0.0:8080->80/tcp, 0.0.0.0:9000->443/tcp  onlyoffice
[root@nextcloud ~]# docker exec -it a7c97fb93556 /bin/bash
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
root@a7c97fb93556:/# cd /var/log/onlyoffice/documentserver/converter/
root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter# ls
err.log  out.log-20220729
root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter#

Disabling Document Server Access Authentication
Next, disable access authentication for Document Server, which by default rejects requests that use self-signed HTTPS certificates.

I am now running Document Server with Docker, using the docker exec command to log into the container.

There seems to be only the nano editor in the container, but that’s enough.

Open /etc/onlyoffice/documentserver/default.json, scroll down to find the rejectUnauthorized field, and change its value to false.

Restart the container.
Modify default.json

root@a7c97fb93556:/var/log/onlyoffice/documentserver/converter# cd /etc/onlyoffice/
root@a7c97fb93556:/etc/onlyoffice# ls
documentserver  documentserver-example
root@a7c97fb93556:/etc/onlyoffice# cd documentserver
root@a7c97fb93556:/etc/onlyoffice/documentserver# ls
default.json              local.json  production-linux.json
development-linux.json    log4js      production-windows.json
development-mac.json      logrotate   supervisor
development-windows.json  nginx
root@a7c97fb93556:/etc/onlyoffice/documentserver# pwd
/etc/onlyoffice/documentserver
root@a7c97fb93556:/etc/onlyoffice/documentserver#

Modify as follows: "rejectUnauthorized": false

                     "requestDefaults": {
                                "headers": {
                                        "User-Agent": "Node.js/6.13",
                                        "Connection": "Keep-Alive"
                                },
                                "gzip": true,
                                "rejectUnauthorized": false
                        },

Restart container

root@a7c97fb93556:/etc/onlyoffice/documentserver# exit
exit
[root@nextcloud ~]# docker stop a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Error: given PIDs did not die within timeout
[root@nextcloud ~]# docker start a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
Error: unable to start container "a7c97fb93556650c83dd763f9578705a82f34b2673f9759e8d0ce62afc63e77c": container a7c97fb93556650c83dd763f9578705a82f34b2673f9759e8d0ce62afc63e77c must be in Created or Stopped state to be started: container state improper
[root@nextcloud ~]# reboot

Restart nextcloud

login as: root
[email protected]'s password:
Activate the web console with: systemctl enable --now cockpit.socket

Last login: Fri Jul 29 15:59:59 2022 from 192.168.182.1
[root@nextcloud ~]# setenforce 0
[root@nextcloud ~]# systemctl start https
Failed to start https.service: Unit https.service not found.
[root@nextcloud ~]# systemctl start httpd
Enter TLS private key passphrase for localhost:443 (RSA) : ******
[root@nextcloud ~]# docker ps -a
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
CONTAINER ID  IMAGE                                       COMMAND     CREATED       STATUS      PORTS                                        NAMES
a7c97fb93556  docker.io/onlyoffice/documentserver:latest              31 hours ago  Created     0.0.0.0:8080->80/tcp, 0.0.0.0:9000->443/tcp  onlyoffice
[root@nextcloud ~]# docker start a7c97fb93556
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
a7c97fb93556
[root@nextcloud ~]#

Start onlyoffice

Run as prompted:

[root@nextcloud ~]# sudo docker exec a7c97fb93556 sudo supervisorctl start ds:example
Emulate Docker CLI using podman. Create /etc/containers/nodocker to quiet msg.
sudo: unable to send audit message: Operation not permitted
ds:example: started

Successfully opened word document

[Solved] Worker 1 failed executing transaction ‘ANONYMOUS‘ at master log mall-mysql-bin.000001, end_log_pos

A problem encountered while configuring MySQL master-slave server in Docker.

The following error:

Coordinator stopped because there were error(s) in the worker(s). The most recent failure being: Worker 1 failed executing transaction 'ANONYMOUS' at master log mall-mysql-bin.000001, end_log_pos 2251. See error log and/or performance_schema.replication_applier_status_by_worker table for more details about this failure or others, if any.

Based on the hints given in the error message, execute in the mysql client to view the detailed error message.

select * from performance_schema.replication_applier_status_by_worker;

Worker 1 failed executing transaction 'ANONYMOUS' at master log
mall-mysql-bin.000001, end_log_pos 889; Error 'Can't create database
't1'; database exists' on query. Default database: 't1'. Query:
'create database t1'


Reasons:
1. MySQL 8's default authentication plugin differs from MySQL 5.7's; change the configuration to use the previous plugin.
Execute these two commands in the MySQL master's client:

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'root';
ALTER USER 'slave'@'%' IDENTIFIED WITH mysql_native_password BY 'root';

Add a line under the [mysqld] section of my.cnf so that MySQL 8 uses the same password authentication as MySQL 5.7:

default_authentication_plugin=mysql_native_password

After the master and slave have changed this configuration, restart the master and slave.
(Each line of configuration in my.cnf file must remember to check if there are spaces at the end of the line. If there are spaces, delete them.)

docker restart mysql-master   # your own MySQL master container name

docker ps

docker restart mysql-slave    # your own MySQL slave container name

docker ps

2. My understanding: the 'database exists' error does not mean the database already exists on the slave. It means the database already existed on the master before the slave was configured, so replaying its creation on the slave fails.

Execute the following commands on the slave MySQL client:

stop slave;

reset master;

Go to the MySQL master and delete the database you created for testing:

drop database <your_test_database>;

show master status;

According to the values of File and Position of mysql-master, change the master_log_file and master_log_pos of the following command.

change master to master_host='192.168.159.200', master_user='slave',
master_password='root', master_port=3307,
master_log_file='mall-mysql-bin.000002', master_log_pos=331,
master_connect_retry=30;

After the change, execute these commands on mysql-slave:

start slave;

show slave status\G

If you find that both Slave_IO_Running and Slave_SQL_Running show Yes, the MySQL master-slave configuration is successful.

As long as one of them is not Yes (e.g., it shows Connecting or No), the configuration has not succeeded.

After configuring the master-slave, create a new database and table on mysql-master, insert the data, and then go to the slave to verify that the data is synchronized over.

(Screenshots: the synchronized data shown on both mysql-master and mysql-slave.)

So far, the installation of MySQL master-slave in docker is completed.

[Solved] docker skywalking error: no provider found for module storage

When I deploy skywalking with docker, it always reports an error: no provider found for module storage

Details are as follows:

Conditions:

  1. skywalking 9.1
  2. elasticsearch 7

Execute command:

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch7 \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

The error:

no provider found for module storage

Solution:

Modify

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch7 \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

to

docker run --name skywalking-oap --restart always -d \
-p 12800:12800 \
-p 11800:11800 \
--link es7:es7 \
-e SW_STORAGE=elasticsearch \
-e SW_STORAGE_ES_CLUSTER_NODES=es7:9200 \
skywalking-oap-server

That is, modify SW_STORAGE=elasticsearch7 to SW_STORAGE=elasticsearch.

Cause analysis:

  1. Before skywalking 8.8, the server cannot automatically detect the storage version, so you need to manually specify whether it is es6 or es7;
  2. From 8.8 on, the storage version can be detected automatically, so there is no need to specify es6 or es7; just write elasticsearch directly.
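
As a quick sanity check after the change (container name as used above):

# The OAP server should now start without the storage provider error
docker logs skywalking-oap 2>&1 | grep -i storage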