Tag Archives: Operation and maintenance

macOS: How to Modify Docker Container Files When screen Reports [screen is terminating]

Run on macOS:

cd /Users/xq/Library/Containers/com.docker.docker/Data/vms/0
screen tty

The following message appears: [screen is terminating]

Solution

Step 1: pull the helper image

Run the following command directly:

docker run -it --privileged --pid=host justincormack/nsenter1

This command pulls the image: justincormack/nsenter1 latest c81481184b1b 3 years ago 101kb

After the pull completes, you are placed in a shell inside the container started from this image.

Step 2: locate the container

In the shell of the container started from this image, run:

cd /var/lib/docker/containers

This directory holds all containers. Each subdirectory name is the corresponding container ID.

Step 3: modify the container's files

First, find the ID of the container you want to modify by running docker ps -a on the host command line. Then, in the justincormack/nsenter1 shell, run:

cd <ID of the container you want to change>/

Here you can modify the container's files, and the changes are applied to the Docker container.

How to Solve Nginx 413 Error (request entity too large)

Solve the error of Nginx 413 (request entity too large)

Cause:
The request body is too large. By default, Nginx limits the request body, and therefore uploaded files, to 1m. To raise the limit, go to the Nginx directory, open the conf folder, open the nginx.conf file, and add the following to the http {…} block:

http{
    
    # maximum allowed size of the client request body (upload size)
    client_max_body_size 1024m;
    
}

After modifying the configuration file, reload Nginx
Reload command: nginx -s reload
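
To see why a 1m limit rejects larger uploads, here is a minimal Python sketch of the size comparison. It is an illustration only, not Nginx code: the helper names parse_nginx_size and would_be_rejected are hypothetical, and the unit handling assumes the k, m and g suffixes.

```python
def parse_nginx_size(value: str) -> int:
    """Convert an Nginx-style size such as '1m' or '1024m' to bytes."""
    units = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    value = value.strip().lower()
    if value and value[-1] in units:
        return int(value[:-1]) * units[value[-1]]
    return int(value)  # a bare number is already in bytes


def would_be_rejected(body_bytes: int, client_max_body_size: str) -> bool:
    """True if a request body of this size exceeds the limit (HTTP 413)."""
    limit = parse_nginx_size(client_max_body_size)
    # Nginx treats 0 as "do not check the request body size".
    return limit != 0 and body_bytes > limit
```

With these helpers, a 5 MB upload exceeds the default 1m limit but fits within the 1024m limit configured above.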

NVIDIA-SMI Error: Couldn't Communicate with the NVIDIA Driver

Error reported when running nvidia-smi:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

This is a common problem, especially on Ubuntu. The usual cause is a kernel upgrade that leaves the new kernel mismatched with the previously installed graphics driver.

Solution 1:

Just execute two commands:

sudo apt-get install dkms
sudo dkms install -m nvidia -v 440.44 (440.44 is the driver version number)

To find your version, run ll /usr/src/ and look for the nvidia-440.44/ folder. The version number varies from computer to computer.
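
Reading the version off the folder name can also be scripted. The following is a small sketch, assuming the driver sources live in folders named nvidia-<version> under /usr/src as in the example above; the function name nvidia_dkms_versions is hypothetical.

```python
import os
import re


def nvidia_dkms_versions(src_dir: str = "/usr/src") -> list:
    """Return the version suffixes of nvidia-<version> folders in src_dir."""
    pattern = re.compile(r"nvidia-(\d+(?:\.\d+)*)$")
    versions = []
    for name in sorted(os.listdir(src_dir)):
        match = pattern.match(name)
        # Only directories whose name is exactly nvidia-<digits and dots> count.
        if match and os.path.isdir(os.path.join(src_dir, name)):
            versions.append(match.group(1))
    return versions
```

The returned string is the value to pass after -v in the dkms install command.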
 

error while loading shared libraries [How to Solve]

Solution for error while loading shared libraries

If a library is missing, find it and put it back.

Distribution: Arch Linux
As the title says. I take the yaourt error I ran into just before writing this article as an example:

➜  ~ yaourt -Ss uswsusp
yaourt: error while loading shared libraries: libalpm.so.12

After checking the system, I found that libalpm.so has been upgraded to version 13:

➜  ~ ls -l /usr/lib/libalpm.so*
lrwxrwxrwx 1 root root     13 Jul 16 03:26 /usr/lib/libalpm.so -> libalpm.so.13
lrwxrwxrwx 1 root root     17 Jul 16 03:26 /usr/lib/libalpm.so.13 -> libalpm.so.13.0.0
-rwxr-xr-x 1 root root 243608 Jul 16 03:26 /usr/lib/libalpm.so.13.0.0

Well, the problem is very simple. The library has been upgraded, but the software that depends on it has not yet been updated by its developers.

Next, only three steps are required:
confirm which package libalpm.so comes from and download an old version of it (one that is likely to contain version 12);
open the package, where you can usually find the library file directly;
copy it to /usr/lib.
My detailed steps:
a search shows that it is probably in the pacman package

➜  ~ pacman -Ss libalpm
core/pacman 6.0.0-5 (base-devel) [installed]
    A library-based package manager with dependency support
extra/pyalpm 0.10.6-1
    Python 3 bindings for libalpm
(..... Other insignificant packages)

Old Arch Linux packages can be downloaded from the Arch Linux Archive.

In the /packages/p/pacman/ directory, I downloaded the version just before the current one (v6.0.0): pacman-5.2.2-4-x86_64.pkg.tar.zst

Open it directly and you will find libalpm.so.12

Finally, copy the extracted libalpm.so.* files to /usr/lib (be careful not to copy the one without a version suffix, libalpm.so)

MGR[ERROR]: Library file libdmhs_exec.so not found, Error: 0

DM8 reports an error when building DMHS
MGR[ERROR]: Library file libdmhs_exec.so not found, Error: 0

DMHS> start exec
CSL[ERROR]: Failed to load execution module

View the log:

MGR[INFO]: Loading execution module...
MGR[ERROR]: Library file libdmhs_exec.so not found, Error: 0
MGR[ERROR]: Log execution failed to start

Solution:
Use ldd to trace the dependencies of the library file

$ ldd libdmhs_exec.so
	linux-vdso.so.1 =>  (0x00007ffe819f6000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f3897ed8000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f3897bd5000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f38979cd000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f38977b1000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f38975ac000)
	libdmhs_pub.so => ./libdmhs_pub.so (0x00007f3897280000)
	libdmhs_ucvt.so => ./libdmhs_ucvt.so (0x00007f3896d25000)
	libdmhs_dm_obj.so => ./libdmhs_dm_obj.so (0x00007f3896b0f000)
	libdmhs_cvt.so => ./libdmhs_cvt.so (0x00007f3896886000)
	libdodbc.so => /opt/dmdbms/bin/libdodbc.so (0x00007f389665a000)
	/lib64/ld-linux-x86-64.so.2 (0x0000563b938c3000)
	libdmhs_exp.so => ./libdmhs_exp.so (0x00007f3896284000)
	libdmhs_xml.so => ./libdmhs_xml.so (0x00007f3896075000)
	libdmoci.so => not found
	libdmdpi.so => /opt/dmdbms/bin/libdmdpi.so (0x00007f3895406000)
	libdmfldr.so => /opt/dmdbms/bin/libdmfldr.so (0x00007f38947d5000)
	libdmelog.so => /opt/dmdbms/bin/libdmelog.so (0x00007f38945ce000)
	libdmutl.so => /opt/dmdbms/bin/libdmutl.so (0x00007f38943bc000)
	libdmclientlex.so => /opt/dmdbms/bin/libdmclientlex.so (0x00007f3894189000)
	libdmos.so => /opt/dmdbms/bin/libdmos.so (0x00007f3893f5c000)
	libdmcvt.so => /opt/dmdbms/bin/libdmcvt.so (0x00007f389387d000)
	libdmstrt.so => /opt/dmdbms/bin/libdmstrt.so (0x00007f3893669000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f3893360000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f389314a000)
	libdmmem.so => /opt/dmdbms/bin/libdmmem.so (0x00007f3892f3c000)
	libdmcalc.so => /opt/dmdbms/bin/libdmcalc.so (0x00007f3892cb7000)

libdmoci.so is not found

$ pwd
/home/dmdba/dmhs/bin
$ find -name libdmoci.so
./stat/libdmoci.so
$ cat /home/dmdba/.bash_profile
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/opt/dmdbms/bin"
export DM_HOME="/opt/dmdbms"
export DMHS_HOME=/home/dmdba/dmhs
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/dmdba/dmhs/bin
$ cp ./stat/libdmoci.so ./

find locates it at /home/dmdba/dmhs/bin/stat/libdmoci.so, but that directory is not listed in LD_LIBRARY_PATH. Either copy libdmoci.so into a directory listed in LD_LIBRARY_PATH, or add /home/dmdba/dmhs/bin/stat to LD_LIBRARY_PATH
(if find does not locate the file, you need to download a copy online)
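
Scanning a long ldd listing for unresolved entries by eye is error-prone. As a rough sketch, the check can be automated by filtering the ldd text for "not found" lines; the function name missing_libraries is hypothetical, and the input is assumed to be ldd output in the format shown above.

```python
def missing_libraries(ldd_output: str) -> list:
    """Return the library names that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        # Each dependency line looks like "name => path (address)".
        name, arrow, target = line.strip().partition(" => ")
        if arrow and target.strip() == "not found":
            missing.append(name)
    return missing


# The text could come from, for example:
#   subprocess.run(["ldd", "libdmhs_exec.so"], capture_output=True, text=True).stdout
```

For the first listing above this yields only libdmoci.so, which is the file that needs to be made visible to the loader.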

Run ldd again

$ ldd libdmhs_exec.so
	linux-vdso.so.1 =>  (0x00007ffcf27be000)
	libc.so.6 => /lib64/libc.so.6 (0x00007f8c25479000)
	libm.so.6 => /lib64/libm.so.6 (0x00007f8c25176000)
	librt.so.1 => /lib64/librt.so.1 (0x00007f8c24f6e000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f8c24d52000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007f8c24b4d000)
	libdmhs_pub.so => ./libdmhs_pub.so (0x00007f8c24821000)
	libdmhs_ucvt.so => ./libdmhs_ucvt.so (0x00007f8c242c6000)
	libdmhs_dm_obj.so => ./libdmhs_dm_obj.so (0x00007f8c240b0000)
	libdmhs_cvt.so => ./libdmhs_cvt.so (0x00007f8c23e27000)
	libdodbc.so => /opt/dmdbms/bin/libdodbc.so (0x00007f8c23bfb000)
	/lib64/ld-linux-x86-64.so.2 (0x00005591fe1ef000)
	libdmhs_exp.so => ./libdmhs_exp.so (0x00007f8c23825000)
	libdmhs_xml.so => ./libdmhs_xml.so (0x00007f8c23616000)
	libdmoci.so => ./libdmoci.so (0x00007f8c22ba6000)
	libdmdpi.so => /opt/dmdbms/bin/libdmdpi.so (0x00007f8c21f38000)
	libdmfldr.so => /opt/dmdbms/bin/libdmfldr.so (0x00007f8c21307000)
	libdmelog.so => /opt/dmdbms/bin/libdmelog.so (0x00007f8c21100000)
	libdmutl.so => /opt/dmdbms/bin/libdmutl.so (0x00007f8c20eee000)
	libdmclientlex.so => /opt/dmdbms/bin/libdmclientlex.so (0x00007f8c20cbb000)
	libdmos.so => /opt/dmdbms/bin/libdmos.so (0x00007f8c20a8e000)
	libdmcvt.so => /opt/dmdbms/bin/libdmcvt.so (0x00007f8c203af000)
	libdmstrt.so => /opt/dmdbms/bin/libdmstrt.so (0x00007f8c2019b000)
	libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f8c1fe92000)
	libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f8c1fc7c000)
	libdmmem.so => /opt/dmdbms/bin/libdmmem.so (0x00007f8c1fa6e000)
	libdmcalc.so => /opt/dmdbms/bin/libdmcalc.so (0x00007f8c1f7e9000)

Once ldd resolves every library, restart DMHS and try start exec again

Redis: (error) MOVED 8352 192.168.145.128:6380 [How to Solve]

Problem Description:

Redis SET, GET and other operations fail with the following error

 (error) MOVED 8352 192.168.145.128:6380

Cause analysis:

This generally happens because redis-cli was started without cluster mode. When connected to a cluster in normal mode, redis-cli does not follow redirections, so it cannot operate on keys stored on other nodes of the cluster. Add -c to connect in cluster mode.

Solution:

Add -c at startup to enable cluster mode

redis-cli -c -p 6379

As shown in the figure below, the operation can be successfully performed

Summary

Connecting in normal mode: you talk to a single node directly, and a MOVED redirection is returned when the key you read or write belongs to a slot served by another node. Connect in cluster mode instead: with the -c parameter, redis-cli follows the redirection and automatically switches to the node that owns the slot
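
The numbers in the MOVED reply are a hash slot and the address of the node serving it. The sketch below shows how such a reply can be read and how a key maps to a slot (CRC16 of the key modulo 16384, hashing only the {...} hash tag when one is present). The function names key_slot and parse_moved are hypothetical; this is an illustration, not redis-cli's implementation.

```python
import binascii


def key_slot(key: bytes) -> int:
    """Hash slot of a key: CRC16(key) mod 16384, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only what is inside the braces
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16-CCITT (XMODEM) variant.
    return binascii.crc_hqx(key, 0) % 16384


def parse_moved(error: str):
    """Split 'MOVED <slot> <host>:<port>' into (slot, host, port)."""
    kind, slot, address = error.split()
    if kind != "MOVED":
        raise ValueError("not a MOVED redirection: " + error)
    host, _, port = address.rpartition(":")
    return int(slot), host, int(port)
```

A client in cluster mode does this bookkeeping for you: it reads the target address from the reply and reissues the command against that node.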

Domain name resolution error: Name or service not known [How to Solve]

Symptom

On node7 of the Alibaba OCP cluster, a domain name fails to resolve. Error message: Name or service not known

Troubleshooting

Testing shows that the problem is not limited to node7. On every server in Alibaba Cloud East China 2 (Shanghai) zone F the domain name cannot be resolved, while other zones are normal.
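
To repeat the same test on several servers, a small resolution check helps. This is a minimal sketch using the system resolver through Python's socket module; the function name resolve is hypothetical, and the hostname you pass in is whatever domain fails in your environment.

```python
import socket


def resolve(hostname: str):
    """Return the sorted IP addresses for hostname, or None if resolution fails."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror as exc:
        # On Linux a failed lookup typically surfaces as "Name or service not known".
        print(f"{hostname}: {exc}")
        return None
    # Each entry ends with a socket address whose first element is the IP.
    return sorted({info[4][0] for info in infos})
```

Running it on a server in the affected zone and on one in a healthy zone shows whether the failure follows the resolver rather than the host.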

Conclusion

After confirming with Alibaba Cloud engineers, the cause is that the self-hosted authoritative DNS server for the domain does not support EDNS. The DNS community requires authoritative servers to support EDNS; otherwise the local DNS resolver no longer applies a workaround. However, Alibaba Cloud's local DNS resolvers run different versions and have not all been upgraded. As a result, resolvers in some zones (availability zone F) follow this convention and cannot resolve the domain, while resolvers in other zones still carry the workaround and can resolve it

Solution

(1) The owner of the domain enables EDNS on their self-hosted authoritative DNS server
(2) Change the resolver on the ECS instances to 223.5.5.5 and 223.6.6.6. These two DNS servers have not removed the EDNS workaround

[Solved] Docker+uWSGI+Flask Error: ModuleNotFoundError: No module named ‘flask‘

Background

The docker + nginx + uwsgi + flask deployment had always run well. This time, the Python version of the base image was upgraded from 3.6 to 3.8, and the error in the title appeared.

docker + nginx + uwsgi + flask deployment can refer to this article

Problem analysis

Let’s take a look at the start log of docker:

Starting nginx: nginx.,
*** Starting uWSGI 2.0.18-debian (64bit) on [Tue Aug 17 02:21:46 2021] ***,
[uWSGI] getting INI configuration from uwsgi.ini,
compiled with version: 8.2.0 on 10 February 2019 02:42:46,
os: Linux-3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018,
nodename: 9c8cc3ffd4ed,
machine: x86_64,
pcre jit disabled,
detected number of CPU cores: 2,
clock source: unix,
current working directory: /code,
detected binary path: /usr/bin/uwsgi-core,
uWSGI running as root, you can use --uid/--gid/--chroot options,
chdir() to /code,
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** ,
*** WARNING: you are running uWSGI without its master process manager ***,
your memory page size is 4096 bytes,
detected max file descriptor number: 1048576,
lock engine: pthread robust mutexes,
thunder lock: disabled (you can enable it with --thunder-lock),
uwsgi socket 0 bound to TCP address :5000 fd 3,
uWSGI running as root, you can use --uid/--gid/--chroot options,
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** ,
Python version: 3.7.3 (default, Jan 22 2021, 20:04:44)  [GCC 8.3.0],
Python main interpreter initialized at 0x55fa4f5a8990,
uWSGI running as root, you can use --uid/--gid/--chroot options,
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** ,
python threads support enabled,
your server socket listen backlog is limited to 100 connections,
your mercy for graceful operations on workers is 60 seconds,
mapped 825016 bytes (805 KB) for 8 cores,
*** Operational MODE: preforking+threaded ***,
Traceback (most recent call last):,
  File "run.py", line 20, in <module>,
    from server import create_app,
  File "./server/__init__.py", line 14, in <module>,
    from flask import Flask,
unable to load app 0 (mountpoint='') (callable not found or import error),
*** no app loaded. going in full dynamic mode ***,
uWSGI running as root, you can use --uid/--gid/--chroot options,
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) *** ,
*** uWSGI is running in multiple interpreter mode ***,
spawned uWSGI worker 1 (pid: 21, cores: 2),
spawned uWSGI worker 2 (pid: 22, cores: 2),
spawned uWSGI worker 3 (pid: 23, cores: 2),
spawned uWSGI worker 4 (pid: 24, cores: 2)

The log shows that the error is raised when importing flask.

What causes this problem?

If the program cannot find the Python library (flask) at run time, it reports this error.

Here, we don't care why it was not found. Since it was not found, we take the initiative and tell it where to look.
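
To check where a given interpreter actually looks for modules, a short diagnostic can be run with that interpreter. The sketch below is an assumption-laden illustration (the function name locate_module is hypothetical); note that the log above reports Python version 3.7.3 for the interpreter uWSGI embeds, so its search path may differ from the Python 3.8 of the base image.

```python
import importlib.util
import sys


def locate_module(name: str):
    """Return the file a top-level module would load from, or None if not found."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.version.split()[0])
    print("flask found at:", locate_module("flask"))
    for entry in sys.path:
        print("search path:", entry)
```

If flask is reported as None while it is installed elsewhere, the directory that contains it is the one to add to the search path.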

Solution

Modify uwsgi.ini and set the value of pythonpath to /usr/local/lib/python3.8/site-packages/.

The content of the modified uwsgi.ini file is:

[uwsgi]
chdir = /code
socket = :5000
pythonpath = /usr/local/lib/python3.8/site-packages/
wsgi-file = run.py
callable = app
chmod-socket = 666
plugins = python3
buffer-size = 65535
processes = 4
threads = 2

All right, problem solved.

[Solved] K8s cluster build error: error: kubectl get csr No resources found.

K8s cluster setup error: kubectl get csr returns No resources found


Problem

kubectl get csr
No resources found.

Cause

The original SSL certificates become invalid after a restart. If they are not deleted, kubelet cannot communicate with the master even after it is restarted

Solution:

cd /opt/kubernetes/ssl
ls
kubelet-client-2021-04-14-08-41-36.pem  kubelet-client-current.pem  kubelet.crt  kubelet.key
# Delete all certificates 
rm -rf *
# Stop the kubelet
systemctl stop kubelet

master01

kubectl delete clusterrolebinding kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io "kubelet-bootstrap" deleted

kubectl create clusterrolebinding kubelet-bootstrap --clusterrole=system:node-bootstrapper --user=kubelet-bootstrap
clusterrolebinding.rbac.authorization.k8s.io/kubelet-bootstrap created

node

# start kubelet
#node01
bash kubelet.sh 192.168.238.82
#node02
bash kubelet.sh 192.168.238.83

Test successful

master01

kubectl get csr
NAME                                                   AGE   REQUESTOR           CONDITION
node-csr-mJwuqA7DAf4UmB1InN_WEYhFWbQKOqUVXg9Bvc7Intk   4s    kubelet-bootstrap   Pending
node-csr-ydhzi9EG9M_Ozmbvep0ledwhTCanppStZoq7vuooTq8   11s   kubelet-bootstrap   Pending

Done!!!

How to solve the resource temporarily unavailable error of the Linux host?

Cause: the current user has reached its limit on the number of processes
Solution:
su root
(if the switch fails with resource temporarily unavailable, log in as another user)
cd /etc/security/limits.d
vi 90-nproc.conf
Add a new line that gives the current user an unlimited number of processes
Note: you need to log in again after the modification
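
To confirm the limit before and after the change, the process limit of the current session can be read from Python. This is a small sketch for Linux; the function names nproc_limits and nproc_line are hypothetical, and the generated line assumes the usual "<domain> <type> <item> <value>" layout of limits.d files.

```python
import resource


def nproc_limits():
    """Return the (soft, hard) process limits of the current session as text."""
    def show(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return show(soft), show(hard)


def nproc_line(user: str, limit: str = "unlimited") -> str:
    """Build a limits.d entry: <domain> <type> <item> <value>."""
    return f"{user}    soft    nproc    {limit}"
```

After logging in again, nproc_limits() should reflect the new value for that user.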

Communication link failure when connecting Doris

Spring Boot reports an error when querying Doris

ERROR [http-nio-10020-exec-12] [http-nio-10020-exec-12raceId] [] [5] @@[email protected]@ | server error 
org.springframework.dao.RecoverableDataAccessException: 
### Error querying database.  Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
; Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.

The INSERT INTO ... SELECT task scheduled on Doris also reports an error

ERROR 2013 (HY000) at line 7: Lost connection to MySQL server during query

Analysis

Slow queries may be putting heavy pressure on the cluster.
Several slow queries run for 120s-400s, which the Doris cluster cannot sustain. Since the global query_timeout parameter is 60, presumably someone set the session variable for their task to 600s or higher

We asked the developers to take the slow query tasks offline and tune the SQL.
After the tasks running longer than 100 seconds were taken offline, everything worked normally

But after a while, the Spring Boot service raised alarms and the errors came back

Doris parameter

interactive_timeout=3880000

wait_timeout=3880000

Doris FE node warning log

2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.checkTimeout():365] kill wait timeout connection, remote: 1.1.1.1:57399, wait timeout: 3880000
2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.kill():339] kill timeout query, 1.1.1.1:57399, kill connection: true

Doris monitoring

It can be seen that the number of connections at 15:44 drops sharply

ELK log
The alarm and error messages from the Spring Boot service querying Doris also start at 15:44.
So what operation changed the cluster's variables at 15:44?

According to the error log, wait_timeout is 3880000s, which is about 44.9 days, but the default in the source code is 28800s
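
The conversion is simple arithmetic (86400 seconds per day), shown here as a small sketch; the function name describe_seconds is hypothetical.

```python
from datetime import timedelta


def describe_seconds(seconds: int) -> str:
    """Express a timeout given in seconds as days, hours and minutes."""
    delta = timedelta(seconds=seconds)
    hours, remainder = divmod(delta.seconds, 3600)
    minutes = remainder // 60
    return f"{delta.days} days {hours} hours {minutes} minutes"


print(describe_seconds(3880000))  # 44 days 21 hours 46 minutes
print(describe_seconds(28800))    # 0 days 8 hours 0 minutes
```

So the modified value is more than a hundred times the 8-hour default.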

interactive_timeout=3880000

wait_timeout=3880000

Nothing had been released and nothing had been switched over, and I hold the cluster administrator account. I had not changed the parameters, so I could not tell why they had changed. I went to the fe.audit audit log to check the operation records. Sure enough,
someone (an insider) using DataGrip version 2020.2.3 had modified the global parameters with SET GLOBAL at 15:44

interactive_timeout=3880000

wait_timeout=3880000

Set the two parameters back to 28800s, and the cluster's connections recover immediately.
Note that, according to the discussion with the community, only wait_timeout takes effect in Doris. interactive_timeout exists for MySQL compatibility and has no effect

Question: why does an overly large wait_timeout in Doris cause the connection error Communications link failure,
while reducing it restores normal operation? This requires going through the code and looking at the logic
