Tag Archives: Operation and maintenance

[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs"


[ERROR SystemVerification]: failed to parse kernel config: unable to load kernel module: "configs".

The above error is reported when installing a Kubernetes cluster.

 

Solution:

Method 1: ignore the error

Add the --ignore-preflight-errors=SystemVerification option to ignore the error. With this option, there is no way to tell whether other problems will show up later.
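
For example, applied to the init command used later in this post:

kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=SystemVerification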

Method 2: Upgrade kernel version

I installed the Kubernetes cluster on kernel 4.19.12; after upgrading the kernel to 5.13.7 the problem no longer occurred. I am not sure whether it is really a kernel version problem.

Method 3: Manually compile the configs kernel module
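
Before compiling anything, it is worth checking whether the module or the kernel config file is simply missing. A quick check, assuming a standard kernel layout:

modprobe configs && ls -l /proc/config.gz    # the configs module exposes the running kernel config here
ls /boot/config-$(uname -r)                  # the preflight check can also read the config from /boot if this file exists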

 

Failed to Initialize Error: error execution phase preflight: [preflight] Some fatal errors occurred: [ERROR Port-6443]

[[email protected] ~]# kubeadm init --config=kubeadm-config.yaml --experimental-upload-certs | tee kubeadm-init.log
Flag --experimental-upload-certs has been deprecated, use --upload-certs instead
[init] Using Kubernetes version: v1.15.1
[preflight] Running pre-flight checks
    [WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.11. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10251]: Port 10251 is in use
    [ERROR Port-10252]: Port 10252 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
Reason:
kubeadm init was re-run after modifying the kubeadm-config.yaml file without cleaning up the previous attempt, so the ports and manifest files from the earlier startup are still occupied.
Solution:
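
A typical way to clear that leftover state (the original log does not show the exact cleanup commands) is to reset the node before re-running init:

kubeadm reset -f
rm -rf /var/lib/etcd    # only if the old etcd data is definitely no longer needed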
Result test:
The k8s cluster was initialized successfully.
[[email protected] ~]# kubeadm init --config kubeadm-config.yaml --ignore-preflight-errors=SystemVerification

 

[Solved] prometheus Startup Error: opening storage failed

After modifying the prometheus.yml file, Prometheus failed to start:

1. Scenario: Prometheus + node_exporter monitoring; Prometheus fails to start after the prometheus.yml file is modified.
2. Error message 1: err="opening storage failed: lock DB directory: resource temporarily unavailable"

Solution: check whether the data directory under the current path contains a generated lock file (data/lock); if so, the lock file needs to be deleted: rm -rf data/lock
Delete it and restart, as sketched below.
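
A minimal sketch of that fix, assuming Prometheus is started from its installation directory and uses the default ./data storage path:

ls data/            # a stale "lock" file is left over from the previous instance
rm -f data/lock
./prometheus        # start again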

Error message 2: err="error starting web server: listen tcp 0.0.0.0:9090: bind: address already in use"
Install lsof: yum install lsof -y
Find the process occupying port 9090: lsof -i:9090
Kill it (PID 2878 in this case): kill -9 2878
Restart Prometheus: ./prometheus
The restart succeeded after these steps.

 

[Solved] ssh secure shell: server responded algorithm negotiation failed

ssh secure shell:server responded algorithm negotiation failed

This problem is usually solved as follows:
1. cd /etc/ssh
2. vim /etc/ssh/sshd_config

# Add the following lines to the configuration file (in vim's syntax highlighting, the last entry of the third line shows up gray plus purple and the others blue; if an entry is plain gray, it is not recognized and something is wrong!)
Ciphers aes128-cbc,aes192-cbc,aes256-cbc,aes128-ctr,aes192-ctr,aes256-ctr,3des-cbc,arcfour128,arcfour256,arcfour,blowfish-cbc,cast128-cbc
 
MACs hmac-md5,hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-sha1-96,hmac-md5-96
 
KexAlgorithms diffie-hellman-group1-sha1,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha1,diffie-hellman-group-exchange-sha256,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group1-sha1,curve25519-sha256@libssh.org

Then restart sshd with this statement and you’re done
systemctl restart sshd
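
To confirm sshd actually picked up the new algorithm lists after the restart, you can dump its effective configuration:

sshd -T | grep -iE '^(ciphers|macs|kexalgorithms)'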

Because the text in the error dialog could not be selected, I did not search for the exact error message; instead I searched based on my own understanding of it and wasted a lot of time. If you run into an error like this, it is much faster to type the message out by hand and search for it verbatim.

Error: ENOSPC: no space left on device [How to Solve]

When the above error occurs, it generally means the server cannot create new files. At this point we can look for the problem in two directions:

1. The disk has run out of blocks or inodes

1. Disk blocks are full. Check with df -h:

[[email protected] ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/vda3        35G   28G  5.6G  83% /
tmpfs           504M     0  504M   0% /dev/shm
/dev/vda1       194M   47M  138M  26% /boot
/dev/vdb1       325G  118G  192G  38% /home/wwwroot/vdb1data

2. Disk inodes are full. Check with df -i:

[[email protected] ~]# df -i
Filesystem       Inodes    IUsed   IFree IUse% Mounted on
/dev/vda3       2289280  1628394  660886   72% /
tmpfs            128827        1  128826    1% /dev/shm
/dev/vda1         51200       44   51156    1% /boot
/dev/vdb1      21626880 21626880       0  100% /home/wwwroot/vdb1data

Comparing the two outputs, the disk blocks are only 38% used but the inodes are 100% used, so the disk must be packed with a very large number of small files. Deleting the useless small files under the corresponding mount point frees inodes and solves the problem (a sketch for locating the biggest inode consumers follows the two ideas below). Keep the following two ideas in mind; the fundamental fix, of course, is to buy and mount more disks.

Idea one: inodes full: delete as many useless small files as possible to free enough inodes

Idea two: blocks full: delete as many useless large files as possible to free enough blocks
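
A rough sketch for finding which directories hold the most files, i.e. the biggest inode consumers (assuming the full filesystem is the one mounted at /home/wwwroot/vdb1data):

for d in /home/wwwroot/vdb1data/*/; do
    printf '%s\t' "$d"
    find "$d" -xdev | wc -l      # count every entry under this directory
done | sort -t$'\t' -k2 -nr | head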

 

2. Error: ENOSPC: no space left on device, watch

A Node.js (React Native) project reports: Error: ENOSPC: no space left on device, watch

[[email protected] JFReactNativeProject]# npm start
 
> [email protected] start /app/jenkins_workspace/workspace/JFReactNativeProject
> react-native start
 
┌──────────────────────────────────────────────────────────────────────────────┐
│                                                                              │
│  Running Metro Bundler on port 8081.                                         │
│                                                                              │
│  Keep Metro running while developing on any JS projects. Feel free to        │
│  close this tab and run your own Metro instance if you prefer.               │
│                                                                              │
│  https://github.com/facebook/react-native                                    │
│                                                                              │
└──────────────────────────────────────────────────────────────────────────────┘
 
Looking for JS files in
   /app/jenkins_workspace/workspace/JFReactNativeProject
 
Loading dependency graph...fs.js:1413
    throw error;
    ^
 
Error: ENOSPC: no space left on device, watch '/app/jenkins_workspace/workspace/JFReactNativeProject/node_modules/.staging/react-native-ddd311e5/ReactAndroid/src/androidTest/java/com/facebook/react/testing/idledetection'
    at FSWatcher.start (fs.js:1407:26)
    at Object.fs.watch (fs.js:1444:11)
    at NodeWatcher.watchdir (/app/jenkins_workspace/workspace/JFReactNativeProject/node_modules/[email protected]@sane/src/node_watcher.js:159:22)
    at Walker.<anonymous> (/app/jenkins_workspace/workspace/JFReactNativeProject/node_modules/[email protected]@sane/src/common.js:109:31)
    at Walker.emit (events.js:182:13)
    at /app/jenkins_workspace/workspace/JFReactNativeProject/node_modules/[email protected]@walker/lib/walker.js:69:16
    at go$readdir$cb (/app/jenkins_workspace/workspace/JFReactNativeProject/node_modules/[email protected]@graceful-fs/graceful-fs.js:187:14)
    at FSReqWrap.oncomplete (fs.js:169:20)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `react-native start`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
 
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2019-09-25T06_57_58_754Z-debug.log

Solution:

ENOSPC means there is no more space available on the device.

First, df -hT shows that there is actually still plenty of disk space.

The stack trace points at FSWatcher and Object.fs.watch, so look at the setting that limits how many files and directories the system allows a user to watch:

#Indicates the number of watches that can be added by the same user at the same time (watches are generally directory-specific and determine the number of directories that can be monitored by the same user at the same time)
[[email protected] JFReactNativeProject]# cat /proc/sys/fs/inotify/max_user_watches
8192
[[email protected] JFReactNativeProject]# echo 100000 > /proc/sys/fs/inotify/max_user_watches
[[email protected] JFReactNativeProject]# cat /proc/sys/fs/inotify/max_user_watches
100000

The permanent method is as follows (this method is recommended):

vim /etc/sysctl.conf
fs.inotify.max_user_watches = 100000    (adjust the value to your actual situation)
After adding the line, run /sbin/sysctl -p to apply it.
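
You can confirm the new limit is in effect with:

sysctl fs.inotify.max_user_watches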

Verification:

Restart the process; it now runs normally.

[Solved] AgilePLM error: Ora-00904: “A”.”ITEM_NUMBER”: invalid identifier

AgilePLM error

Problem phenomenon

Agile version: 9.3.2

When searching for a material with full-text search, the following error is reported: ORA-00904: "A"."ITEM_NUMBER": invalid identifier

However, no error is reported when using advanced search to look up the material directly. Restarting the database, restarting the application, and rebuilding the index did not resolve it.

Cause analysis

During troubleshooting, advanced search was used to go through the object types one by one, and it turned out that searching Price objects on their own reproduced the error. The administrator then remembered that he had changed a price criterion that morning and had used the item number (material code) as the criterion. The problem was probably caused by that change.

Problem-solving

Change the criterion back so the item number is no longer used as the price criterion, then restart the Agile application server. That solved the problem.

[Solved] shell Error: Syntax error: "(" unexpected (expecting "}")

The hard disk was damaged and the system was reinstalled. An error is reported when executing a script that used to work:

Syntax error: "(" unexpected (expecting "}")

Troubleshooting:

ls -l /bin/sh

The default link is dash

Knowledge supplement

Bash: the Unix shell written for the GNU Project

sh: equivalent to /bin/bash --posix, i.e. bash running with the POSIX standard enabled

dash: executes faster than bash but supports fewer shell constructs

Solution:

Here I have no requirement on script execution speed, only that the script works, so I simply changed /bin/sh back to bash:

cd /bin/; ln -sf bash /bin/sh
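
Verify the link afterwards:

ls -l /bin/sh    # should now point to bash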


Problem solved.

[Solved] SecureCRT Connect Error: The server has disconnected with an error…..

1. Problem environment
virtual machine: VMware 16
Windows system: Windows 10
linux system: CentOS 7.6
interactive software: SecureCRT 8.7
2. Prompt

3. Solution
Click "Options", select "Session Options", then, as shown in the figure below, click "SSH2", modify the "Username", and click "OK".

4. Reconnect

The connection succeeds.

[Solved] Mysql Build Error: [ERROR] Slave I/O for channel ‘‘: error connecting to master

Project scenario:

MySQL 5.7 dual-master replication:
host A: 192.168.218.62:3306 (production library)
host B: 192.168.218.95:3307 (newly created empty library)
operating system: CentOS 7

Problem description

Start the slave on host B:
change master to master_host='192.168.218.62', master_port=3306, master_user='repl', master_password='*****', master_log_file='mysql-bin.000017', master_log_pos=****;
after start slave, replication works normally;

Start the slave on host A:
change master to master_host='192.168.218.95', master_port=3307, master_user='repl', master_password='*****', master_log_file='mysql-bin.1234', master_log_pos=****;
after start slave, show slave status reports the following error:
error connecting to master 'repl@192.168.218.95:3307' - retry-time: 60 retries: 6
there are no other meaningful messages in /var/log/messages

Cause analysis:

Troubleshooting route:
1. From host A, log in to MySQL on host B: mysql -u repl -p'*****' -P 3307 -h 192.168.218.95 works normally, ruling out account and password errors.
2. Check the privileges of the replication account on host B: show grants for repl@'%' shows the REPLICATION SLAVE and REPLICATION CLIENT privileges, ruling out a permission problem.
I also tried creating a new account and restarting the service, which did not help.
Finally it turned out that SELinux had not been turned off.
Check the SELinux status:

 [[email protected] ~]# getenforce
Enforcing    (meaning SELinux is not turned off)

Solution:

Close SELinux:
I. Temporary shutdown
run the command setenforce 0 (this is lost after the machine restarts)
check SELinux status:

[[email protected] ~]# getenforce
Permissive    (meaning SELinux is no longer enforcing)

II. Permanent shutdown
edit the /etc/selinux/config file and set SELINUX=disabled (the server needs to be restarted for this to take effect);
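
If you prefer a non-interactive edit, something like the following works on a stock CentOS 7 config file (assuming SELINUX is currently set to enforcing):

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
grep '^SELINUX=' /etc/selinux/config    # confirm the change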

Re-run change master and restart replication; it then replicates successfully.

[Solved] unpacking of archive failed: cpio: lstat failed - Not a directory

A strange cpio error was reported when installing an RPM package on the company server today. (Most such errors can be solved by reinstalling cpio or downloading the RPM package again.)

unpacking of archive failed: cpio: lstat failed - Not a directory

 

Solution:

After reading up on cpio, I found a solution:

Step 1: extract the RPM package with cpio to see which directories it needs

 rpm2cpio XXXX.rpm | cpio -idmv

Step 2: check the corresponding path on the system; it turns out the path already exists, but it is not a directory…

Step 3: delete the conflicting path and reinstall; this time the installation succeeds!
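
A hypothetical illustration of the conflict and the fix (the path and package name below are made up, not taken from the original error):

ls -ld /usr/share/doc/example            # a regular file where the package expects a directory
mv /usr/share/doc/example /usr/share/doc/example.bak    # move the conflicting file out of the way
rpm -ivh example-1.0-1.el7.x86_64.rpm    # the installation now completes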

Reference: https://access.redhat.com/solutions/6189481

[Solved] Heroku Error: Web process failed to bind to $PORT within 60 seconds of launch

Error description

If the port number is set to a privileged port (below 1024), binding fails with a permission error:

2022-03-29T10:23:16.651636+00:00 heroku[web.1]: State changed from crashed to starting
2022-03-29T10:23:28.994173+00:00 heroku[web.1]: Starting process with command `python /code/server3.py`
2022-03-29T10:23:30.091143+00:00 app[web.1]: Traceback (most recent call last):
2022-03-29T10:23:30.091204+00:00 app[web.1]: File "/code/server3.py", line 247, in <module>
2022-03-29T10:23:30.091205+00:00 app[web.1]: serverSocket.bind(("127.0.0.1", serverPort))
2022-03-29T10:23:30.091208+00:00 app[web.1]: PermissionError: [Errno 13] Permission denied
2022-03-29T10:23:30.234919+00:00 heroku[web.1]: Process exited with status 1
2022-03-29T10:23:30.279569+00:00 heroku[web.1]: State changed from starting to crashed

If the port number is hard-coded to 8080 instead of Heroku's $PORT, the web process never binds to the expected port and is killed after the 60-second boot timeout:

2022-03-29T10:53:34.400924+00:00 heroku[web.1]: State changed from crashed to starting
2022-03-29T10:53:49.432473+00:00 heroku[web.1]: Starting process with command `python /code/server3.py`
2022-03-29T10:53:33.432308+00:00 app[api]: Release v9 created by user ***@icloud.com
2022-03-29T10:53:33.432308+00:00 app[api]: Deployed web (350d1bd5740a) by user ***@icloud.com
2022-03-29T10:54:49.500818+00:00 heroku[web.1]: Error R10 (Boot timeout) -> Web process failed to bind to $PORT within 60 seconds of launch
2022-03-29T10:54:49.587777+00:00 heroku[web.1]: Stopping process with SIGKILL
2022-03-29T10:54:49.800462+00:00 heroku[web.1]: Process exited with status 137

If the port number is read from an environment variable but the lookup returns nothing, binding fails with a TypeError:

2022-03-29T11:12:03.984493+00:00 heroku[web.1]: State changed from crashed to starting
2022-03-29T11:12:19.984320+00:00 heroku[web.1]: Starting process with command `python /code/server3.py`
2022-03-29T11:12:21.706887+00:00 heroku[web.1]: Process exited with status 1
2022-03-29T11:12:21.536737+00:00 app[web.1]: Traceback (most recent call last):
2022-03-29T11:12:21.536756+00:00 app[web.1]: File "/code/server3.py", line 248, in <module>
2022-03-29T11:12:21.536757+00:00 app[web.1]: serverSocket.bind(("0.0.0.0", serverPort))
2022-03-29T11:12:21.536757+00:00 app[web.1]: TypeError: an integer is required (got type NoneType)
2022-03-29T11:12:21.769096+00:00 heroku[web.1]: State changed from starting to crashed

It makes no difference whether the bind address is left empty:
serverSocket.bind(("", serverPort))
or set to 127.0.0.1:
serverSocket.bind(("127.0.0.1", serverPort))
or set to 0.0.0.0:
serverSocket.bind(("0.0.0.0", serverPort))
None of these work as long as the port itself is not the one Heroku assigns.

 

Solution (Python Project):

The port number must be bound dynamically.

In the Dockerfile, pass Heroku's $PORT to the process:

#Base image based on
FROM python:3.4

# Add the project code to the /code folder
ADD ./pythonProject /code

# Set the code folder to be the working directory
WORKDIR /code

# Install support
#RUN pip install -r requirements.txt

CMD python /code/server3.py $PORT

In server3.py, read the port from the command-line argument:

serverPort = int(sys.argv[1])

The complete structure is as follows

# server3.py is assumed to have these imports near the top; process() is the request handler defined elsewhere in the file
from socket import *
import sys
import _thread

if __name__ == '__main__':
    serverSocket = socket(AF_INET, SOCK_STREAM)
    serverPort = int(sys.argv[1])
    serverSocket.setsockopt(SOL_SOCKET, SO_REUSEADDR, 1)
    serverSocket.bind(("", serverPort))

    serverSocket.listen(5)
    print('The server is running')
    # Main web server loop. It simply accepts TCP connections, and get the request processed in seperate threads.
    while True:
        # Set up a new connection from the client
        connectionSocket, addr = serverSocket.accept()
        # Clients timeout after 60 seconds of inactivity and must reconnect.
        connectionSocket.settimeout(600)
        # start new thread to handle incoming request
        _thread.start_new_thread(process, (connectionSocket,))
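
As a quick local sanity check of this setup (hypothetical image name myapp; on Heroku the $PORT variable is injected automatically), you can simulate the variable with Docker:

docker build -t myapp .
docker run --rm -e PORT=8080 -p 8080:8080 myapp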

Solution (Nodejs Project):

// production: read the port Heroku assigns from the environment
// (config, app and logger come from the rest of the project)
config.port = process.env.PORT

app.listen(config.port, () => {
  logger.info('Listening on port %d', config.port);
});

or

.listen(process.env.PORT || 5000)

or

production: {
    server: {
        host: '0.0.0.0',
        port: process.env.PORT
    }
}