System operation and maintenance has always been delicate work. Beyond rules and specifications, it depends on the precision and caution of the operations staff. A mistake as small as a single character or a space can lead to a disaster.
In this case, an Oracle RAC cluster restarted, and one instance failed to come back up, because of a single space.
Symptom: The customer's Oracle 10.2.0.4 RAC environment on Solaris 10 suddenly experienced instance restarts.
Failure timeline: The database ran normally until about 3 p.m. Then the instances on the two nodes restarted one after the other, and the instance on one of the nodes could not start automatically. A review of the alert logs for both instances showed ORA-27504 errors on both nodes before the restarts.
Error message:
ORA-27504: IPC error creating OSD context
ORA-27300: OS system dependent operation:if_not_found failed with status: 0
ORA-27301: OS failure message: Error 0
ORA-27302: failure occurred at: skgxpvaddr9
ORA-27303: additional information: requested interface 192.168.168.3 not found. Check output from ifconfig command
The error message is explicit: the requested IP address no longer exists on any interface, and the message itself points to the output of ifconfig as the place to check.
Next came the IPC timeout:
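As a minimal sketch of how such an alert log could be scanned, the shell function below lists each distinct address that the log reports as a missing interface. The function name missing_interfaces and the path in the usage comment are hypothetical. The match relies only on the "requested interface ... not found" wording quoted above, ignores letter case, and assumes a POSIX-compatible awk (nawk on Solaris).

```shell
# Hypothetical helper: print each distinct IP address that the given log
# reports in a "requested interface <address> not found" message.
missing_interfaces() {
    awk '{
        # Look for the word pair "requested interface" and print the next field.
        for (i = 2; i < NF; i++)
            if (tolower($(i - 1)) == "requested" && tolower($i) == "interface")
                print $(i + 1)
    }' "$1" | sort -u
}

# Example call (the path is an assumption, adjust it to your environment):
# missing_interfaces /path/to/alert_SID.log
```

Running it against the alert logs of both nodes shows quickly whether they complain about the same interconnect address.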
Wed Apr 10 15:08:13 2013
ospid 25678: network interface with IP address 192.168.168.3 no longer operational
requested interface 192.168.168.3 not found. Check output from ifconfig command
Wed Apr 10 15:08:16 2013
IPC Send timeout detected.Sender: ospid 25748
Receiver: inst 2 binc 430164 ospid 11890
Instance eviction was then inevitable:
Wed Apr 10 15:16:40 2013
Waiting for instances to leave:
2
The cause can be read straight from the error messages. The IP address on node 2 was changed, which broke the heartbeat communication. Node 1 tried to evict node 2 from the cluster but could not communicate with it, so it had to wait for node 2 to restart.
The operating system log on node 2 contained the following key entries:
Apr 10 15:00:04 ip: [ID 482227 kern.notice] ip_arp_done: init failed
Had[4135]: [ID 702911 daemon.notice] VCS CRITICAL
CPU usage on bj-sst is 92%
sshd[13485]: error: Failed to allocate internet-domain X11 display socket.
The ip_arp_done: init failed message appeared at 15:00:04. It indicates that host name information was being used to set up a network interface at that moment, in other words that the host's IP address was modified online.
Finally, the shell history showed that someone had logged in to the system as root:
They intended to run ifconfig -a6 to check the IPv6 addresses, but mistyped the command.
What was actually executed was ifconfig -a 6, with an extra space between a and 6.
This caused all IP addresses of the host to be set to 0.0.0.0.
That triggered the entire failure described above: a single space brought down the whole cluster in an instant.
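One way to guard against this class of typo is to check the arguments before they ever reach ifconfig. The sketch below is hypothetical: the name check_ifconfig_args and its safe/unsafe output are illustrative and not part of any tool. It assumes that a first argument beginning with -a selects all interfaces, as in the ifconfig -a6 command the operator intended, so that any bare word after it would be applied as a setting to every interface.

```shell
# Hypothetical pre-flight check for ifconfig arguments.
# Prints "unsafe" when the first argument selects all interfaces (-a...)
# and a bare word follows it, otherwise prints "safe".
check_ifconfig_args() (
    case "${1:-}" in
        -a*) shift ;;              # all interfaces selected, inspect the rest
        *)   echo safe; exit 0 ;;  # a single named interface, or no arguments
    esac
    for arg in "$@"; do
        case "$arg" in
            -*) ;;                      # further option flags are tolerated
            *)  echo unsafe; exit 0 ;;  # a bare word would change every interface
        esac
    done
    echo safe
)

check_ifconfig_args -a6    # display-only form
check_ifconfig_args -a 6   # the mistyped form from this incident
```

A wrapper script or shell alias for privileged sessions could call such a check and ask for confirmation before running anything it classifies as unsafe. The check is deliberately narrow and does not replace a review of the command before pressing Enter.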
The lesson from this case is that every operation, down to the individual command, demands care, especially from privileged users such as DBA and root users.
While we are at it, here is a review of the ifconfig command:
The ifconfig command configures and displays the network parameters of network interfaces in the Linux kernel. Settings made with ifconfig are lost when the network card or the machine restarts. To make them permanent, the configuration file of the network card must be modified.
Syntax
ifconfig (parameters)
Parameters
add <address>: Set an IPv6 address for the network device;
del <address>: Delete an IPv6 address from the network device;
down: Shut down the specified network device;
hw <network device type> <hardware address>: Set the type and hardware address of the network device;
io_addr <I/O address>: Set the I/O address of the network device;
irq <IRQ address>: Set the IRQ of the network device;
media <network media type>: Set the media type of the network device;
mem_start <memory address>: Set the start address that the network device occupies in main memory;
metric <number>: Specify the number to be added when calculating the hop count of forwarded packets;
mtu <bytes>: Set the MTU of the network device;
netmask <subnet mask>: Set the subnet mask of the network device;
tunnel <address>: Set up the tunnel communication address between IPv4 and IPv6;
up: Start the specified network device;
-broadcast <address>: Treat packets sent to the specified address as broadcast packets;
-pointopoint <address>: Establish a direct connection with the network device at the specified address; this mode has a security function;
-promisc: Turn off or start the promiscuous mode of the specified network device;
IP address: Specify the IP address of the network device;
Network device: Specify the name of the network device.
Explanation:
eth0 is the first network card. HWaddr is the physical address of the card; here the physical (MAC) address is 00:16:3E:00:1E:51.
inet addr is the IP address of the network card. Here the IP address is 10.160.7.81, the broadcast address is 10.160.15.255, and the netmask is 255.255.240.0.
lo is the loopback address of the host. It is typically used to test a network program that should not be visible to users on the LAN or on external networks and should only be reachable from the host itself. For example, if you bind the httpd server to the loopback address, entering 127.0.0.1 in a browser on the host shows the website you are hosting, but no other host or user on the LAN can see it.
Line 1: link type (Ethernet) and HWaddr (hardware MAC address).
Line 2: the IP address, subnet, and mask of the network card.
Line 3: UP (the network card is enabled), RUNNING (the card's cable is connected), MULTICAST (multicast is supported), MTU:1500 (the maximum transmission unit is 1500 bytes).
Lines 4 and 5: statistics for received and sent packets.
Line 7: byte counts for received and sent data.
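The three address values quoted in the explanation above are related: the broadcast address is the IP address with every host bit (the bits that are 0 in the netmask) set to 1. The following sketch reproduces that arithmetic in shell. The function names are hypothetical, and it assumes a shell with 64-bit integer arithmetic, such as bash.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
    IFS=.
    set -- $1    # split the address on dots into $1..$4
    echo "$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))"
)

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Broadcast address: keep the network bits, set all host bits to 1.
broadcast_addr() (
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip "$(( ($ip & $mask) | (~$mask & 4294967295) ))"
)

broadcast_addr 10.160.7.81 255.255.240.0   # prints 10.160.15.255
```

With the netmask 255.255.240.0 the low 12 bits are host bits, so the third octet 7 becomes 15 and the last octet becomes 255, which matches the broadcast address shown for eth0.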
Start and stop the specified network card:
ifconfig eth0 up
ifconfig eth0 down
ifconfig eth0 up starts the network card eth0, and ifconfig eth0 down shuts it down. Be careful with this when you are logged in to a Linux server over SSH: once the card is shut down you cannot bring it back up remotely, unless the server has multiple network cards.
Configure and remove IPv6 addresses for a network card:
ifconfig eth0 add 3ffe:3240:800:1005::2/64
Configures an IPv6 address for the network card eth0
ifconfig eth0 del 3ffe:3240:800:1005::2/64
Removes the IPv6 address from the network card eth0
Modify the MAC address with ifconfig:
ifconfig eth0 hw ether 00:AA:BB:CC:DD:EE
Configure the IP address:
[root@localhost ~]# ifconfig eth0 192.168.2.10
[root@localhost ~]# ifconfig eth0 192.168.2.10 netmask 255.255.255.0
[root@localhost ~]# ifconfig eth0 192.168.2.10 netmask 255.255.255.0 broadcast 192.168.2.255
Enable and disable ARP:
ifconfig eth0 arp # enable ARP on the network card eth0
ifconfig eth0 -arp # disable ARP on the network card eth0
Set the maximum transmission unit:
ifconfig eth0 mtu 1500 # set the maximum packet size that can pass to 1500 bytes
Compiled from: the public account “Data and Cloud” and other sources