The error information is as follows:
2020-12-09 14:07:49,776 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal xxx:8485 failed to write txns 74798134-74798135. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 33 is less than the last promised epoch 34
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:458)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:484)
	at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:372)
	at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:179)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolServerSideTranslatorPB.journal(QJournalProtocolServerSideTranslatorPB.java:162)
	at org.apache.hadoop.hdfs.qjournal.protocol.QJournalProtocolProtos$QJournalProtocolService$2.callBlockingMethod(QJournalProtocolProtos.java:27401)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1499)
	at org.apache.hadoop.ipc.Client.call(Client.java:1445)
	at org.apache.hadoop.ipc.Client.call(Client.java:1355)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
	at com.sun.proxy.$Proxy19.journal(Unknown Source)
	at org.apache.hadoop.hdfs.qjournal.protocolPB.QJournalProtocolTranslatorPB.journal(QJournalProtocolTranslatorPB.java:187)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:396)
	at org.apache.hadoop.hdfs.qjournal.client.IPCLoggerChannel$7.call(IPCLoggerChannel.java:389)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2020-12-09 14:07:54,883 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for sendEdits. Exceptions so far: [xxx:8485: IPC's epoch 33 is less than the last promised epoch 34, xxx:8485: IPC's epoch 33 is less than the last promised epoch 34]
	[server-side stack trace identical to the one above, repeated per JournalNode, omitted]
2020-12-09 14:07:55,886 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for sendEdits. Exceptions so far: [xxx:8485: IPC's epoch 33 is less than the last promised epoch 34, xxx:8485: IPC's epoch 33 is less than the last promised epoch 34]
	[server-side stack trace identical to the one above, repeated per JournalNode, omitted]
2020-12-09 14:07:56,492 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Remote journal xxx:8485 failed to write txns 74798134-74798135. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 33 is less than the last promised epoch 34
	[same client- and server-side stack trace as above, omitted]
2020-12-09 14:07:56,494 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 7611ms to send a batch of 2 edits (179 bytes) to remote journal xxx:8485
2020-12-09 14:07:56,496 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx:8485, xxx:8485, xxx:8485], stream=QuorumOutputStream starting at txid 74798133))
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Got too many exceptions to achieve quorum size 2/3. 3 exceptions thrown:
xxx:8485: IPC's epoch 33 is less than the last promised epoch 34
xxx:8485: IPC's epoch 33 is less than the last promised epoch 34
xxx:8485: IPC's epoch 33 is less than the last promised epoch 34
	[per-JournalNode server-side stack traces identical to the one above, omitted]
	at org.apache.hadoop.hdfs.qjournal.client.QuorumException.create(QuorumException.java:81)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumCall.rethrowException(QuorumCall.java:286)
	at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:142)
	at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:109)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
	at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:525)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:385)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:55)
	at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:521)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:711)
	at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.run(FSEditLogAsync.java:243)
	at java.lang.Thread.run(Thread.java:748)
2020-12-09 14:07:56,499 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 74798133
2020-12-09 14:07:56,509 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [xxx:8485, xxx:8485, xxx:8485], stream=QuorumOutputStream starting at txid 74798133))
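The fencing mechanism behind the repeated "IPC's epoch 33 is less than the last promised epoch 34" message can be illustrated with a short sketch. This is a simplified Python model, not Hadoop's actual implementation; the class and method names are invented for illustration (the real check is the Journal.checkRequest frame visible in the stack traces):

```python
# Simplified model of QJM epoch fencing (illustrative only, not Hadoop code).
# Each JournalNode remembers the highest writer epoch it has promised itself
# to. When the standby NameNode takes over as writer it negotiates a higher
# epoch (34 here), so writes from the old writer (still at epoch 33) are
# rejected by every JournalNode.

class JournalNode:
    def __init__(self):
        self.last_promised_epoch = 0

    def new_epoch(self, epoch):
        # Called when a NameNode takes over as the active writer.
        if epoch > self.last_promised_epoch:
            self.last_promised_epoch = epoch

    def journal(self, writer_epoch, txns):
        # A writer with a stale epoch is fenced off, producing the
        # error message seen in the log above.
        if writer_epoch < self.last_promised_epoch:
            raise IOError(
                f"IPC's epoch {writer_epoch} is less than the last "
                f"promised epoch {self.last_promised_epoch}")
        # ...otherwise the transactions would be appended to the edit log.

jns = [JournalNode() for _ in range(3)]
for jn in jns:
    jn.new_epoch(33)   # the old active NameNode obtained epoch 33
for jn in jns:
    jn.new_epoch(34)   # a failover bumps the promised epoch to 34

# The old NameNode keeps writing with epoch 33 and is rejected by all
# three JournalNodes, so the required quorum of 2/3 is unreachable and
# the NameNode aborts, as in the FATAL line above.
failures = 0
for jn in jns:
    try:
        jn.journal(33, ["edit"])
    except IOError:
        failures += 1
assert failures == 3
```

Because the edit log is the source of truth for HDFS metadata, this fencing is deliberate: a NameNode that has lost its writer epoch must stop rather than risk diverging edits.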
With HA configured, one of the NameNodes stopped. The key message, "IPC's epoch 33 is less than the last promised epoch 34", is most likely due to a network failure: reading the log shows that every time the other NameNode started, its connections to port 8485 on the three JournalNode services failed, which again points to a network problem. The troubleshooting went as follows:
- ifconfig -a: check whether the network interface is dropping packets.
- Check /etc/selinux/config (symlinked from /etc/sysconfig/selinux) and confirm SELINUX=disabled.
- /etc/init.d/iptables status: check whether the firewall is running; this Hadoop cluster runs on the intranet, and the firewall was turned off when it was first deployed.
- Checked the firewalls on all three JournalNode servers in turn; they are all off.
Solutions found online:
1) Increase the JournalNode write timeout,
for example dfs.qjournal.write-txns.timeout.ms = 90000.
In a production environment this kind of timeout is fairly easy to hit, so the default of 20 s should be raised to a larger value such as 60 s or 90 s.
Add the following property to hdfs-site.xml under hadoop/etc/hadoop:
<property>
  <name>dfs.qjournal.write-txns.timeout.ms</name>
  <value>60000</value>
</property>
On a CDH cluster, search for dfs.qjournal.write-txns.timeout.ms in the HDFS configuration page and change it there.
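Why a larger timeout helps can be sketched as follows. This is a simplified model, not Hadoop's implementation (the real wait happens in AsyncLoggerSet.waitForWriteQuorum, visible in the traces above); the function name and latency numbers below are invented for illustration:

```python
# Simplified model of the quorum wait for sendEdits (illustrative only).
# A batch of edits succeeds only if a majority of JournalNodes respond
# within the configured timeout (dfs.qjournal.write-txns.timeout.ms,
# 20000 ms by default).

def wait_for_write_quorum(reply_times_ms, timeout_ms, total=3):
    """reply_times_ms: simulated per-JournalNode response latency in ms."""
    quorum = total // 2 + 1   # 2 out of 3
    ok = sum(1 for t in reply_times_ms if t <= timeout_ms)
    if ok < quorum:
        raise TimeoutError(
            f"Got too many exceptions to achieve quorum size {quorum}/{total}")
    return ok

# One slow JournalNode (25 s, e.g. a long GC pause) is tolerated with the
# default 20 s timeout, because the other two still form a quorum:
assert wait_for_write_quorum([100, 300, 25_000], timeout_ms=20_000) == 2

# Two slow JournalNodes break quorum at 20 s but succeed at 60 s:
try:
    wait_for_write_quorum([100, 25_000, 25_000], timeout_ms=20_000)
except TimeoutError:
    pass  # the failure mode seen in the log
assert wait_for_write_quorum([100, 25_000, 25_000], timeout_ms=60_000) == 3
```

Raising the timeout only buys headroom; if the JournalNodes are persistently slow because of packet loss or long GC pauses, the underlying cause still has to be fixed.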
2) Adjust the NameNode's JVM parameters so that full GC is triggered earlier, keeping each full GC pause short.
3) By default the NameNode performs full GC with the parallel collector, which is entirely stop-the-world; switch it to CMS. Adjust the NameNode startup parameters:
-XX:+UseCompressedOops
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled
-XX:+UseCMSCompactAtFullCollection -XX:CMSFullGCsBeforeCompaction=0
-XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC
-XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=75
-XX:SoftRefLRUPolicyMSPerMB=0