Communications link failure when connecting to Doris

A Spring Boot service reports an error when querying Doris

ERROR [http-nio-10020-exec-12] [http-nio-10020-exec-12raceId] [] [5] @@GlobalExceptionAdvice@@ | server error 
org.springframework.dao.RecoverableDataAccessException: 
### Error querying database.  Cause: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.
; Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.; nested exception is com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

The last packet successfully received from the server was 426 milliseconds ago.  The last packet sent successfully to the server was 0 milliseconds ago.

A scheduled INSERT INTO ... SELECT task on Doris also reports an error

ERROR 2013 (HY000) at line 7: Lost connection to MySQL server during query

Analysis

Slow queries may be putting heavy pressure on the cluster.
Several slow queries ran for 120s-400s, which the Doris cluster cannot sustain. The global query_timeout parameter is 60, so presumably someone set the session variable for those tasks to 600s or higher.

We asked the developers to take the slow query tasks offline and tune the SQL.
Once the tasks running longer than 100 seconds were offline, everything worked normally.

But after a while the Spring Boot service raised alarms again, with the same errors.

Doris parameters

interactive_timeout=3880000

wait_timeout=3880000

Doris FE node warning log

2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.checkTimeout():365] kill wait timeout connection, remote: 1.1.1.1:57399, wait timeout: 3880000
2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) [ConnectContext.kill():339] kill timeout query, 1.1.1.1:57399, kill connection: true
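
The timeout that FE actually applied can be read straight out of warning lines like the ones above. The following is a minimal sketch, assuming the line ends with the "wait timeout: <seconds>" text shown in this log; the class and method names are made up for illustration and are not part of Doris.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Doris API: extracts the wait timeout from an FE warning line.
final class FeWaitTimeoutParser {
    // Requires the colon, so "kill wait timeout connection" earlier in the line is not matched.
    private static final Pattern WAIT_TIMEOUT = Pattern.compile("wait timeout:\\s*(\\d+)");

    static OptionalLong parseWaitTimeoutSeconds(String logLine) {
        Matcher m = WAIT_TIMEOUT.matcher(logLine);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        String line = "2021-06-03 16:00:08,398 WARN (Connect-Scheduler-Check-Timer-0|79) "
                + "[ConnectContext.checkTimeout():365] kill wait timeout connection, "
                + "remote: 1.1.1.1:57399, wait timeout: 3880000";
        parseWaitTimeoutSeconds(line)
                .ifPresent(s -> System.out.println("wait_timeout in effect: " + s + "s"));
    }
}
```

Lines without that suffix, such as the second warning, simply yield an empty result.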

Doris monitoring

The number of connections drops sharply at 15:44.

ELK log

The alarms and error messages from the Spring Boot service querying Doris also start at 15:44.
So what operation or variable change hit the cluster at 15:44?

According to the warning log, wait_timeout is 3880000s, which is roughly 44.9 days (3880000 / 86400), but the default in the source code is 28800s (8 hours).

interactive_timeout=3880000

wait_timeout=3880000
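
To put the configured value and the default side by side in readable units, the conversion can be sketched with java.time.Duration. This is only an illustration of the arithmetic; the class name is hypothetical.

```java
import java.time.Duration;

// Hypothetical helper: renders a timeout given in seconds as days, hours and minutes.
final class TimeoutUnits {
    static String describe(long seconds) {
        Duration d = Duration.ofSeconds(seconds);
        // Modulo arithmetic keeps this compatible with Java 8 (no toHoursPart()).
        return String.format("%ds = %d days %d hours %d minutes",
                seconds, d.toDays(), d.toHours() % 24, d.toMinutes() % 60);
    }

    public static void main(String[] args) {
        System.out.println(describe(3880000L)); // value found on the cluster
        System.out.println(describe(28800L));   // default mentioned in the source code
    }
}
```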

There had been no release and no switchover, and the cluster administrator account was in my hands. I had not changed the parameters, so I could not tell why they had changed. I went to the fe.audit log to check the operation records. Sure enough, someone (an insider) using DataGrip 2020.2.3 had modified the global parameters with SET GLOBAL at 15:44:

interactive_timeout=3880000

wait_timeout=3880000
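
The audit log check described above can be sketched as a simple filter for SET GLOBAL statements. This is a rough illustration only: the sample lines are hypothetical and do not reproduce the real fe.audit log layout, and the class name is invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: keeps only the log lines that contain a SET GLOBAL statement.
final class AuditLogScan {
    private static final Pattern SET_GLOBAL =
            Pattern.compile("\\bset\\s+global\\b", Pattern.CASE_INSENSITIVE);

    static List<String> findSetGlobal(List<String> lines) {
        return lines.stream()
                .filter(line -> SET_GLOBAL.matcher(line).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample lines; a real run would read the lines of the audit log file instead.
        List<String> sample = Arrays.asList(
                "15:43:58 select 1",
                "15:44:02 SET GLOBAL wait_timeout = 3880000",
                "15:44:02 set global interactive_timeout = 3880000");
        findSetGlobal(sample).forEach(System.out::println);
    }
}
```

Matching on the statement text rather than on a fixed column keeps the sketch independent of the exact audit log field order.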

After the two parameters were set back to 28800s, the cluster's connections recovered immediately.
Note that, according to the discussion with the community, only wait_timeout takes effect in Doris. interactive_timeout exists only for MySQL compatibility and has no effect.

Open question: why does an overly large wait_timeout in Doris cause the Communications link failure error, while reducing it restores normal operation? Answering that requires going through the code and its logic.

