recv() failed (104: Connection reset by peer) troubleshooting
In a recent project, we reverse-proxied a Node.js service (built on the NestJS framework) through Nginx. During load testing, 502 Bad Gateway responses appeared with a low probability of around 0.005%. The specific error message in the Nginx log was recv() failed (104: Connection reset by peer) while reading the response header from upstream.
After searching for information, we learned that the direct cause of the error was:
The Node.js service had already closed the connection but did not notify Nginx; Nginx kept sending and receiving data on that connection, which ultimately caused the error.
Considering that we had configured Nginx to open long connections to the Node.js server, namely:
proxy_http_version 1.1;
proxy_set_header Connection "";
let's start with the settings related to long connections. The common Nginx parameters that affect long connections are keepalive_timeout, keepalive_requests, and keepalive.
1) keepalive_timeout: sets the client-side long-connection timeout. If the client makes no further request within this time, Nginx actively closes the long connection. The default is keepalive_timeout 75s;. Some browsers hold a connection for at most 60 seconds, so we usually set it to 60s. If set to 0, long connections are disabled.
2) keepalive_requests: sets the maximum number of requests a long connection with the client can handle. If this value is exceeded, Nginx actively closes the long connection. The default value is 100, which basically meets the requirements under normal circumstances. Under high QPS, however, a connection quickly reaches the maximum number of requests and is shut down, which means Nginx has to keep creating new long connections to handle requests; in that case the value should be raised.
3) keepalive: sets the maximum number of idle long connections to the upstream server. When the number of idle connections exceeds this value, the least recently used connections are closed. If this value is set too small, and the request rate and request processing time are unstable within some time window, long connections may be constantly closed and created. We usually set it to 1024; in special scenarios, it can be estimated from the interface's average response time and QPS.
4) Opening a long connection with the upstream server: by default Nginx accesses the backend over short (HTTP/1.0) connections, opened each time a request is made, closed after processing, and reopened on the next request. The HTTP protocol has supported long connections since version 1.1, so we set the following parameters in location to open a long connection:
http {
    server {
        location / {
            proxy_http_version 1.1;
            proxy_set_header Connection "";
        }
    }
}
Items 1) and 2) configure the long connection between the client and Nginx, while 3) and 4) configure the long connection between Nginx and the upstream server. A combined configuration sketch follows.
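Putting all four settings together, here is a minimal sketch of how they could be combined. The upstream name node_backend and the address 127.0.0.1:8080 are illustrative; note that the keepalive directive from 3) belongs in the upstream block:
http {
    keepalive_timeout  60s;   # 1) client-side idle timeout (default 75s)
    keepalive_requests 100;   # 2) max requests per client connection (default 100)

    upstream node_backend {
        server 127.0.0.1:8080;   # illustrative address of the Node.js service
        keepalive 1024;          # 3) max idle connections kept per worker process
    }

    server {
        listen 80;
        location / {
            proxy_pass http://node_backend;
            proxy_http_version 1.1;          # 4) HTTP/1.1 enables persistent upstream connections
            proxy_set_header Connection "";  # clear the Connection header so keep-alive is not disabled
        }
    }
}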
After checking the above parameter configuration, we started looking for keepalive-related configuration in Node. The Node.js documentation shows that the default server.keepAliveTimeout is 5000 ms. That is much shorter than Nginx's 60-second timeout, so the Node service closes an idle connection first; if Nginx continues to send and receive data on that long connection, the above error may occur.
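A quick way to confirm this default (a standalone sketch, independent of the project code):
import * as http from 'http';

const server = http.createServer();
console.log(server.keepAliveTimeout); // prints 5000 (ms), the default since Node 8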
So we increased Node's keepAliveTimeout value (see https://shuheikagawa.com/blog/2019/04/25/keep-alive-timeout/):
// Set up the app...
const server = app.listen(8080);
server.keepAliveTimeout = 61 * 1000; // one second longer than Nginx's 60s keepalive_timeout
server.headersTimeout = 65 * 1000; // on Node.js >= 10.15.2, this must be larger than keepAliveTimeout
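Since the service uses NestJS, the same timeouts can also be applied through the underlying http.Server that Nest exposes. A minimal sketch, assuming a standard bootstrap (AppModule and port 8080 are illustrative):
// main.ts - hypothetical NestJS bootstrap applying the same timeouts
import { NestFactory } from '@nestjs/core';
import { AppModule } from './app.module';

async function bootstrap() {
  const app = await NestFactory.create(AppModule);
  await app.listen(8080);

  // getHttpServer() returns the underlying Node http.Server
  const server = app.getHttpServer();
  server.keepAliveTimeout = 61 * 1000; // longer than Nginx's 60s keepalive_timeout
  server.headersTimeout = 65 * 1000;   // must exceed keepAliveTimeout on Node >= 10.15.2
}
bootstrap();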
The purpose of lengthening Node's keepAliveTimeout is to prevent the Node service from disconnecting before Nginx does: the callee's timeout needs to be longer than the caller's.
After the above modification, no 502 Bad Gateway errors occurred in Nginx during the load test.
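To double-check after a test run, the Nginx error log can be grepped for the original message (the log path depends on the installation; /var/log/nginx/error.log is a common default):
grep -c "Connection reset by peer" /var/log/nginx/error.log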