Socket Error 104 bug

Description of the bug
Technology stack
nginxuwsgibottle
Error details
Alarm robots often have the following warnings:

<27>1 2018-xx-xxT06:59:03.038Z 660ece0ebaad admin/admin 14 - - Socket Error: 104
<31>1 2018-xx-xxT06:59:03.038Z 660ece0ebaad admin/admin 14 - - Removing timeout for next heartbeat interval
<28>1 2018-xx-xxT06:59:03.039Z 660ece0ebaad admin/admin 14 - - Socket closed when connection was open
<31>1 2018-xx-xxT06:59:03.039Z 660ece0ebaad admin/admin 14 - - Added: {'callback': <bound method SelectConnection._on_connection_start of <pika.adapters.select_connection.SelectConnection object at 0x7f74752525d0>>, 'only': None, 'one_shot': True, 'arguments': None, 'calls': 1}
<28>1 2018-xx-xxT06:59:03.039Z 660ece0ebaad admin/admin 14 - - Disconnected from RabbitMQ at xx_host:5672 (0): Not specified
<31>1 2018-xx-xxT06:59:03.039Z 660ece0ebaad admin/admin 14 - - Processing 0:_on_connection_closed
<31>1 2018-xx-xxT06:59:03.040Z 660ece0ebaad admin/admin 14 - - Calling <bound method _CallbackResult.set_value_once of <pika.adapters.blocking_connection._CallbackResult object at 0x7f74752513f8>> for "0:_on_connection_closed"

The debug process
Determine the error location
If you have a log, it’s easy to do. First of all, look at where the log is typed.
Our own code
No.
Uwsgi code
root@660ece0ebaad:/# uwsgi –version
2.0.14
from github co down, no.
Python Library code
Execute in the container

>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0']

Under these directories, grep is found in pika

root@660ece0ebaad:/usr/local/lib/python2.7# grep "Socket Error" -R .
Binary file ./dist-packages/pika/adapters/base_connection.pyc matches
./dist-packages/pika/adapters/base_connection.py:            LOGGER.error("Fatal Socket Error: %r", error_value)
./dist-packages/pika/adapters/base_connection.py:            LOGGER.error("Socket Error: %s", error_code)

Determine the PIKA version.

>>> import pika
>>> pika.__version__
'0.10.0'

Error determination logic
It can be seen from the code that the Socket Error is the Error code of errno, and it is determined that the meaning of the Error code is that RST is sent to the opposite end.

>>> import errno
>>> errno.errorcode[104]
'ECONNRESET'

Suspected rabbitmq server address error, an unlistened port will return RST, after verification found that it is not.
then suspected link timeout break without notifying the client, etc. Take a look at the RabbitMQ Server logs and find a large number of:

=ERROR REPORT==== 7-Dec-2018::20:43:18 ===
closing AMQP connection <0.9753.18> (172.17.0.19:27542 -> 192.168.44.112:5672):
missed heartbeats from client, timeout: 60s
--
=ERROR REPORT==== 7-Dec-2018::20:43:18 ===
closing AMQP connection <0.9768.18> (172.17.0.19:27544 -> 192.168.44.112:5672):
missed heartbeats from client, timeout: 60s

It is found that all the links between RabbitMQ Server and Admin Docker have been broken

root@xxxxxxx:/home/dingxinglong# netstat -nap | grep 5672  | grep "172.17.0.19"

So why does RabbitMQ Server kick out piKA’s links?Look at the PIKA code comment:

    :param int heartbeat_interval: How often to send heartbeats.
                              Min between this value and server's proposal
                              will be used. Use 0 to deactivate heartbeats
                              and None to accept server's proposal.

We did not pass in the heartbeat interval, so in theory we should use the server default 60S. In fact, the client has never sent heartbeat packets.
verifies that a HeartbeatChecker object has been successfully created and a timer has been successfully created by printing, but the timer has never been called back.
follows through the code using blocking_connections, as seen in its add_timeout comment:

def add_timeout(self, deadline, callback_method):
    """Create a single-shot timer to fire after deadline seconds. Do not
    confuse with Tornado's timeout where you pass in the time you want to
    have your callback called. Only pass in the seconds until it's to be
    called.

    NOTE: the timer callbacks are dispatched only in the scope of
    specially-designated methods: see
    `BlockingConnection.process_data_events` and
    `BlockingChannel.start_consuming`.

    :param float deadline: The number of seconds to wait to call callback
    :param callable callback_method: The callback method with the signature
        callback_method()

The timer is triggered by process_data_Events and we are not calling it. So the client heartbeat is never triggered. Simply turn off heartbeat to solve this problem.
Specific trigger point

follows the code of basic_publish interface.
receives RST when sending, and finally prints socket_error in base_connection.py:452, _handle_error function.

def connect_mq():
    mq_conf = xxxxx
    connection = pika.BlockingConnection(
        pika.ConnectionParameters(mq_conf['host'],
                                  int(mq_conf['port']),
                                  mq_conf['path'],
                                  pika.PlainCredentials(mq_conf['user'],
                                                        mq_conf['pwd']),
                                  heartbeat_interval=0))
    channel = connection.channel()
    channel.exchange_declare(exchange=xxxxx, type='direct', durable=True)
    return channel

channel = connect_mq()

def notify_xxxxx():
    global channel

    def _publish(product):
        channel.basic_publish(exchange=xxxxx,
                              routing_key='xxxxx',
                              body=json.dumps({'msg': 'xxxxx'}))

Read More: