The error reported by PyTorch is as follows:
PyTorch distributed RuntimeError: Address already in use
Reason:
During multi-GPU training, the master port that torch.distributed wants to bind is already occupied by another process. Switching to a free port resolves the error.
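To confirm this before launching, you can probe whether the port is already taken. A minimal Python sketch, assuming the launcher's usual default port 29500 on localhost:

import socket

def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    # connect_ex returns 0 when the connection succeeds, i.e. when
    # another process is already listening on host:port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        return s.connect_ex((host, port)) == 0

print("port 29500 busy:", port_in_use(29500))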
Solution:
Add the --master_port parameter to the launch command, for example:
--master_port 29501
The value 29501 here can be replaced with any other free port.
Note:
This parameter must be placed before the training script (xxx.py) so that the launcher, not the script, receives it. For example:
CUDA_VISIBLE_DEVICES=2,7 python3 -m torch.distributed.run \
    --nproc_per_node 2 --master_port 29501 train.py
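If you would rather not pick a port by hand, you can ask the operating system for an unused one and pass it to --master_port. A minimal sketch; the helper name free_port is my own, not part of PyTorch:

import socket

def free_port() -> int:
    # Binding to port 0 asks the OS to assign an unused ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

print(free_port())  # e.g. 45123; pass the printed value to --master_port

There is a small race: the port is released when this script exits and could in principle be taken by another process before the launcher binds it, but in practice this is rarely a problem.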