[Pytorch Error Solution] Pytorch distributed RuntimeError: Address already in use

PyTorch reports the following error:

Pytorch distributed RuntimeError: Address already in use

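Reason:

The master port used for multi-GPU (distributed) training is already occupied, typically by another distributed job or a leftover process from an earlier run on the same machine. Switching to a different port resolves the error.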
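To confirm that the port really is occupied before changing anything, a quick check can be sketched in Python. This is a minimal sketch, not part of PyTorch: the helper name port_in_use is hypothetical, and 29500 is assumed here to be the default master port of the torch.distributed launcher.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if binding to (host, port) fails, i.e. the port is taken.

    Hypothetical helper for illustration; it is not a PyTorch API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Bind failed: another socket already holds this address.
            return True
        return False


if __name__ == "__main__":
    # 29500 is assumed to be the launcher's default master port.
    print("port 29500 in use:", port_in_use(29500))
```

If this reports True for the port you are launching on, either stop the process that holds it or choose another port as described in the solution.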

Solution:

Add the --master_port parameter to the launch command to select a different port. For example:

 --master_port 29501

The value 29501 can be replaced with any other free port.

Note:

This parameter must come before the training script (xxx.py) on the command line, because anything after the script name is passed to the script itself. For example:

CUDA_VISIBLE_DEVICES=2,7 python3 -m torch.distributed.run \
--nproc_per_node 2 --master_port 29501 train.py
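Instead of guessing a port number, you can ask the operating system for a free one and pass it to --master_port. The following is a minimal sketch under that assumption; find_free_port is a hypothetical helper name, not a PyTorch function, and train.py is the script from the command above.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0.

    Hypothetical helper for illustration; it is not a PyTorch API.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind((host, 0))           # port 0 lets the OS pick a free port
        return s.getsockname()[1]   # the port number the OS assigned


if __name__ == "__main__":
    port = find_free_port()
    # Print a launch command that uses the port we just found.
    print(
        "python3 -m torch.distributed.run "
        f"--nproc_per_node 2 --master_port {port} train.py"
    )
```

The socket is closed before the training job starts, so another process could in principle claim the port in between. For a single-user machine this is usually good enough; if the error appears again, simply rerun.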
