PyTorch reports the following error during distributed training:
RuntimeError: Address already in use
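Reason:
The port used to coordinate the processes in multi-GPU training is already occupied, often by another training job or by a previous run that has not fully exited. Switching to a different port resolves the error.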
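To confirm that the port really is occupied, one option is to try binding to it from Python. The sketch below is only an illustration: the helper name `port_is_free` is hypothetical, and the port 29500 in the usage line is assumed to be the launcher's default, so substitute whatever port your job actually uses.

```python
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Binding failed, most commonly because the address is already in use.
            return False
        return True


if __name__ == "__main__":
    # 29500 is assumed here to be the default master port; adjust as needed.
    print(port_is_free(29500))
```

If this prints False, another process is holding the port, and choosing a different one is the simplest way out.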
Solution:
Add the --master_port parameter to the launch command, for example:
--master_port 29501
The value 29501 is only an example; any other port that is not in use will work.
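Note:
This parameter must be placed before the training script (xxx.py). Anything that comes after the script name is passed to the script itself rather than to the launcher. For example: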
CUDA_VISIBLE_DEVICES=2,7 python3 -m torch.distributed.run \
    --nproc_per_node 2 --master_port 29501 train.py
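Rather than guessing a port number, it may be convenient to ask the operating system for one that is currently unused. The following is a minimal sketch using only the standard library; the function name `find_free_port` is hypothetical, and it assumes the launcher runs on the same machine where the script is executed.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Port 0 tells the OS to pick any available port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


if __name__ == "__main__":
    # Print the port so it can be passed to --master_port.
    print(find_free_port())
```

The printed number can then be supplied as the value of --master_port. The socket is closed before the training job starts, so another process could in principle claim the port in between; in practice this is rare, but rerunning the helper is an easy fallback.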