-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/detectron2/engine/launch.py", line 108, in _distributed_worker
    raise e
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/detectron2/engine/launch.py", line 103, in _distributed_worker
    timeout=timeout,
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8
Cause: the code is configured to run on 8 GPUs, but the machine has only two cards, so NCCL reports "invalid usage" when the process group is initialized.
Solution: set the code to run with two cards.
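As a minimal sketch of the fix above, the requested GPU count can be capped at the number of cards actually present before the distributed workers are launched. The helper names `resolve_num_gpus` and `detect_gpu_count` are hypothetical and are not part of PyTorch or detectron2; only `torch.cuda.device_count()` is a real library call.

```python
def detect_gpu_count() -> int:
    """Return the number of visible CUDA devices, or 0 if PyTorch is missing."""
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return 0
    return torch.cuda.device_count()


def resolve_num_gpus(requested: int, available: int) -> int:
    """Cap the requested GPU count at the number of GPUs on the machine.

    Hypothetical helper for illustration: requesting 8 GPUs on a
    2-GPU machine yields 2 instead of failing later inside NCCL.
    """
    if requested < 1:
        raise ValueError("requested must be at least 1")
    return min(requested, available)


if __name__ == "__main__":
    available = detect_gpu_count()
    num_gpus = resolve_num_gpus(requested=8, available=available)
    print(f"requested 8, available {available}, launching with {num_gpus}")
```

Since the traceback goes through detectron2's `launch`, the resolved value would presumably be passed as its `num_gpus_per_machine` argument. For the training scripts shipped with detectron2, the same setting is normally given on the command line as `--num-gpus 2`.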