[Solved] RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found

GPU Error: RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

pg = ProcessGroupNCCL(prefix_store, rank, world_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

At first, this mistake made me wonder if this GPU was useless, – – |, But the little partners in the lab are sure that GPU is OK! Then I started the bug troubleshooting journey

At this time, when viewing the command line, it finally shows its feet. It is estimated that there is a problem with pytorch, which is harmful!

>>> import torch
>>> print(torch.cuda.is_available())
/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> print(torch.cuda.get_device_name(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 326, in get_device_name
    return get_device_properties(device).name
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 356, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

After checking this error, it shows that the versions of CUDA and torch do not match.

Check the version of pytorch, 1.10 +. OK, try installing a lower version of torch!

pip install torch==1.7.0

be accomplished!

Read More: