Recently, I ran PyTorch training code on an 8-GPU server without any problems. However, after CUDA was reinstalled, I could no longer specify which GPUs to run on: the program always used the GPUs in order, starting from GPU 0. After some research, I solved the problem.
1. To specify which GPUs a Python program runs on, the following method is usually used:
import os
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
Alternatively, set the variable directly on the command line before launching the program (not recommended):
export CUDA_VISIBLE_DEVICES=4,5,6,7
2. The code above had worked before, but it suddenly stopped taking effect: no matter how I changed the visible GPU numbers, the program still used the GPUs in order, starting from GPU 0. The problem lies in the position of the line os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7". CUDA reads this variable only once, when it is initialized, so the line must be moved before import torch and any other code that may use CUDA, immediately after import os, as follows:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
import torch
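The ordering rule above can be wrapped in a small helper. This is only a minimal sketch: `select_gpus` is a hypothetical name, not part of PyTorch, and the `sys.modules` check is a rough heuristic that assumes an already-imported torch may have initialized CUDA. The returned dictionary illustrates that, once the variable is set, the visible GPUs are renumbered from 0 inside the process, so GPU 4 becomes device 0.

```python
import os
import sys
import warnings


def select_gpus(gpu_ids):
    """Hypothetical helper: expose only the given GPU IDs to CUDA.

    Call it before importing torch (or any other library that uses CUDA).
    """
    if "torch" in sys.modules:
        # Heuristic only: torch is already imported, so CUDA may already
        # have read the variable and the new value may be ignored.
        warnings.warn(
            "torch was imported before CUDA_VISIBLE_DEVICES was set; "
            "the setting may have no effect"
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    # Inside this process the visible GPUs are renumbered from 0,
    # in the order they are listed.
    return {logical: physical for logical, physical in enumerate(gpu_ids)}


mapping = select_gpus([4, 5, 6, 7])
print(mapping)  # {0: 4, 1: 5, 2: 6, 3: 7}

# Import torch only after this point, for example:
# import torch
```

With this setting, code that refers to device 0 in the program runs on the first GPU listed in the variable, here GPU 4.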
3. Some common commands for viewing GPU information are attached for later reference:
import torch
torch.cuda.is_available() # Check whether CUDA is available
torch.cuda.device_count() # Return the number of visible GPUs
torch.cuda.get_device_name(0) # Return the name of GPU 0 (device indices start at 0)
torch.cuda.current_device() # Return the index of the current device
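To check that the GPU selection actually took effect, the calls above can be combined into one function. This is a rough sketch: `describe_visible_gpus` is a hypothetical name, and it returns an empty list when PyTorch is not installed or CUDA is unavailable, so it can be run on any machine.

```python
def describe_visible_gpus():
    """Hypothetical helper: list (index, name) for each GPU PyTorch can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so there is nothing to report.
        return []
    if not torch.cuda.is_available():
        return []
    return [
        (index, torch.cuda.get_device_name(index))
        for index in range(torch.cuda.device_count())
    ]


for index, name in describe_visible_gpus():
    print(f"cuda:{index} -> {name}")
```

If CUDA_VISIBLE_DEVICES was set to "4,5,6,7" early enough, this should list four devices, numbered 0 to 3.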