Recently, I was running PyTorch training code on an 8-GPU server without any problems. However, after CUDA was reinstalled, it became impossible to specify which GPUs to run on: the program always used the GPUs in order, starting from GPU 0. After looking into it, I solved the problem as described below.
1. To specify which GPUs a Python program runs on, the following approach is commonly used:
import os
import torch
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
Alternatively, set the variable directly on the command line (not recommended, since it affects the whole shell session):
export CUDA_VISIBLE_DEVICES=4,5,6,7
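If you do set the variable on the command line, a less invasive option is to scope it to a single command instead of exporting it for the whole session. The inline `python3 -c` call below is just a stand-in for your training script; it prints the value the child process receives:

```shell
# The variable applies only to this one command, not to the shell session.
CUDA_VISIBLE_DEVICES=4,5,6,7 python3 -c 'import os; print(os.environ["CUDA_VISIBLE_DEVICES"])'
# prints: 4,5,6,7
```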
2. With the code written as above, the setting suddenly had no effect: no matter how the list of visible GPUs was changed, the program still used the GPUs in order starting from GPU 0. The problem is the location of the line that specifies the GPUs. The statement `os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"` must be moved before `import torch` (and any other code that may initialize CUDA), immediately after `import os`, like this:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
import torch
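Why does the order matter? The CUDA runtime reads `CUDA_VISIBLE_DEVICES` once, when it is initialized, and depending on the PyTorch version that can happen as early as `import torch`; changing the variable afterwards is then ignored. A minimal sketch of this caching behavior, using a hypothetical `FakeCudaRuntime` class in place of the real runtime:

```python
import os

# Start from a clean state so the demo is deterministic.
os.environ.pop("CUDA_VISIBLE_DEVICES", None)

class FakeCudaRuntime:
    """Toy stand-in (hypothetical, for illustration only) for a runtime
    that snapshots CUDA_VISIBLE_DEVICES once, at initialization time."""
    def __init__(self):
        self.visible = os.environ.get("CUDA_VISIBLE_DEVICES", "<all GPUs>")

# Wrong order: initialization happens before the variable is set.
runtime_late = FakeCudaRuntime()
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
print(runtime_late.visible)   # <all GPUs> - the setting came too late

# Right order: the variable is already set when initialization runs.
runtime_early = FakeCudaRuntime()
print(runtime_early.visible)  # 4,5,6,7
```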
3. Finally, some common commands for inspecting GPU information are listed below for later reference:
import torch
torch.cuda.is_available()      # Check whether CUDA is available
torch.cuda.device_count()      # Return the number of visible GPUs
torch.cuda.get_device_name(0)  # Return the name of the GPU at index 0 (indices start at 0)
torch.cuda.current_device()    # Return the index of the current device
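The calls above can be wrapped into a small helper that degrades gracefully on machines without PyTorch or without a CUDA build; `gpu_summary` is just an illustrative name. Note that the reported indices are relative to `CUDA_VISIBLE_DEVICES`, so `cuda:0` inside the program may be physical GPU 4 on the host:

```python
def gpu_summary():
    """Return a short description of the GPUs visible to PyTorch."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "CUDA is not available; running on CPU"
    lines = [f"{torch.cuda.device_count()} visible GPU(s)"]
    for idx in range(torch.cuda.device_count()):
        # cuda:0 here may be physical GPU 4 if CUDA_VISIBLE_DEVICES=4,5,6,7
        lines.append(f"  cuda:{idx} -> {torch.cuda.get_device_name(idx)}")
    lines.append(f"current device index: {torch.cuda.current_device()}")
    return "\n".join(lines)

print(gpu_summary())
```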