Recently I ran PyTorch training code on an 8-GPU server without any problems. However, after CUDA was reinstalled, I could no longer specify which GPUs to run on: the program always used the GPUs in order, starting from GPU 0. After looking into it, I solved the problem.
1. To specify which GPUs a Python program runs on, the following method is usually used:
import os
import torch

os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
Or set the variable directly on the command line when launching the program, by prefixing the command with CUDA_VISIBLE_DEVICES=4,5,6,7 (not recommended).
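The same per-process restriction can also be applied from Python when one script launches another. The sketch below assumes the training job is started with `subprocess.run`; the child command here is only a one-line stand-in for a real training script and simply prints the variable it received. The parent's own environment is not modified.

```python
import os
import subprocess
import sys

# Copy the current environment and restrict the child process to GPUs 4-7.
# os.environ itself is left unchanged.
child_env = dict(os.environ, CUDA_VISIBLE_DEVICES="4,5,6,7")

# Stand-in for a real training script: it only echoes the variable it sees.
result = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout.strip())  # prints 4,5,6,7
```

Because the variable is set before the child starts, it is guaranteed to be in place before anything in that process initializes CUDA.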
2. With that approach, the code above suddenly stopped working: no matter which GPU numbers I made visible, the program still used the GPUs in order, starting from GPU 0. The problem turned out to be the position of the line that specifies the GPUs. The line os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7" has to be moved above import torch and the other imports, directly after import os, because CUDA reads this variable only when it is initialized and ignores later changes. The fixed order is:
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"  # must be set before import torch
import torch
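The ordering rule can be wrapped in a small guard. The sketch below uses a hypothetical helper, `set_visible_gpus`, that sets the variable and warns if `torch` has already been imported. Importing `torch` does not necessarily initialize CUDA by itself, so the warning only says the setting *may* be ignored; setting the variable first is simply the safe order.

```python
import os
import sys
import warnings


def set_visible_gpus(gpu_ids):
    """Restrict this process to the given GPU indices (hypothetical helper).

    Call it before importing torch or anything else that may initialize
    CUDA, because CUDA reads CUDA_VISIBLE_DEVICES only once, at initialization.
    """
    if "torch" in sys.modules:
        # torch is already loaded; if CUDA has been initialized, the new
        # value is ignored for the rest of this process.
        warnings.warn(
            "torch was imported before CUDA_VISIBLE_DEVICES was set; "
            "the setting may have no effect",
            RuntimeWarning,
        )
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)


set_visible_gpus([4, 5, 6, 7])
# import torch  # import torch only after the variable has been set
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints 4,5,6,7
```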
3. Some common commands for viewing GPU information are listed below for future reference:
import torch

torch.cuda.is_available()      # check whether CUDA is available
torch.cuda.device_count()      # return the number of visible GPUs
torch.cuda.get_device_name(0)  # return a GPU's name; device indices start at 0
torch.cuda.current_device()    # return the index of the current device
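One detail worth keeping in mind when reading these indices: after CUDA_VISIBLE_DEVICES="4,5,6,7" is applied, PyTorch renumbers the visible GPUs from 0, so device 0 is physical GPU 4. The sketch below combines the calls above into a small diagnostic. Both helpers are hypothetical; `logical_to_physical` assumes the variable holds a plain comma-separated list of integers (not GPU UUIDs), and `describe_gpus` returns an empty list when PyTorch or CUDA is unavailable.

```python
import os


def logical_to_physical(index):
    """Map a device index as PyTorch sees it to the index in CUDA_VISIBLE_DEVICES.

    Hypothetical helper: assumes the variable is a comma-separated list of
    integers. When it is unset, the two indices are the same.
    """
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        return index
    ids = [int(x) for x in visible.split(",") if x.strip()]
    return ids[index]


def describe_gpus():
    """Return (logical index, physical index, name) for each visible GPU."""
    try:
        import torch  # optional dependency for this diagnostic
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        (i, logical_to_physical(i), torch.cuda.get_device_name(i))
        for i in range(torch.cuda.device_count())
    ]


os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"  # set before torch is imported
print(logical_to_physical(0))  # prints 4
print(describe_gpus())
```

If `describe_gpus()` still reports devices starting from physical GPU 0 after the variable is set, that is the symptom described in step 2: CUDA was initialized before the variable took effect.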