1. Check that the installed torch build matches your CUDA and cuDNN versions
import torch
print(torch.__version__) # 1.9.1+cu111
print(torch.version.cuda) # 11.1
print(torch.backends.cudnn.version()) # 8005
print(torch.cuda.is_available()) # True
print(torch.cuda.current_device()) # 0
2. Check whether GPU memory is insufficient. First try reducing the training batch size. If the error persists even at the minimum batch size, monitor GPU memory usage in real time with the following command:
watch -n 0.5 nvidia-smi
The output shows that GPU memory is already occupied even when the program is not running.
The cause is that the program is configured to use four GPUs. The first two are free, but the third one (index 2) is occupied by other users' jobs, so the error is raised there.
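The same check can be scripted instead of read off the `watch` display. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH and supports the `--query-gpu` and `--format=csv,noheader,nounits` options. The helper names (`parse_gpu_memory`, `pick_free_gpus`, `query_free_gpus`), the 500 MiB threshold, and the sample numbers are all hypothetical and only illustrate the idea.

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU: index, used MiB, total MiB.
QUERY_CMD = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Turn nvidia-smi CSV rows into (index, used_mib, total_mib) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, used, total))
    return rows


def pick_free_gpus(rows, max_used_mib=500):
    """Return indices of GPUs whose used memory is at or below the threshold."""
    return [index for index, used, _total in rows if used <= max_used_mib]


def query_free_gpus(max_used_mib=500):
    """Run nvidia-smi once and return the indices of lightly used GPUs."""
    output = subprocess.run(
        QUERY_CMD, capture_output=True, text=True, check=True
    ).stdout
    return pick_free_gpus(parse_gpu_memory(output), max_used_mib)


# Made-up snapshot resembling the situation above: GPU 2 is busy.
sample = "0, 3, 24268\n1, 3, 24268\n2, 21950, 24268\n3, 3, 24268\n"
print(pick_free_gpus(parse_gpu_memory(sample)))  # [0, 1, 3]
```

On a machine with GPUs, `query_free_gpus()` would return a list that could be passed as `device_ids` in the next step. The threshold is a judgment call and should be set to whatever headroom the training job needs.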
3. Specify which GPUs to use
# Use the GPU unless CUDA is unavailable or disabled via args.no_cuda (args is assumed to hold the script's parsed command-line arguments)
device = torch.device("cuda" if torch.cuda.is_available() and not args.no_cuda else "cpu")
# Run multi-GPU data parallelism on GPUs 0, 1, and 3 only, skipping the occupied GPU 2
model = torch.nn.DataParallel(model, device_ids=[0, 1, 3])
With the occupied GPU excluded, the program runs without the error.
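An alternative to hard-coding `device_ids` is to hide the busy GPU from the process with the `CUDA_VISIBLE_DEVICES` environment variable. The sketch below is only an illustration of that approach, not part of the original fix. The helper name `restrict_visible_gpus` is hypothetical, and the variable has to be set before CUDA is initialized, which in practice means before importing torch.

```python
import os


def restrict_visible_gpus(physical_ids):
    """Expose only the given physical GPUs to this process.

    Must run before CUDA is initialized (safest: before importing torch).
    Returns the logical ids the process will see, renumbered from 0.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in physical_ids)
    return list(range(len(physical_ids)))


# Skip physical GPU 2, which is occupied in the scenario above.
device_ids = restrict_visible_gpus([0, 1, 3])
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0,1,3
print(device_ids)  # [0, 1, 2]
```

Inside the process the visible GPUs are renumbered, so logical device 2 now refers to physical GPU 3, and the returned list can be passed as `torch.nn.DataParallel(model, device_ids=device_ids)`.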