While debugging some Python code, I ran into this error:
CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasSgemm(...)
A web search turns up all kinds of answers: change the driver version, pin the CUDA device number, and so on. Each of them has reportedly worked for someone, but none of them seemed reliable.
The error message looks like a memory access error.
Solution:
Check the code carefully and make sure all the data is on the same device, either all on the CPU or all on the GPU.
Checking this by hand is tedious, so I wrote a small function to make it easier.
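def printTensor(t, tag: str):
    # Print the tensor's size and device, plus up to 3 values from its first row.
    sz = t.size()
    p = t
    # Index [0] along every dimension except the last to reach a 1-D slice.
    for i in range(len(sz) - 1):
        p = p[0]
    if len(p) > 3:
        p = p[:3]
    # Printing the values (not just the device) forces PyTorch to run the pending GPU work.
    print('\t%s.size' % tag, t.size(), ' dev :', t.device, ": ", p.data)
    return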
Usage: printTensor(context, 'context')
The output looks like this:
context.size torch.Size([4, 10, 10]) dev : cuda:0 : tensor([0, 0, 0], device='cuda:0')
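This function does two things:
- it outputs the device
- it outputs the data
The second point is particularly important. Printing only the device does not necessarily trigger the error. CUDA operations are queued asynchronously, so it is only when you print the data, which forces PyTorch to actually run everything up to that point, that the real error surfaces at the place where it occurs.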
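The reason printing the data matters is that CUDA kernels are launched asynchronously, so an error may be reported at a later, unrelated call. As a complement to printTensor, and not something the original post used, one commonly suggested option is the CUDA_LAUNCH_BLOCKING environment variable, which makes kernel launches synchronous. A minimal sketch, assuming it runs at the very top of the script before anything initialises CUDA:

```python
import os

# Assumption: this executes before `import torch` (or any other code that
# touches CUDA), because the variable is read when CUDA is initialised.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch  # import the framework only after the variable is set
```

With synchronous launches the traceback is more likely to point at the operation that actually failed, at the cost of slower execution, so this is best kept for debugging runs only.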
In the end, I found that one of the nn.* networks had never called .to(device) explicitly. The custom models do inherit from nn.Module, though, so why this happened still needs to be looked into.
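To catch a missing .to(device) without printing tensors one by one, the same kind of check can be run over a whole model. The helpers below are a hypothetical sketch, not part of the original post: group_by_device and report_devices are names made up for illustration, and they only assume that each object passed in has a .device attribute, as PyTorch tensors and parameters do.

```python
from collections import defaultdict


def group_by_device(named_tensors):
    """Map each device (as a string) to the names of the tensors stored on it.

    `named_tensors` is any iterable of (name, obj) pairs where obj has a
    `.device` attribute, for example the pairs yielded by
    model.named_parameters() or model.named_buffers() on an nn.Module.
    """
    groups = defaultdict(list)
    for name, tensor in named_tensors:
        groups[str(tensor.device)].append(name)
    return dict(groups)


def report_devices(named_tensors, tag="model"):
    """Print one line per device; return True if at most one device is in use."""
    groups = group_by_device(named_tensors)
    for device, names in sorted(groups.items()):
        print("\t%s on %s: %d tensor(s), e.g. %s"
              % (tag, device, len(names), names[0]))
    return len(groups) <= 1
```

Assuming model is an nn.Module, calling report_devices(model.named_parameters(), 'model') before the forward pass lists which parameters sit on which device, and a False return value means some of them were left behind on another device.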