RuntimeError: CUDA error: an illegal memory access was encountered

Question:

I ran into this problem while writing a model. The search results on Baidu said it was either a PyTorch version problem or a class index out of range, but neither suggestion helped, because the error was raised on a very simple assignment:

scores[:, 0] = -float("inf") 
#RuntimeError: CUDA error: an illegal memory access was encountered

While debugging, I also found that a warning appeared right after one of the model's layers was executed:

lm_logits = self.linear(outputs) + self.bias
# warning: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered

At first glance, both statements are very simple, yet both reported strange errors.
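One likely reason the error shows up on such an innocent line is that CUDA kernels run asynchronously, so a failure in an earlier operation can be reported at a later, unrelated call. A minimal sketch of a common debugging aid, assuming it is placed at the very top of the training script: setting the CUDA_LAUNCH_BLOCKING environment variable makes kernel launches synchronous, so the traceback points closer to the operation that actually failed. The tensor below is only a hypothetical stand-in for the scores in the question.

```python
import os

# Set this before CUDA is initialised; the safest place is before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402

# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the scores tensor from the question.
scores = torch.zeros(2, 5, device=device)
scores[:, 0] = -float("inf")
```

This slows training down, so it is usually switched on only while hunting for the bug.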

Solution:

While debugging, I noticed something abnormal:

In the data output by the PyTorch network, the variable did not show the actual output values, only the address of the tensor object:

T:torch.Tensor object at 0x7fb27e7c8f30
data:torch.Tensor object at 0x7fb27e7c8f30

Later, I found the cause: the self.linear layer was on 'CPU' while the rest of the network was on 'CUDA', so 'CUDA' data was being forward-propagated into a 'CPU' layer, which caused the inconsistency. Moving that layer (or the whole network) to 'CUDA' fixed the problem.
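The fix can be sketched roughly as follows. This is not the original model: TinyLM, its layer sizes, and the devices_in_use helper are hypothetical names made up for illustration. The idea is to move the whole module with .to(device) and then confirm that every parameter and buffer sits on one device before running the forward pass.

```python
import torch
import torch.nn as nn


class TinyLM(nn.Module):
    """Hypothetical stand-in for the model in the question."""

    def __init__(self, hidden=8, vocab=16):
        super().__init__()
        self.encoder = nn.Linear(hidden, hidden)
        self.linear = nn.Linear(hidden, vocab, bias=False)
        self.bias = nn.Parameter(torch.zeros(vocab))

    def forward(self, x):
        outputs = self.encoder(x)
        lm_logits = self.linear(outputs) + self.bias
        return lm_logits


def devices_in_use(module):
    """Return the set of device names holding the module's parameters and buffers."""
    tensors = list(module.parameters()) + list(module.buffers())
    return {str(t.device) for t in tensors}


# Fall back to the CPU so the sketch also runs on a machine without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# .to(device) moves every registered submodule and parameter, including self.linear.
model = TinyLM().to(device)

found = devices_in_use(model)
assert len(found) == 1, f"model is split across devices: {found}"

# The input has to live on the same device as the model.
x = torch.randn(4, 8, device=device)
scores = model(x)
scores[:, 0] = -float("inf")
```

Note that .to(device) only reaches layers registered as attributes of an nn.Module (or kept in nn.ModuleList / nn.ModuleDict); layers stored in a plain Python list are skipped and stay on the CPU, which is one way this kind of mismatch can creep in.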
