Question:
I ran into this problem while writing a model. The answers I found on Baidu said it was either a PyTorch version problem or a class index out of range, but neither helped, because the line that failed was a very simple assignment:
scores[:, 0] = -float("inf")
#RuntimeError: CUDA error: an illegal memory access was encountered
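CUDA kernels run asynchronously, so an error raised at a trivial line like this one may actually come from an earlier operation. This step is not in the original post, but a common first move is to force synchronous kernel launches so that the traceback points at the operation that really failed. The variable has to be set before CUDA is initialized, so this sketch sets it before importing torch:

```python
import os

# Force synchronous CUDA kernel launches so errors are reported at the op
# that caused them. Must be set before CUDA is initialized, so do it
# before importing torch (or export it in the shell instead).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402  (imported after the env var on purpose)
```

Setting it in the shell, e.g. `CUDA_LAUNCH_BLOCKING=1 python train.py` (the script name is hypothetical), has the same effect. It slows training down, so it is only meant for debugging.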
While debugging, I also found that a warning was printed right after one layer of the model executed:
lm_logits = self.linear(outputs) + self.bias
#warning: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
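One way to turn this kind of failure into a readable error is to check devices at the start of the layer's forward pass. The sketch below is a hypothetical stand-in for the layer that computes lm_logits (the class name and sizes are assumptions, not the author's code). It also registers the bias as an nn.Parameter: a bias stored as a plain tensor attribute would not be moved by model.to(device) or model.cuda().

```python
import torch
from torch import nn


class LMHead(nn.Module):
    """Hypothetical stand-in for the layer computing lm_logits in the post."""

    def __init__(self, hidden_size: int, vocab_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, vocab_size, bias=False)
        # Registered as a Parameter, so model.to(device) moves it together
        # with self.linear; a plain tensor attribute would be left behind.
        self.bias = nn.Parameter(torch.zeros(vocab_size))

    def forward(self, outputs: torch.Tensor) -> torch.Tensor:
        weight = self.linear.weight
        # Fail fast with a readable message instead of an opaque CUDA error.
        if outputs.device != weight.device:
            raise RuntimeError(
                f"input is on {outputs.device} but self.linear is on {weight.device}"
            )
        return self.linear(outputs) + self.bias


head = LMHead(hidden_size=4, vocab_size=10)
logits = head(torch.randn(2, 4))
print(logits.shape)  # torch.Size([2, 10])
```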
At first glance both lines are trivial, yet both produced these strange errors.
Solution:
While debugging, I noticed something unusual: for the tensors output by the PyTorch network, the debugger did not show the actual values, only the objects' addresses:
T:torch.Tensor object at 0x7fb27e7c8f30
data:torch.Tensor object at 0x7fb27e7c8f30
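When the values themselves cannot be displayed, a tensor's metadata is usually enough to spot this kind of problem. The helper below is a hypothetical sketch: it reports the device, dtype and shape, which are kept on the host, so reading them does not copy any tensor data from the GPU.

```python
import torch


def describe(t: torch.Tensor) -> str:
    """Summarize a tensor from host-side metadata only, without reading values."""
    return f"device={t.device}, dtype={t.dtype}, shape={tuple(t.shape)}"


print(describe(torch.zeros(2, 3)))
# device=cpu, dtype=torch.float32, shape=(2, 3)
```

In a debugger watch window, evaluating `t.device` or `t.is_cuda` directly works just as well.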
It turned out that the self.linear layer was on the CPU while the rest of the network was on CUDA, so CUDA tensors were being propagated forward through a CPU layer, and that device mismatch caused the errors above. Moving the layer to CUDA fixed the problem.
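To catch this before the forward pass, you can list every registered parameter and buffer that is not on the expected device. The function name below is hypothetical, and this is a sketch rather than the author's fix. Note that a layer only shows up here, and is only moved by model.to(device), if it is registered. A layer kept in a plain Python list (instead of nn.ModuleList) is not registered, and a layer created after model.to(device) was called will stay on the CPU.

```python
from typing import List, Union

import torch
from torch import nn


def misplaced_tensors(model: nn.Module, device: Union[str, torch.device]) -> List[str]:
    """Return names of registered parameters/buffers not on `device`'s type."""
    target = device if isinstance(device, torch.device) else torch.device(device)
    named = list(model.named_parameters()) + list(model.named_buffers())
    # Compare only the device type ("cpu" vs "cuda"), so "cuda" matches "cuda:0".
    return [name for name, t in named if t.device.type != target.type]


model = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8))
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)  # moves every registered submodule, parameter and buffer in place
print(misplaced_tensors(model, device))  # []
```

Comparing only the device type keeps the check simple; it will not flag a tensor on `cuda:1` when the rest of the model is on `cuda:0`.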