When training with PyTorch's DistributedDataParallel (DDP), the following warning may appear:
[W reducer.cpp:362] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
The reason is that the input tensor becomes non-contiguous in memory after it has been transformed by transpose or permute: these operations return a view with swapped strides rather than rearranging the underlying data, so the gradient layout no longer matches what DDP expects.
The fix is simple: call .contiguous() on the tensor after it has been transposed or permuted, which copies the data into a contiguous memory layout.
For example:
# Original code (triggers the warning):
input_tensor = ori_tensor.transpose(1, 3)
# Fixed code:
input_tensor = ori_tensor.transpose(1, 3).contiguous()
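To see why the fix works, here is a minimal sketch (using a made-up 4-D tensor; `ori_tensor` and the dimensions are illustrative, not from any specific model) showing that transpose produces a non-contiguous view and .contiguous() restores a contiguous layout:

```python
import torch

# An illustrative 4-D tensor, e.g. an NCHW batch
ori_tensor = torch.randn(2, 3, 4, 5)

# transpose swaps strides but does not move data,
# so the result is a non-contiguous view
t = ori_tensor.transpose(1, 3)
print(t.is_contiguous())  # False

# .contiguous() copies the data into a fresh, contiguous block of memory
input_tensor = ori_tensor.transpose(1, 3).contiguous()
print(input_tensor.is_contiguous())  # True
print(input_tensor.shape)            # torch.Size([2, 5, 4, 3])
```

Once the tensor fed to the model is contiguous, the gradients DDP collects follow the expected layout and the warning disappears.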