When training with PyTorch's DistributedDataParallel (DDP), the following warning may appear:
[W reducer.cpp:362] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.
The reason is that the input tensor's memory layout becomes non-contiguous after it has been transformed by transpose or permute: these operations return a view with changed strides rather than copying the data.
The simple fix is to call .contiguous() after transposing or permuting the tensor, which copies it into a contiguous block of memory.
For example:
# Problematic code:
input_tensor = ori_tensor.transpose(1, 3)
# Fixed code:
input_tensor = ori_tensor.transpose(1, 3).contiguous()
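The effect can be checked directly with is_contiguous(). A minimal standalone sketch (the tensor shape here is chosen arbitrarily for illustration):

```python
import torch

# transpose() returns a view whose strides no longer match a
# contiguous row-major layout; this is what triggers DDP's warning.
ori_tensor = torch.randn(2, 3, 4, 5)

transposed = ori_tensor.transpose(1, 3)
print(transposed.is_contiguous())   # False: only the strides changed

# .contiguous() copies the data into a fresh contiguous block of
# memory, so gradients will match the layout DDP's reducer expects.
fixed = transposed.contiguous()
print(fixed.is_contiguous())        # True
print(fixed.shape)                  # torch.Size([2, 5, 4, 3])
```

Note that .contiguous() incurs a copy, so it is worth applying it once at the point of the transpose rather than repeatedly inside the training loop.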