PyTorch DDP Acceleration Warning: [W reducer.cpp:362] Warning: Grad strides do not match bucket view strides.

When training is accelerated with PyTorch DistributedDataParallel (DDP), the following warning is printed:

[W reducer.cpp:362] Warning: Grad strides do not match bucket view strides. This may indicate grad was not created according to the gradient layout contract, or that the param's strides changed since DDP was constructed. This is not an error, but may impair performance.

The cause is that transpose and permute do not move any data: they return a view of the same storage with rearranged strides, so the input tensor is no longer contiguous in memory.
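A quick way to see this is to compare `is_contiguous()` and `stride()` before and after the transpose. This is a minimal sketch; the tensor shape (2, 3, 4, 5) is an arbitrary example chosen for illustration.

```python
import torch

ori_tensor = torch.randn(2, 3, 4, 5)  # example 4-D input, contiguous by default
print(ori_tensor.is_contiguous(), ori_tensor.stride())  # True (60, 20, 5, 1)

# transpose(1, 3) swaps dims 1 and 3: shape (2, 5, 4, 3), same storage, swapped strides
transposed = ori_tensor.transpose(1, 3)
print(transposed.is_contiguous(), transposed.stride())  # False (60, 1, 5, 20)
print(transposed.data_ptr() == ori_tensor.data_ptr())   # True: no data was copied

# contiguous() copies the values into a new row-major buffer
fixed = transposed.contiguous()
print(fixed.is_contiguous(), fixed.stride())  # True (60, 12, 3, 1)
```

The values of `fixed` and `transposed` are identical; only the memory layout differs.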

The fix is simple: call .contiguous() right after the transpose or permute, which copies the tensor into a contiguous memory layout.

For example:

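import torch

ori_tensor = torch.randn(2, 3, 4, 5)  # example input tensor

# Problematic code: transpose returns a non-contiguous view
input_tensor = ori_tensor.transpose(1, 3)

# Fixed code: .contiguous() copies the data into a contiguous layout
input_tensor = ori_tensor.transpose(1, 3).contiguous()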
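Because the warning text itself talks about a parameter's gradient strides not matching the layout DDP expects, it can also help to check which parameters actually have mismatched gradient strides. The sketch below uses a hypothetical helper, `find_stride_mismatches`, that compares each parameter's `.grad.stride()` with its own `.stride()` after a backward pass. It runs on a single process without DDP, and the `nn.Linear` model is only a stand-in for your own model.

```python
from typing import List

import torch
import torch.nn as nn


def find_stride_mismatches(model: nn.Module) -> List[str]:
    """Hypothetical helper: return the names of parameters whose .grad
    strides differ from the parameter's own strides."""
    mismatched = []
    for name, param in model.named_parameters():
        # Parameters that did not receive a gradient are skipped.
        if param.grad is not None and param.grad.stride() != param.stride():
            mismatched.append(name)
    return mismatched


# Stand-in model and data; swap in your own model before wrapping it in DDP.
model = nn.Linear(4, 2)
x = torch.randn(3, 4)
model(x).sum().backward()
print(find_stride_mismatches(model))  # [] for this plain contiguous model
```

An empty list means every gradient matches its parameter's layout. Any names it returns point to the parameters worth inspecting.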
