torch.autograd.backward
torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)
retain_graph (bool, optional) – If False, the graph used to compute the gradients will be freed. Note that in nearly all cases setting this option to True is not needed and can often be worked around in a much more efficient way. Defaults to the value of create_graph.
create_graph (bool, optional) – If True, a graph of the derivative will be constructed, allowing higher-order derivative products to be computed. Defaults to False.
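As a quick illustration of create_graph, the following minimal sketch (variable names are illustrative) computes a second derivative by building a graph of the first derivative and differentiating it again:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3  # dy/dx = 3x^2, d2y/dx2 = 6x

# create_graph=True makes the first gradient itself differentiable
(grad,) = torch.autograd.grad(y, x, create_graph=True)
(grad2,) = torch.autograd.grad(grad, x)
print(grad.item(), grad2.item())  # at x=2: 12.0 and 12.0
```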
retain_graph = True (when to use it?)
retain_graph is a parameter that is rarely needed in ordinary training, but it is required in certain special cases:

1. When a network has two outputs that each need their own backward pass: output1.backward(), output2.backward().
2. When a network has two losses that each need their own backward pass: loss1.backward(), loss2.backward().
Take case 2 as an example. If the code is written like this, the error mentioned at the beginning of this post will appear:
loss1.backward()  # frees the graph after the first backward pass
loss2.backward()  # RuntimeError: Trying to backward through the graph a second time
Correct code:
loss1.backward(retain_graph=True)  # keep the intermediate buffers after backward
loss2.backward()                   # the graph is freed here, ready for the next iteration
optimizer.step()                   # update parameters
Passing retain_graph=True keeps the intermediate buffers, so the backward() calls of the two losses do not interfere with each other.
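Case 2 can be reproduced in a few lines. This is a minimal sketch (names are illustrative): two losses built from one shared intermediate node, where the second backward pass only succeeds because the first one kept the graph alive:

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x * 2           # shared intermediate node
loss1 = y.sum()
loss2 = (y ** 2).sum()

loss1.backward(retain_graph=True)  # keep the shared graph's buffers
loss2.backward()                   # second pass succeeds; gradients accumulate in x.grad
print(x.grad)
```

Removing retain_graph=True from the first call makes the second call raise the RuntimeError quoted above.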
Supplement: when two losses of two networks each need their own backward pass: loss1.backward(), loss2.backward()
# With two networks, a separate optimizer must be defined for each network
optimizer1 = torch.optim.SGD(net1.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
.....
# training: handling the backward pass for each loss
loss1 = loss()
loss2 = loss()
optimizer1.zero_grad()             # zero the gradients
loss1.backward(retain_graph=True)  # keep the intermediate buffers after backward
optimizer1.step()
optimizer2.zero_grad()             # zero the gradients
loss2.backward()
optimizer2.step()
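A runnable sketch of this two-network pattern, under the assumption (names and shapes are illustrative) that net2 consumes net1's output, so the two loss graphs share net1's subgraph and retain_graph is needed. Note that both backward passes run before the parameter updates, since stepping an optimizer modifies its parameters in place and would invalidate the retained graph:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net1 = nn.Linear(8, 4)
net2 = nn.Linear(4, 1)
optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
criterion = nn.MSELoss()

x = torch.randn(16, 8)
target_mid = torch.randn(16, 4)
target_out = torch.randn(16, 1)

optimizer1.zero_grad()
optimizer2.zero_grad()

mid = net1(x)                              # shared subgraph
loss1 = criterion(mid, target_mid)
loss2 = criterion(net2(mid), target_out)

loss1.backward(retain_graph=True)  # keep net1's subgraph for loss2
loss2.backward()                   # also accumulates gradients into net1
optimizer1.step()
optimizer2.step()
```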
Appendix: step-by-step explanation
optimizer.zero_grad()
Initialize the gradients to zero
(because the derivative of a batch's loss with respect to the weights is the sum of each sample's loss derivative with respect to the weights)
corresponding to: d_weights = [0] * n
outputs = net(inputs)
The predicted value is obtained by forward propagation
loss = Loss(outputs, labels)
Compute the loss
loss.backward()
Back-propagate to compute the gradients
corresponding to: d_weights = [d_weights[j] + (label[k] - output) * input[k][j] for j in range(n)]
optimizer.step()
Update all parameters
corresponding to: weights = [weights[k] + alpha * d_weights[k] for k in range(n)]
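The update rules quoted above can be stitched into a complete pure-Python sketch of batch gradient descent for a single linear neuron with squared-error loss (all data and names here are illustrative, not from the original post):

```python
alpha = 0.1
n = 2
weights = [0.0] * n
inputs = [[1.0, 2.0], [2.0, 1.0], [1.0, 0.0]]  # one row per sample
labels = [3.0, 3.0, 1.0]                        # consistent with weights [1, 1]

for _ in range(200):
    d_weights = [0.0] * n                       # optimizer.zero_grad()
    for k in range(len(inputs)):
        # forward pass: output = sum_j w_j * x_j
        output = sum(weights[j] * inputs[k][j] for j in range(n))
        # loss.backward(): accumulate d(loss)/d(w_j) over the batch
        d_weights = [d_weights[j] + (labels[k] - output) * inputs[k][j]
                     for j in range(n)]
    # optimizer.step(): apply the accumulated batch gradient
    weights = [weights[j] + alpha * d_weights[j] for j in range(n)]

print(weights)  # converges near [1.0, 1.0]
```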