[Solved] RuntimeError: Trying to backward through the graph a second time…

RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

torch.autograd.backward(tensors, grad_tensors=None, retain_graph=None, create_graph=False, grad_variables=None)

retain_graph (bool, optional) – If False, the graph used to compute the grad will be freed. Note that in nearly all cases setting this option to True is not needed and often it can be worked around in a much more efficient way. Defaults to the value of create_graph.

create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing to compute higher order derivative products. Defaults to False.
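As an aside, create_graph=True is what makes higher-order derivatives possible: it builds a graph of the backward pass itself. A minimal sketch (the function x ** 3 is just an illustration):

```python
import torch

# create_graph=True records the backward pass in the autograd graph,
# so the gradient can itself be differentiated (second derivative).
x = torch.tensor(2.0, requires_grad=True)
y = x ** 3
(grad,) = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 3 * x ** 2
(grad2,) = torch.autograd.grad(grad, x)                  # d2y/dx2 = 6 * x
```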

retain_graph = True (when to use it?)

retain_graph is a parameter we rarely need in ordinary training, but it is required in a few special cases:

    1. When a network has two outputs that each need a backward pass: output1.backward(), output2.backward().
    2. When a network has two losses that each need a backward pass: loss1.backward(), loss2.backward().

Take case 2 as an example.
If the code calls the two backward passes naively, the error at the beginning of this post will appear:
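A minimal sketch that reproduces the error (the tensors and losses here are hypothetical, not the author's network):

```python
import torch

# Two losses built on one shared graph, with both backward passes called naively.
x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)
out = x @ w                  # shared intermediate result
loss1 = out.sum()
loss2 = (out ** 2).sum()

loss1.backward()             # frees the graph's intermediate buffers
err = None
try:
    loss2.backward()         # the graph is already gone
except RuntimeError as exc:
    err = exc                # "Trying to backward through the graph a second time..."
```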


Correct code:

loss1.backward(retain_graph=True) # keep the graph's intermediate buffers after backward
loss2.backward() # all intermediate buffers are freed here, ready for the next loop
optimizer.step() # update the parameters

With retain_graph=True, the intermediate buffers are kept, so the backward() calls of the two losses do not affect each other.
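Putting the pattern together, a runnable sketch of the single-network, two-loss case (the model and data are hypothetical):

```python
import torch

# Hypothetical single-layer "network" with two losses on one forward pass.
x = torch.randn(4, 3)
w = torch.randn(3, 1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.01)

optimizer.zero_grad()
out = x @ w                        # shared forward pass
loss1 = out.sum()
loss2 = (out ** 2).sum()

loss1.backward(retain_graph=True)  # keep the graph for the second backward
loss2.backward()                   # gradients of both losses accumulate in w.grad
optimizer.step()                   # update the parameters once
```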

Supplement: when the two losses of two networks need to be backwarded separately: loss1.backward(), loss2.backward()

#With two networks, define a separate optimizer for each network
optimizer1 = torch.optim.SGD(net1.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=learning_rate, momentum=momentum, weight_decay=weight_decay)
#Handling the two losses in the training loop
loss1 = loss()
loss2 = loss()

optimizer1.zero_grad() #set the gradients to zero
optimizer2.zero_grad() #set the gradients to zero

loss1.backward(retain_graph=True) #keep the intermediate buffers after backward
loss2.backward() #the graph is freed after this backward

optimizer1.step() #update the parameters of both networks
optimizer2.step()
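A runnable sketch of the two-network case (net1, net2, and the data are hypothetical; here net2 consumes net1's output, so the two losses share part of the graph and the first backward must retain it):

```python
import torch
import torch.nn as nn

# Hypothetical two-network setup: loss2 flows back through net1's forward pass
# too, so loss1.backward() must keep the shared buffers alive.
net1 = nn.Linear(3, 2)
net2 = nn.Linear(2, 1)
x = torch.randn(4, 3)
target1 = torch.randn(4, 2)
target2 = torch.randn(4, 1)

optimizer1 = torch.optim.SGD(net1.parameters(), lr=0.01)
optimizer2 = torch.optim.SGD(net2.parameters(), lr=0.01)

out1 = net1(x)
out2 = net2(out1)
loss1 = nn.functional.mse_loss(out1, target1)
loss2 = nn.functional.mse_loss(out2, target2)

optimizer1.zero_grad()
optimizer2.zero_grad()
loss1.backward(retain_graph=True)  # keep the shared graph alive
loss2.backward()                   # succeeds because the graph was retained
optimizer1.step()
optimizer2.step()
```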

Step-by-step explanation

optimizer.zero_grad()

Initialize the gradients to zero
(because the derivative of a batch's loss with respect to the weights is the sum of the per-sample derivatives of the loss with respect to the weights),
corresponding to d_weights = [0] * n

output = net(inputs)

The predicted values are obtained by forward propagation

loss = Loss(outputs, labels)

Compute the loss


loss.backward()

Back propagation to compute the gradients,
corresponding to d_weights = [d_weights[j] + (label[k] - output) * input[k][j] for j in range(n)]


optimizer.step()

Update all parameters,
corresponding to weights = [weights[k] + alpha * d_weights[k] for k in range(n)]
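The list-comprehension analogies above can be combined into one pure-Python gradient step (the data, weights, and learning rate alpha are made up for illustration):

```python
# Pure-Python sketch of one gradient step for a linear model,
# matching the analogies above (hypothetical data).
n = 2                                # number of weights
weights = [0.5, -0.3]
inputs = [[1.0, 2.0], [3.0, 4.0]]    # one batch of two samples
labels = [1.0, 2.0]
alpha = 0.1                          # learning rate

d_weights = [0.0] * n                # optimizer.zero_grad()
for k in range(len(inputs)):         # loss.backward(): sum gradients over the batch
    output = sum(weights[j] * inputs[k][j] for j in range(n))
    d_weights = [d_weights[j] + (labels[k] - output) * inputs[k][j] for j in range(n)]
weights = [weights[k] + alpha * d_weights[k] for k in range(n)]  # optimizer.step()
```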
