Tag Archives: Torch

[Solved] RuntimeError: "unfolded2d_copy" not implemented for 'Half'

Error report

RuntimeError: "unfolded2d_copy" not implemented for 'Half'

Reason

The model was run with use_half=True, i.e. inference on the CPU was attempted with fp16 (half precision). fp16 is normally used to speed things up, but the PyTorch CPU kernels involved here do not support fp16.

Solution:

  1. Pass use_half=False.
  2. Change half() to float().

Either way, the model then runs in fp32 and the computation goes through.

The modification in my error report boils down to replacing the half() call with float().
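As a rough sketch of what such a change looks like (a toy convolution, not the code from the original script):

import torch
import torch.nn as nn

# Toy example, not the original model: on CPU builds where fp16 convolution is
# unsupported, the half-precision path raises the error above, while fp32 works.
model = nn.Conv2d(3, 8, kernel_size=3)
image = torch.rand(1, 3, 32, 32)

# model, image = model.half(), image.half()   # -> "unfolded2d_copy" not implemented for 'Half'
model, image = model.float(), image.float()   # stay in fp32 on the CPU
output = model(image)
print(output.shape)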


I hope this article is useful to you!

Thank you for your comments!

[Solved] RuntimeError: DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Question

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:76] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Today I was running YOLOv7 on my own computer. Since I don't have a GPU, I used the CPU to run the test model. Predicting a single, independent image works fine, very nice!!! However, when I predicted a video (a sequence of images), it told me the memory allocation was insufficient:

DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.,

Moreover, the error does not appear right after the second image; it shows up around the 17th image, and from then on the memory is never released.

Analysis

In PyTorch, every tensor has a requires_grad attribute which, if set to True, makes the tensor participate in automatic differentiation during backpropagation. requires_grad defaults to False, but if a leaf tensor (one you create yourself) has requires_grad=True, then every tensor computed from it also ends up with requires_grad=True, even if the other tensors it depends on have requires_grad=False.


Note:

requires_grad is an attribute of PyTorch's basic Tensor data structure; it indicates whether the gradient information for this quantity needs to be kept during the computation. Take linear regression as an example: the weights w and the bias b are the objects being trained, and to find suitable values for them we define a loss function and train by backpropagating its gradients.

When requires_grad is set to False, no gradient is computed for the tensor during backpropagation, which saves RAM (or GPU memory).
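A tiny sketch (toy tensors, not from the original script) of how the flag propagates and how torch.no_grad() switches the tracking off:

import torch

a = torch.randn(3, requires_grad=True)   # leaf tensor that tracks gradients
b = torch.randn(3)                       # leaf tensor, requires_grad defaults to False
c = a * b
print(c.requires_grad)                   # True: c depends on a, so it is tracked

with torch.no_grad():
    d = a * b
print(d.requires_grad)                   # False: nothing is recorded inside no_grad()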

The solution then follows directly: just stop the model from recording gradients during testing, since they are never actually used there.

 

Solution:

Wrap the forward pass in with torch.no_grad() so that the model does not keep gradients during the test:

with torch.no_grad():
    output, _ = model(image)  # wrap the forward pass of every image

This way the model no longer tracks operations or stores gradients while processing each image, and the memory stops growing.
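For a whole video, a minimal sketch of the prediction loop might look like this (model and frames are placeholders for the YOLOv7 network and the decoded video frames of the original script):

import torch

results = []
model.eval()               # model: the loaded YOLOv7 network (placeholder)
with torch.no_grad():      # no autograd graph is built, so memory stays flat
    for image in frames:   # frames: the decoded video frames (placeholder)
        output, _ = model(image)
        results.append(output)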

Perfect solution!

[ncclUnhandledCudaError] unhandled cuda error, NCCL version xx.x.x

Problem description

Problems encountered during distributed training

RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:47, unhandled cuda error, NCCL version 21.0.3
ncclUnhandledCudaError: Call to CUDA function failed.


Problem-solving

From the error message, the failure occurs while distributed training is being initialized, not during training itself, so the problem lies in the initialization of the distributed setup.

Enter the following command to list the cards on the current server:

nvidia-smi -L

The listing shows that GPU 1 is a 3070, while the other cards are all 2080 Ti:

GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 1: NVIDIA GeForce RTX 3070 (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 4: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 5: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 6: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 7: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)

Therefore, I directly tried using cards 2-7 (all 2080 Ti) for training, which avoids the mismatched 3070.
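One way to do that, sketched here as an assumption about how the training script is launched, is to hide the unwanted card with CUDA_VISIBLE_DEVICES before CUDA is initialized:

import os

# Must be set before torch initializes CUDA; only the identical 2080 Ti cards
# (physical indices 2-7) are then visible to the process and to NCCL.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3,4,5,6,7"

import torch
print(torch.cuda.device_count())  # 6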

Correct solution!

Pytorch Loading model error: RuntimeError: Error(s) in loading state_dict for Model: Missing key(s) in state_dict

A model is saved as a state_dict of key-value pairs. When loading, each key of the current network is looked up in the saved dictionary and the corresponding value is copied in. Errors are usually reported because the keys of the checkpoint and of the network do not match.

1. The most common problem: the keys have an extra or missing 'module.' prefix

In this case, a checkpoint saved after DataParallel or DDP training has keys prefixed with 'module.', while the keys of the plain network do not.

1) You can:

model = nn.DataParallel(model)

Wrapping the network in DataParallel adds the 'module.' prefix to its keys, so they match the checkpoint.
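A quick sketch (toy model, runs on CPU) showing that the wrapper really renames the keys:

import torch.nn as nn

net = nn.Linear(4, 2)
print(list(net.state_dict().keys()))   # ['weight', 'bias']

net = nn.DataParallel(net)
print(list(net.state_dict().keys()))   # ['module.weight', 'module.bias']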

2) You can also fix the keys yourself by traversing the checkpoint's key-value pairs.

For example, delete the redundant 'module.' prefix when loading the model. The code is as follows:

import torch

state_dict = torch.load(load_path)        # load_path: path to the saved checkpoint
new_state_dict = {}
for key, param in state_dict.items():
    if key.startswith('module.'):
        key = key[7:]                     # strip the 'module.' prefix
    new_state_dict[key] = param
net.load_state_dict(new_state_dict)

2. The strict=False argument of load_state_dict(state_dict, strict=False) in detail

Many tutorials say that if the names do not match you can simply pass strict=False, but there is a big pitfall to be aware of here.

If none of the checkpoint's keys match the network's keys, the network silently loads no pretrained parameters at all, even though no error is reported.

strict=False enables non-strict matching when loading a model, which plays out differently in the following cases.

1) The checkpoint contains the network's parameters (plus extras)

For example, the checkpoint comes from resnet101 and your current network is resnet50. Assuming every parameter name of resnet50 also appears among resnet101's parameters, passing strict=False loads all the parameters whose keys match into your resnet50 and ignores the rest, which saves you from looping over every key of resnet101 yourself to check whether resnet50 needs it.

2) The checkpoint contains none of the network's parameters

Take the 'module.' situation from section 1: the checkpoint has 100 parameters, all of whose keys contain 'module.', and the network also has 100 parameters, none of whose keys do. With strict=False, not a single key matches, so the network loads no parameters at all.

3) Another scenario where strict=False is useful

For example, in the distillation network PISR the teacher network contains an encoder and a decoder, while the student network consists of the decoder only. When training the student network and loading the pretrained checkpoint saved from the teacher, strict=False automatically recognizes that the decoder keys are identical and loads just those.

To sum up, with strict=False the parameters are still loaded by key: however many keys match, that many parameters get loaded.
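To check what was actually loaded instead of trusting strict=False blindly, note that load_state_dict returns the keys that did not match (load_path and net are placeholders as above):

import torch

state_dict = torch.load(load_path)                      # load_path: path to the checkpoint (placeholder)
result = net.load_state_dict(state_dict, strict=False)
print(result.missing_keys)      # network parameters the checkpoint did not provide
print(result.unexpected_keys)   # checkpoint entries that matched nothing in the network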

3. As long as the parameter shapes match, values can be loaded into any key

For example, I have a checkpoint from a 10-layer network and a current 3-layer network, and I want to load the parameters of layer 9 into layer 1 of the current network. As long as the shapes are the same, you can traverse the key-value pairs and assign each parameter to the key you want:

state_dict = torch.load(load_path)
new_state_dict = {}                                          # must be a dict, not a list
for key, param in state_dict.items():
    if 'conv9' in key:
        new_state_dict[key.replace('conv9', 'conv1')] = param   # reuse layer 9 weights as layer 1
# strict=False because new_state_dict only covers part of the current network
net.load_state_dict(new_state_dict, strict=False)

CUBLAS_STATUS_ALLOC_FAILED

 

CUBLAS_STATUS_ALLOC_FAILED

 

Solution:

Remove the hard-coded CUDA device binding. The tensor was created pinned to a specific card:

torch.rand(1, 3, 10, 10).cuda(7)

This is OK in training, but not in prediction. With the binding removed, the tensors are created plainly, as in the deform_conv2d example below:
>>> import torch
>>> from torchvision.ops import deform_conv2d
>>> input = torch.rand(1, 3, 10, 10)
>>> kh, kw = 3, 3
>>> weight = torch.rand(5, 3, kh, kw)
>>> # offset should have the same batch size as the input and the same spatial
>>> # size as the output of the convolution. In this case, for an input of 10,
>>> # stride of 1 and kernel size of 3, without padding, the output size is 8
>>> offset = torch.rand(1, 2 * kh * kw, 8, 8)
>>> out = deform_conv2d(input, offset, weight)
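If the tensors do need to live on a GPU, a hedged alternative is to pick the device at run time instead of hard-coding an index:

import torch

# Use whichever device is actually available, rather than pinning to card 7.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.rand(1, 3, 10, 10).to(device)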

 

Usage of PyTorch dropout

link: https://www.zhihu.com/question/67209417/answer/302434279
I just stepped into this pit and almost cried: I clearly added dropout, so why did the results not change at all?
When using F.dropout (nn.functional.dropout), you must keep its training argument consistent with the train/eval state of the model as a whole.
For example:

 
import torch.nn as nn
import torch.nn.functional as F

class DropoutFC(nn.Module):
    def __init__(self):
        super(DropoutFC, self).__init__()
        self.fc = nn.Linear(100, 20)

    def forward(self, input):
        out = self.fc(input)
        out = F.dropout(out, p=0.5)   # training flag not passed in
        return out

Net = DropoutFC()
Net.train()
# train the Net

The F.dropout in this code is actually useless, because its training argument stays at the functional default of False (in the PyTorch version the answer refers to). F.dropout is just an external function being called, so changing the training state of the whole model does not change the training argument seen by F.dropout; here out = F.dropout(out) is effectively out = out. Ref: https://github.com/pytorch/pytorch/blob/master/torch/nn/functional.py#L535
 
The correct way is to pass the module's own training flag into the dropout call:

 
class DropoutFC(nn.Module):
    def __init__(self):
        super(DropoutFC, self).__init__()
        self.fc = nn.Linear(100, 20)

    def forward(self, input):
        out = self.fc(input)
        out = F.dropout(out, p=0.5, training=self.training)   # follow the module's state
        return out

Net = DropoutFC()
Net.train()
# train the Net

 
Or simply use nn.Dropout() (nn.Dropout() is in fact a wrapper around F.dropout that passes self.training in for you). Ref: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/dropout.py#L46

 
class DropoutFC(nn.Module):
    def __init__(self):
        super(DropoutFC, self).__init__()
        self.fc = nn.Linear(100, 20)
        self.dropout = nn.Dropout(p=0.5)

    def forward(self, input):
        out = self.fc(input)
        out = self.dropout(out)   # nn.Dropout respects self.training automatically
        return out

Net = DropoutFC()
Net.train()
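A quick check (my own toy example, not from the answer) that nn.Dropout really follows the module's train/eval state:

import torch
import torch.nn as nn

drop = nn.Dropout(p=0.5)
x = torch.ones(8)

drop.train()
print(drop(x))   # roughly half the entries zeroed, the survivors scaled to 2.0

drop.eval()
print(drop(x))   # identity: all ones, dropout is inactive in eval mode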