Tag Archives: pytorch

urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

When training a model, you often load a pretrained model such as VGG. The code is as follows:

model = torchvision.models.vgg19(pretrained=True)

During training, the following is displayed:

Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/checkpoints/vgg19-dcbb9e9d.pth

Then an error occurred:

socket.gaierror: [Errno -3] Temporary failure in name resolution
and
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

This happens because the machine cannot resolve the download host, so the pretrained weights cannot be fetched automatically.
It is therefore easier to download the model first: on a machine with Internet access, open the link https://download.pytorch.org/models/vgg19-dcbb9e9d.pth to start the download.
Then put the downloaded .pth model file under a fixed path, such as

/home/team/torch/models/pre_model/vgg19-dcbb9e9d.pth

Finally, change the code to

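import torch
import torchvision

# Build the network without downloading weights, then load them from the local file
model = torchvision.models.vgg19(pretrained=False)
pthfile = r'/home/team/torch/models/pre_model/vgg19-dcbb9e9d.pth'
model.load_state_dict(torch.load(pthfile))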
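
An alternative that leaves pretrained=True unchanged is to copy the downloaded file to the cache location printed in the "Downloading: ... to ..." message, because the loader looks there for a file with the same name as in the URL before it tries to download. A minimal sketch using only the standard library; the function names expected_cache_path and install_checkpoint are made up for illustration, and the cache directory is taken from the log message above (it can differ between PyTorch versions):

```python
import os
import shutil
from urllib.parse import urlparse


def expected_cache_path(url, cache_dir):
    """Return the path where the checkpoint named in `url` would be cached."""
    filename = os.path.basename(urlparse(url).path)
    return os.path.join(cache_dir, filename)


def install_checkpoint(local_file, url, cache_dir):
    """Copy a manually downloaded checkpoint into the cache directory."""
    target = expected_cache_path(url, cache_dir)
    os.makedirs(cache_dir, exist_ok=True)
    shutil.copyfile(local_file, target)
    return target


url = "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth"
# Cache directory as printed in the "Downloading ... to ..." message above.
print(expected_cache_path(url, "/root/.cache/torch/checkpoints"))
```

Calling install_checkpoint with the path of the manually downloaded file then places it where the loader expects it.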

[Solved] RuntimeError during DCGAN training: Found dtype Long but expected Float

When training a DCGAN network, the following error occurs:

RuntimeError: Found dtype Long but expected Float

The code snippet for this error is as follows:

label = torch.full((b_size,), real_label, device=device)
# Pass the batch of real samples through the discriminator and store the result in output
output = netD(real_cpu).view(-1)

# Calculate the loss
errD_real = criterion(output, label)

The reason is that the data types of the output and the label passed to the loss function do not match what it requires: the loss function requires float data, but long data was passed in. The long tensor here is most likely label, since torch.full returns a long tensor in recent PyTorch versions when real_label is an integer such as 1.
We therefore need to convert the incoming data to float.
The modified code is as follows:

label = torch.full((b_size,), real_label, device=device)
# Pass the batch of real samples through the discriminator and store the result in output
output = netD(real_cpu).view(-1)
# Convert the incoming data to float type
output = output.to(torch.float32)
label = label.to(torch.float32)
# Calculate the loss
errD_real = criterion(output, label)

Problem solved!
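
The same mismatch can also be avoided at the source by creating the label tensor as float. A minimal sketch, assuming the loss is torch.nn.BCELoss as in the usual DCGAN setup (the snippet above does not show how criterion is defined); the batch size and the random output are stand-ins for illustration:

```python
import torch

b_size = 4
real_label = 1  # an integer fill value

# Without dtype, recent PyTorch versions infer int64 (Long) from an integer fill value.
label_long = torch.full((b_size,), real_label)

# Requesting float explicitly gives the loss function the dtype it expects.
label = torch.full((b_size,), real_label, dtype=torch.float)

criterion = torch.nn.BCELoss()  # assumption: binary cross entropy loss
output = torch.sigmoid(torch.randn(b_size))  # stand-in for netD(real_cpu).view(-1)
errD_real = criterion(output, label)

print(label_long.dtype, label.dtype)
```

With the dtype set when the label is created, no later conversion of output or label is needed.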

Error when downloading a built-in PyTorch dataset: urllib.error.URLError: urlopen error [SSL: CERTIFICATE_VERIFY_FAILED]

Error reason:

This is an SSL certificate validation error. It is raised when an HTTPS site is requested and its certificate fails validation.

Solution:

Add the following two lines to the code to skip the certificate check so that the download can proceed. The setting applies to every HTTPS request made through urllib in the same process.

# Global removal of certificate validation
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
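
Because the assignment above changes the default for the whole process, it can be worth limiting it to the download itself. A minimal sketch using only the standard library; the context-manager name unverified_https is made up for illustration, and it relies on the same private ssl attributes as the two lines above:

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https():
    """Temporarily disable HTTPS certificate verification for urllib."""
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore the previous behaviour even if the download raises.
        ssl._create_default_https_context = original


with unverified_https():
    # Place the dataset download here, for example the dataset
    # constructor called with download=True.
    pass
```

Once the with block ends, later HTTPS requests are verified again as usual.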

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Preface

Today I trained a model with YOLOv5 (the yolov5-6.0 code base that appears in the traceback below) and changed the batch size to 32. The following error occurred:

Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
  0%|                                                                                                                                                                         | 0/483 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 620, in <module>
    main(opt)
  File "train.py", line 517, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 315, in train
    pred = model(imgs)  # forward
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\yolo.py", line 126, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "D:\liufq\yolov5-6.0\models\yolo.py", line 149, in _forward_once
    x = m(x)  # run
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\common.py", line 137, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Solution

Reduce the batch size, for example by passing a smaller --batch-size value to train.py.
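
If the largest workable batch size is not known in advance, it can be searched for by halving after each failure. A minimal sketch in plain Python; run_training and fake_training are hypothetical stand-ins for whatever launches training with a given batch size, and the assumption is that the failure surfaces as a RuntimeError, as in the traceback above:

```python
def find_working_batch_size(run_training, batch_size, min_batch_size=1):
    """Halve batch_size until run_training(batch_size) stops raising RuntimeError."""
    while batch_size >= min_batch_size:
        try:
            run_training(batch_size)
            return batch_size
        except RuntimeError as err:
            print(f"batch size {batch_size} failed: {err}")
            batch_size //= 2
    raise RuntimeError("no batch size worked")


def fake_training(batch_size):
    # Hypothetical stand-in: pretend that any batch size above 8 fails.
    if batch_size > 8:
        raise RuntimeError("Unable to find a valid cuDNN algorithm to run convolution")


print(find_working_batch_size(fake_training, 32))
```

In this made-up setup the search tries 32 and 16, which fail, and then settles on 8.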

[MMCV]RuntimeError: CUDA error: no kernel image is available for execution on the device

There are two possible reasons for this problem.
First, the GPU's compute capability and the PyTorch version do not match.
Second, the server uses a combination of graphics cards with different compute capabilities.
On the first point, PyTorch releases after 1.3.0 no longer support graphics cards with compute capability below 3.7. You can install an older version of PyTorch instead. The corresponding versions can be found at the following link:
torch, torchvision historical version download
The compute capabilities of common graphics cards are as follows:

GPU	Compute Capability
NVIDIA TITAN RTX	7.5
GeForce RTX 2080 Ti	7.5
GeForce RTX 2080	7.5
GeForce RTX 2070	7.5
GeForce RTX 2060	7.5
NVIDIA TITAN V	7.0
NVIDIA TITAN Xp	6.1
NVIDIA TITAN X	6.1
GeForce GTX 1080 Ti	6.1
GeForce GTX 1080	6.1
GeForce GTX 1070	6.1
GeForce GTX 1060	6.1
GeForce GTX 1050	6.1
GeForce GTX TITAN X	5.2
GeForce GTX TITAN Z	3.5
GeForce GTX TITAN Black	3.5
GeForce GTX TITAN	3.5
GeForce GTX 980 Ti	5.2
GeForce GTX 980	5.2
GeForce GTX 970	5.2
GeForce GTX 960	5.2
GeForce GTX 950	5.2
GeForce GTX 780 Ti	3.5
GeForce GTX 780	3.5
GeForce GTX 770	3.0
GeForce GTX 760	3.0
GeForce GTX 750 Ti	5.0
GeForce GTX 750	5.0
GeForce GTX 690	3.0
GeForce GTX 680	3.0
GeForce GTX 670	3.0
GeForce GTX 660 Ti	3.0
GeForce GTX 660	3.0
GeForce GTX 650 Ti BOOST	3.0
GeForce GTX 650 Ti	3.0
GeForce GTX 650	3.0
GeForce GTX 560 Ti	2.1
GeForce GTX 550 Ti	2.1
GeForce GTX 460	2.1
GeForce GTS 450	2.1
GeForce GTS 450*	2.1
GeForce GTX 590	2.0
GeForce GTX 580	2.0
GeForce GTX 570	2.0
GeForce GTX 480	2.0
GeForce GTX 470	2.0
GeForce GTX 465	2.0
GeForce GT 740	3.0
GeForce GT 730	3.5
GeForce GT 730 DDR3,128bit	2.1
GeForce GT 720	3.5
GeForce GT 705*	3.5
GeForce GT 640 (GDDR5)	3.5
GeForce GT 640 (GDDR3)	2.1
GeForce GT 630	2.1
GeForce GT 620	2.1
GeForce GT 610	2.1
GeForce GT 520	2.1
GeForce GT 440	2.1
GeForce GT 440*	2.1
GeForce GT 430	2.1
GeForce GT 430*	2.1
GPU	Compute Capability
Tesla K80	3.7
Tesla K40	3.5
Tesla K20	3.5
Tesla C2075	2.0
Tesla C2050/C2070	2.0

On the second point, if the error occurs in the mmcv framework, recompile mmcv for the compute capabilities of your graphics cards. Taking two cards with compute capabilities 6.1 and 7.5 as an example, the command is as follows:

TORCH_CUDA_ARCH_LIST="6.1;7.5" pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cuda_version}/{pytorch_version}/index.html

Here, replace {cuda_version} and {pytorch_version} with your own versions, such as cu101 and torch1.7.0.
For the exact version combinations, refer to the mmcv GitHub repository.
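
The value of TORCH_CUDA_ARCH_LIST can be derived from the compute capabilities of the installed cards. A minimal sketch; the helper name build_arch_list is made up for illustration, and the capabilities are passed in as (major, minor) tuples so that the snippet runs without a GPU:

```python
def build_arch_list(capabilities):
    """Turn (major, minor) compute capabilities into a TORCH_CUDA_ARCH_LIST value."""
    unique = sorted(set(capabilities))
    return ";".join(f"{major}.{minor}" for major, minor in unique)


# Example: cards with compute capability 6.1 and 7.5 (one value repeated).
# On a machine with PyTorch and CUDA, each tuple could instead come from
# torch.cuda.get_device_capability(i) for device index i.
print(build_arch_list([(7, 5), (6, 1), (7, 5)]))  # prints 6.1;7.5
```

Duplicates are dropped, so a server with several identical cards still yields one entry per architecture.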

RuntimeError: Found dtype Double but expected Float

I got this error when computing the loss function.

Solution:

target.float()

import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [4, 4]])

# reduction='mean' replaces the deprecated reduce=True, size_average=True
loss_fn = torch.nn.MSELoss(reduction='mean')

input = torch.from_numpy(a)
target = torch.from_numpy(b)

# Convert both tensors to float before computing the loss
loss = loss_fn(input.float(), target.float())

print(loss)

Pytorch torch.cuda.FloatTensor Error: RuntimeError: one of the variables needed for gradient computation has…

pytorch 1.9 Error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 512, 16, 16]], which is output 0 of ConstantPadNdBackward, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True). #23

At first I thought the cause was the input z = torch.randn(batch_size, 128,1,1).to(device).

Solution: install an older PyTorch version:
pip install torch==1.4.0 torchvision==0.5.0

Solve the RuntimeError in RNN: expected scalar type Long but found Float

Project scenario:

Today I saw the code of an RNN example and wanted to feed my own data into the RNN, but it reported an error.

Problem Description:

The error is RuntimeError: expected scalar type Long but found Float

Cause analysis:

The wrong input is as follows:

input=torch.tensor([ 0,  0,  0,  0,  0,  0,  0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
         0,  0,  0,  0,  0,  0])

The input in the example is as follows:

input=torch.tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0.]])

The reason is that I set dtype=torch.long when generating the input.

Solution:

input = torch.tensor(input, dtype=torch.float)
Specifying dtype=torch.float when generating the input tensor gives an input of the following form:

input=torch.tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,0.]])

This meets the requirements, so no error is reported.
I did not expect this problem to take most of the night to solve. I am clearly still a beginner and need more practice.

Python learning notes (5): cross entropy error RuntimeError: 1D target tensor expected, multi-target not supported

When I use cross entropy as the loss function, an error occurs:

RuntimeError: 1D target tensor expected, multi-target not supported

I looked up the error, and the explanations I found basically say:

But that did not solve my problem, because my label data had already been converted with the following code:

torch.LongTensor(labels)

And I also printed the dimension of my label data:

torch.Size([16, 11])

Here 16 is the batch_size, so at first I did not think it was a dimension problem.

But I was inspired when I read this blog post (RuntimeError: multi-target not supported at) [1]. It says:

When calculating the cross entropy loss in PyTorch, the label input must not be in one-hot format; the function handles the one-hot conversion itself. Therefore, you do not need to enter [0 1], just enter 4.

My label data comes from a multi-label problem, as follows:

tensor([0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0])

In the PyTorch version used here, CrossEntropyLoss expects exactly one class index per sample, that is, a target of shape (batch_size,). Each of the 11 entries above is instead its own 0/1 label, which in one-hot form would be:

tensor([[1., 0.],
        [0., 1.],
        [1., 0.],
        [1., 0.],
        [0., 1.],
        [1., 0.],
        [1., 0.],
        [0., 1.],
        [0., 1.],
        [1., 0.],
        [1., 0.]])

A target of shape (16, 11) therefore has one dimension more than CrossEntropyLoss accepts, and this leads to the error.

Therefore, the solution is to use a loss function designed for multi-label problems, for example MultiLabelSoftMarginLoss, or the plain MSELoss.
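
A minimal sketch of the multi-label setup, assuming a batch of 16 samples with 11 independent 0/1 labels each, as in the shapes above; the logits are random stand-ins for the model output, and the targets are stored as float rather than with torch.LongTensor:

```python
import torch

batch_size, num_labels = 16, 11

logits = torch.randn(batch_size, num_labels)  # stand-in for the model output
# Multi-label targets: one 0/1 flag per label, stored as float.
targets = torch.randint(0, 2, (batch_size, num_labels)).float()

loss_fn = torch.nn.MultiLabelSoftMarginLoss()
loss = loss_fn(logits, targets)  # target has the same shape as the input

print(loss.shape, loss.item())
```

Here the target keeps its (16, 11) shape, and the loss is reduced to a single scalar.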

References

[1] Wang's technical road. RuntimeError: multi-target not supported at [EB/OL]. (2019-12-10) [2021-10-27]. https://www.cnblogs.com/blogwangwang/p/12018897.html
[2] Python free. Solution of "1D target tensor expected, multi-target not supported" in the cross entropy loss function [EB/OL]. (2020-07-04) [2021-10-27]. https://www.pythonf.cn/read/125399

SSLCertVerificationError when downloading using Python: [SSL: CERTIFICATE_VERIFY_FAILED] error

When downloading a dataset from the PyTorch official website, the error SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] is reported. The following is my solution.

Error reason: when urllib opens an HTTPS link, it verifies the SSL certificate. When the target website uses a self-signed certificate, a URLError is raised.

Solution: cancel certificate validation globally
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Reference article link: https://blog.csdn.net/yixieling4397/article/details/79861379

pycuda._driver.Error:cuInit failed:unknown error

For example, in a PyTorch project, the following error is raised from autoinit.py:

pycuda._driver.Error: cuInit failed: unknown error

Solution: install the nvidia-modprobe package:

sudo apt-get install nvidia-modprobe

Normalize error: TypeError: Input tensor should be a float tensor…

The error in the title is reported by the following tensor normalization code:

from torchvision import transforms
import numpy as np
import torchvision
import torch

data = np.random.randint(0, 255, size=12)
img = data.reshape(2,2,3)


print(img)
print("*"*100)
transform1 = transforms.Compose([
    transforms.ToTensor(), # range [0, 255] -> [0.0,1.0]
    transforms.Normalize(mean = (10,10,10), std = (1,1,1)),
    ]
)
# img = img.astype('float')
norm_img = transform1(img) 
print(norm_img)

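To fix it, uncomment the line img = img.astype('float'). It simply sets the element type: ToTensor does not convert an integer array to float unless it is uint8, so without that line Normalize receives an integer tensor and raises the error.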
