Tag Archives: pytorch

An error is reported when pipreqs generates a project's requirements file

When you need to generate the list of packages and corresponding versions required by a project, first cd into the project directory and then run:

pipreqs ./

The error reported is as follows:

Traceback (most recent call last):
  File "f:\users\asus\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "f:\users\asus\anaconda3\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "F:\Users\asus\Anaconda3\Scripts\pipreqs.exe\__main__.py", line 7, in <module>
  File "f:\users\asus\anaconda3\lib\site-packages\pipreqs\pipreqs.py", line 470, in main
    init(args)
  File "f:\users\asus\anaconda3\lib\site-packages\pipreqs\pipreqs.py", line 409, in init
    follow_links=follow_links)
  File "f:\users\asus\anaconda3\lib\site-packages\pipreqs\pipreqs.py", line 122, in get_all_imports
    contents = f.read()
  File "f:\users\asus\anaconda3\lib\codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xbb in position 570: invalid start byte

Find line 122 of pipreqs.py and change the encoding to ISO-8859-1:

with open_func(file_name, "r", encoding='ISO-8859-1') as f:
	contents = f.read()

I tried several other encodings, such as GBK and GB18030, but the error persisted until ISO-8859-1 was used. The underlying cause is that the error handling of the decode call is too strict; setting errors to 'ignore' should also work, but I have not yet found where that decode call can be changed, so I will update this when I do.
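
As a minimal sketch of the two workarounds (the file name below is hypothetical): ISO-8859-1 maps every byte to a character, so decoding can never fail, while errors='ignore' keeps UTF-8 and silently drops the undecodable bytes. Depending on the pipreqs version, passing an encoding on the command line (for example pipreqs ./ --encoding=ISO-8859-1) may also avoid editing the source.

path = "some_script.py"  # hypothetical file containing non-UTF-8 bytes

with open(path, "r", encoding="ISO-8859-1") as f:
    contents = f.read()  # never raises UnicodeDecodeError

with open(path, "r", encoding="utf-8", errors="ignore") as f:
    contents = f.read()  # keeps UTF-8 but silently drops the offending bytes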

[Solved] Pytorch Download CIFAR10 Dataset Error: urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certi

urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certi

Solution:

Add the following two lines of code at the beginning of your script:

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Complete example:

import torch
import torchvision
import torchvision.transforms as transforms
import ssl
ssl._create_default_https_context = ssl._create_unverified_context
#Download the dataset and transform the images: the torchvision dataset outputs PILImage images with values in [0,1]
#We convert them into normalized tensors with values in [-1,1]
#transform is the data converter
transform=transforms.Compose([transforms.ToTensor(),transforms.Normalize((0.5,0.5,0.5),(0.5,0.5,0.5))])
trainset=torchvision.datasets.CIFAR10(root='./data',train=True,download=True,transform=transform)
# The downloaded data is placed in the trainset
trainloader=torch.utils.data.DataLoader(trainset,batch_size=4,shuffle=True,num_workers=2)
# DataLoader wraps the dataset in an iterable of batches
# num_workers=2: two worker processes load the data
# batch_size=4: four images per batch

testset=torchvision.datasets.CIFAR10(root='./data',train=False,download=True,transform=transform)
testloader=torch.utils.data.DataLoader(testset,batch_size=4,shuffle=False,num_workers=2)
classes=('airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Download result

[Solved] RuntimeError: function ALSQPlusBackward returned a gradient different than None at position 3, but t

class ALSQPlus(Function):
    @staticmethod
    def forward(ctx, weight, alpha, g, Qn, Qp, per_channel, beta):
        # assert alpha > 0, "alpha={}".format(alpha)
        ctx.save_for_backward(weight, alpha, beta)
        ctx.other = g, Qn, Qp, per_channel
        if per_channel:
            sizes = weight.size()
            weight = weight.contiguous().view(weight.size()[0], -1)
            weight = torch.transpose(weight, 0, 1)
            alpha = torch.broadcast_to(alpha, weight.size())
            beta = torch.broadcast_to(beta, weight.size())
            w_q = Round.apply(torch.div((weight - beta), alpha)).clamp(Qn, Qp)
            w_q = w_q * alpha + beta
            w_q = torch.transpose(w_q, 0, 1)
            w_q = w_q.contiguous().view(sizes)
        else:
            w_q = Round.apply(torch.div((weight - beta), alpha)).clamp(Qn, Qp)
            w_q = w_q * alpha + beta
        return w_q

    @staticmethod
    def backward(ctx, grad_weight):
        weight, alpha, beta = ctx.saved_tensors
        g, Qn, Qp, per_channel = ctx.other
        if per_channel:
            sizes = weight.size()
            weight = weight.contiguous().view(weight.size()[0], -1)
            weight = torch.transpose(weight, 0, 1)
            alpha = torch.broadcast_to(alpha, weight.size())
            q_w = (weight - beta)/alpha
            q_w = torch.transpose(q_w, 0, 1)
            q_w = q_w.contiguous().view(sizes)
        else:
            q_w = (weight - beta)/alpha
        smaller = (q_w < Qn).float()  # bool values to floating point values, 1.0 or 0.0
        bigger = (q_w > Qp).float()   # bool values to floating point values, 1.0 or 0.0
        between = 1.0 - smaller - bigger  # mask for the values inside the quantization interval
        if per_channel:
            grad_alpha = ((smaller * Qn + bigger * Qp + 
                between * Round.apply(q_w) - between * q_w)*grad_weight * g)
            grad_alpha = grad_alpha.contiguous().view(grad_alpha.size()[0], -1).sum(dim=1)
            grad_beta = ((smaller + bigger) * grad_weight * g).sum().unsqueeze(dim=0)
            grad_beta = grad_beta.contiguous().view(grad_beta.size()[0], -1).sum(dim=1)
        else:
            grad_alpha = ((smaller * Qn + bigger * Qp + 
                between * Round.apply(q_w) - between * q_w)*grad_weight * g).sum().unsqueeze(dim=0)
            grad_beta = ((smaller + bigger) * grad_weight * g).sum().unsqueeze(dim=0)
        grad_weight = between * grad_weight
        #The returned gradient should correspond to the forward parameter
        return grad_weight, grad_alpha, grad_beta, None, None, None, None

RuntimeError: function ALSQPlusBackward returned a gradient different than None at position 3, but the corresponding forward input was not a Variable

The gradients returned by a Function's backward must correspond, one to one and in the same order, to the parameters of forward.

Modify the last line to return grad_weight, grad_alpha, None, None, None, None, grad_beta, so that each return value lines up with its forward parameter (weight, alpha, g, Qn, Qp, per_channel, beta). A minimal sketch of the rule follows.
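
The sketch below is a toy custom Function (not the ALSQPlus code) illustrating the rule: backward must return one value per forward argument (excluding ctx), in the same order, with None in the slots of arguments that need no gradient.

import torch
from torch.autograd import Function

class Scale(Function):
    @staticmethod
    def forward(ctx, x, alpha, flag):      # forward arguments: x, alpha, flag
        ctx.save_for_backward(x, alpha)
        return x * alpha

    @staticmethod
    def backward(ctx, grad_out):
        x, alpha = ctx.saved_tensors
        grad_x = grad_out * alpha          # gradient w.r.t. x (1st forward argument)
        grad_alpha = (grad_out * x).sum()  # gradient w.r.t. alpha (2nd forward argument)
        # one return value per forward argument, in the same order;
        # flag is a plain Python value, so its slot is None
        return grad_x, grad_alpha, None

x = torch.randn(3, requires_grad=True)
alpha = torch.tensor(2.0, requires_grad=True)
Scale.apply(x, alpha, True).sum().backward()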

Error: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]

Error: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
The original model expects an RGB three-channel image, but the input is a single-channel grayscale image.

# Error: output with shape [1, 224, 224] doesn't match the broadcast shape [3, 224, 224]
# The original model expects an RGB three-channel image, and the one I fed it is a single-channel grayscale image.
# #-------------------------------------------------------------------------------------
# from torch.utils.data import DataLoader
# dataloader = DataLoader(dataset, shuffle=True, batch_size=16)
# from torchvision.utils import make_grid, save_image
# dataiter = iter(dataloader)
# img = make_grid(next(dataiter)[0], 4) # assemble the batch into a grid image (4 per row), hoping to get 3 channels
# to_img(img)
# #-------------------------------------------------------------------------------------
# It seems make_grid does not convert the images to 3 channels here

The solution is as follows:

from torch import nn
from torchvision import datasets
from torchvision import transforms as T
from torch.utils.data import DataLoader
from torchvision.utils import make_grid, save_image
import numpy as np
import matplotlib.pyplot as plt

transform = T.Compose([
    T.ToTensor(),                   # converts a numpy array with values in [0, 255] into a float tensor with values in [0, 1]
    T.Normalize((0.5, ), (0.5, )),  # Normalize() takes the mean and the standard deviation of each channel of the tensor image
])
dataset = datasets.MNIST('data/', download=True, train=False, transform=transform)
dataloader = DataLoader(dataset, shuffle=True, batch_size=100)

print(type(dataset[0][0]),dataset[0][0].size())
# print(dataset[0][0])
# To draw a tensor image, we must convert it back to a numpy array.
# We do this in the function im_convert(), which takes the tensor image as its parameter.
def im_convert(tensor):
    image = tensor.clone().detach().numpy()
    # A tensor obtained with torch.clone() no longer shares memory with the original data but stays in the computation graph;
    # clone() supports gradient flow without sharing storage, so it is commonly used when a unit of a network needs to be reused.
    # Normally, if the original tensor has requires_grad=True, then:
    # after clone() the tensor still has requires_grad=True,
    # after detach() the tensor has requires_grad=False.
    image = image.transpose(1, 2, 0)
    # The tensor has three dimensions: the first is the color channel, the second and third are the width and height in pixels.
    # Each MNIST image is grayscale with a single color channel and 28*28 pixels, so its shape is (1, 28, 28).
    # To draw the image we need the shape (28, 28, 1), hence the swap of axes 0, 1 and 2.
    print(image.shape)
    image = image * np.array((0.5, 0.5, 0.5)) + np.array((0.5, 0.5, 0.5))
    print(image.shape)
    # Undo the normalization: normalization subtracted the mean and divided by the standard deviation,
    # so here we multiply by the standard deviation and add the mean back.
    # (Broadcasting against the length-3 arrays also expands the single channel to three identical channels.)
    image = image.clip(0, 1)
    print(image.shape, type(image))
    # clip(0, 1) keeps the pixel values in the range between 0 and 1 before the image is returned.
    return image

# iter() creates an iterator object that lets us step through the dataloader one batch at a time.
# We access one batch at a time by calling next() on dataiter.
# next() fetches our first batch of training data, which is split into images and labels below.
dataiter = iter(dataloader)
images, labels = next(dataiter)

fig=plt.figure(figsize=(25, 6))
#fig=plt.figure(figsize=(25, 4)) # smaller figure height than above
for idx in np.arange(20):
    ax=fig.add_subplot(2, 10, idx+1)
    plt.imshow(im_convert(images[idx]))
    ax.set_title([labels[idx].item()])
plt.show()

The final results are as follows:

RuntimeError: Non RGB images are not supported [How to Fix]

Another version problem… I seem to run into all of them. The version of
torchvision is too low: torchvision.io.read_image does not support grayscale images and can only read three-channel color images…

segmentation = mask_to_rle(torchvision.io.read_image(os.path.join(self.root, m['mask']))[0] == 255)

current version

torchvision==0.8.2

Upgraded version

torchvision==0.9.0

Problem solved! Of course, it is also possible to solve this without upgrading torchvision: first convert the grayscale image into a three-channel image yourself and then try again, as in the sketch below
(this is also a reminder to keep the torch and torchvision versions matched).
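
For reference, a minimal sketch of that workaround (the file name and the 255 threshold are placeholders based on the snippet above): read the grayscale mask with PIL, convert it to RGB so all three channels are identical, and build the tensor from that.

import numpy as np
import torch
from PIL import Image

img = Image.open("mask.png").convert("RGB")                 # grayscale -> three identical channels
tensor = torch.from_numpy(np.array(img)).permute(2, 0, 1)   # [3, H, W], same layout as read_image
mask = tensor[0] == 255                                     # the original code only uses channel 0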

[pl.LightningModule] spaCy & pytorch-lightning Error

Inside a pl.LightningModule, spaCy cannot be used for tokenization, or an error will be reported.

1. Using it in the forward pass

...
File "spacy/pipeline/trainable_pipe.pyx", line 75, in spacy.pipeline.trainable_pipe.TrainablePipe.pipe
...

It is possible that the PL framework automatically turns every object inside the model into a trainable object; if the original pipe is likewise converted into a TrainablePipe, errors are reported, including the one shown above.

2. To avoid problem 1, use nlp.pipe

The same problem as in forward occurs: the pipe is still converted into a TrainablePipe.

3. To avoid problem 1, move the spaCy processing outside the model and call it as a function

An error is still reported, though a different and rather inexplicable one.

Solution:

I did not find a good solution, so I had to re-implement the functionality I needed by hand, such as stop-word removal; a sketch follows.
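
For example, a minimal hand-rolled stop-word filter (the word list here is only illustrative; spaCy's own list could be copied out once, outside the model):

STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "are"}

def remove_stop_words(text):
    # naive whitespace tokenization, no spaCy objects involved
    return [tok for tok in text.lower().split() if tok not in STOP_WORDS]

print(remove_stop_words("The model is trained on a single GPU"))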

[Pytorch Error Solution] Pytorch distributed RuntimeError: Address already in use

The error reported by PyTorch is as follows:

Pytorch distributed RuntimeError: Address already in use

Reason:

The port is already occupied during multi-GPU training; simply change to another port.

Solution:

Add the --master_port parameter to the run command, for example:

 --master_port 29501

The value 29501 can be replaced with any other free port.

Note:

This parameter must be placed before xxx.py, for example:

CUDA_VISIBLE_DEVICES=2,7 python3 -m torch.distributed.run \
    --nproc_per_node 2 --master_port 29501 train.py
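
If you are not sure which ports are free, a small helper like the sketch below (just a convenience, not part of torch.distributed) can pick one before launching:

import socket

def find_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))      # let the OS choose any free port
        return s.getsockname()[1]

print(find_free_port())      # pass this value to --master_port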

[Solved] AttributeError: ‘_IncompatibleKeys’ object has no attribute

The error code is as follows:

# Error code!
model = model.load_state_dict(state_dict_var)
out = model(input)

Some cases of this error discussed online occur because a model saved as a whole is loaded with state_dict(), or vice versa.

The error here is different from those: model.load_state_dict() does not need its return value to be assigned to anything, and certainly not back to model.

Amend to read as follows:

# Correct code!
model.load_state_dict(state_dict_var)
out = model(input)
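
As a small sketch of why the assignment breaks things: load_state_dict() returns a named tuple of missing and unexpected keys, not the model, so assigning it back to model replaces the model with that report. The return value can still be inspected separately (state_dict_var is the dictionary from the example above):

result = model.load_state_dict(state_dict_var)
print(result.missing_keys, result.unexpected_keys)  # report of keys that did not match
out = model(input)                                  # the model itself is unchanged and usable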

I hope it will help more people.

RuntimeError: cuda runtime error (801) : operation not supported at ..

Error message (the output of the two DataLoader worker processes is interleaved; cleaned up, it reads as follows):

THCudaCheck FAIL file=..\torch/csrc/generic/StorageSharing.cpp line=249 error=801 : operation not supported
Traceback (most recent call last):
  File "D:\Miniconda3\envs\dl\lib\multiprocessing\queues.py", line 236, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "D:\Miniconda3\envs\dl\lib\multiprocessing\reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
  File "D:\Miniconda3\envs\dl\lib\site-packages\torch\multiprocessing\reductions.py", line 247, in reduce_tensor
    event_sync_required) = storage._share_cuda_()
RuntimeError: cuda runtime error (801) : operation not supported at ..\torch/csrc/generic/StorageSharing.cpp:249

Reason:

https://github.com/fastai/fastbook/issues/85

Since PyTorch multiprocessing does not work properly on Windows here, set the DataLoader's num_workers to 0, as in the sketch below.
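
A minimal sketch of the change (dataset stands for whatever Dataset object is being used): keep data loading in the main process by setting num_workers to 0.

from torch.utils.data import DataLoader

loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)  # no worker processes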

[Solved] pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle

Invalid resource handle error in pycuda code

When running CUDA code, the following error occurs:

File "/mnt/lustre/demo/extract_disp_newtopo/face_registration-master/code/poisson.py", line 147, in blend_gs_cuda
    block = (1024, 1, 1))
  File "/mnt/lustre/miniconda3/envs/pycuda/lib/python3.6/site-packages/pycuda/driver.py", line 436, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle

Solution:
Find the problematic CUDA function:

mod = SourceModule("""
			#include <stdint.h>
			__global__ void construct_b(
				const uint8_t* src, const int16_t* u, const int16_t* v,
				float* b,
				int pix_num, int size
			)

You can add either of the following lines before the SourceModule declaration:

src = torch.cuda.ByteTensor(8)  # The size can be arbitrary (a matrix also works), but the dtype should match the data type used in the C/CUDA code
b   = torch.cuda.FloatTensor(9)

The complete code is as follows:

src = torch.cuda.ByteTensor(8)
mod = SourceModule("""
			#include <stdint.h>
			__global__ void construct_b(
				const uint8_t* src, const int16_t* u, const int16_t* v,
				float* b,
				int pix_num, int size
			)

The exact cause is unknown, since the problem only occurs occasionally.