Tag Archives: pytorch

Python: CUDA error: an illegal memory access was accounted for

Error in pytorch1.6 training:

RuntimeError: CUDA error: an illegal memory access was encountered

The reason for the error is the same as that of the lower version of python (such as version 1.1)

Runtimeerror: expected object of backend CUDA but get backend CPU for argument https://blog.csdn.net/weixin_ 44414948/article/details/109783988

Cause of error:

The essence of this kind of error reporting is model and input data_ image、input_ Label) is not all moved to GPU (CUDA).
* * tips: * * when debugging, you must carefully check whether every input variable and network model have been moved to the GPU. I usually report an error because I have missed one or two of them.

resolvent:

Model, input_ image、input_ The example code is as follows:

model = model.cuda()
input_image = input_iamge.cuda()
input_label = input_label.cuda()

or

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
input_image = input_iamge.to(device)
input_label = input_label.to(device)

ValueError: Expected more than 1 value per channel when training, got input size torch.Size([1, 256,

When testing the trained network, python finds the above problems and adds them after loading the network model.eval (), the problem is solved.

model = nn.DataParallel(model).cpu()
model.load_state_dict(torch.load(path, map_location=torch.device('cpu')), False)
model.eval()

There are three things to note about this code snippet

 nn.DataParallel(model)

If you add this function to the training network, you also need to add this function to the training network

model.load_state_dict(torch.load(path, map_location=torch.device('cpu')), False)

If the network framework is not saved when the network is saved and only parameters are available, the network can be loaded by the above method. If the GPU is used when the network is trained and the parameter map is added on the CPU when the network is used_ location= torch.device (‘cpu’)

model.eval()

The third is the problem of the topic. After loading the network, add the above function to solve the problem

Resolve call‘ plt.show () ‘no image after

Call in code plt.show No img image is displayed after ()

import matplotlib.pyplot as plt
plt.imshow(img)

The solution is as follows:

First, add the following in the header file:

import pylab

Then in the original code plt.show (IMG) add the following:

pylab.show()

As shown in the figure, the picture can be displayed normally

Build your own resnet18 network and load torch vision’s own weight

import torch
import torchvision
import cv2 as cv
from utils.utils import letter_box
from model.backbone import ResNet18


model1 = ResNet18(1)
model2 = torchvision.models.resnet18(progress=False)
fc = model2.fc
model2.fc = torch.nn.Linear(512, 1)
# print(model)
model_dict1 = model1.state_dict()
model_dict2 = torch.load('resnet18.pth')
model_list1 = list(model_dict1.keys())
model_list2 = list(model_dict2.keys())
len1 = len(model_list1)
len2 = len(model_list2)
minlen = min(len1, len2)
for n in range(minlen):
    if model_dict1[model_list1[n]].shape != model_dict2[model_list2[n]].shape:
        continue
    model_dict1[model_list1[n]] = model_dict2[model_list2[n]]

model1.load_state_dict(model_dict1)
missing, unspected = model2.load_state_dict(model_dict2)
image = cv.imread('zhn1.jpg')
image = letter_box(image, 224)
image = image[:, :, ::-1].transpose(2, 0, 1)
print('Network loading complete.')
model1.eval()
model2.eval()
with torch.no_grad():
    image = torch.tensor(image/256, dtype=torch.float32).unsqueeze(0)
    predict1 = model1(image)
    predict2 = model2(image)
print('finished')
# torch.save(model.state_dict(), 'resnet18.pth')

RuntimeError: log_vml_cpu not implemented for ‘Long’

welcome to my blog
Problem description
Implements Torch. Log (tor.from_numpy (NP.array ([1,2,2]))))t implemented for ‘Long’
why
Long data does not support log operations. Why is a Tensor a Long?Since the numpy array is created without specifying a dtype, int64 is used by default, so when the numpy array is converted to torch.tensor, the data type becomes Long
The solution
Reset torch.log(torch.from_numpy(np.array([1,2,2],np.float)))

Python custom convolution kernel weight parameters

Pytorch build convolution layer generally use nn. Conv2d method, in some cases we need custom convolution kernels weight weight, and nn. Conv2d custom is not allowed in the convolution parameters, can use the torch at this time. The nn. Functional. Conv2d referred to as “f. onv2d

torch.nn.functional.conv2d(input, weight, bias=None, stride=1, padding=0, dilation=1, groups=1)

F.onv2d can and must be required to input the convolution weight and bias bias. Therefore, build the desired convolution kernel parameters, and then input F.conv2d. Here is an example of using f.conv2d to build the convolution layer, where a class is needed for the network model:

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.weight = nn.Parameter(torch.randn(16, 1, 5, 5))  # Customized weights
        self.bias = nn.Parameter(torch.randn(16))    # Customized bias

    def forward(self, x):
        x = x.view(x.size(0), -1)
        out = F.conv2d(x, self.weight, self.bias, stride=1, padding=0)
        return out

It is worth noting that the data type of weights to be trained for each layer in the PyTorch is set to nn.parameter rather than Tensor or Variable. Parameter’s require_grad defaults to true, and Varaible defaults to False.

Solution of visdom startup failure in Windows 10

Task description
Recently collected a batch of data, want to call Cyclegan to complete the domain migration to see the effect. So I found the open source Cyclegan code on the Internet, the code can run normally, but the call to Visidom will always show an Error: HTTP Error. So record the process of my solution
 
Start the visdom

python -m visdom.server

Calling CMD to start visdom.server but the code will get stuck, stuck in downloading the script
 
To solve the caton
The reason is that the file is difficult to download. Here’s how to solve it
Find the location of the Visidom package in the current environment, roughly: ~\Lib\site-packages\visdom Open server.py and look for download_scripts and comment this line so that download_scripts() is not executed
After this operation, and then start Visidom, the model will run smoothly, and no exception thrown. But there is a problem, open the page blue screen.
 
To solve the blue screen
The reason for the blue screen is that it does not download properly. The solution here refers to two articles, both of which are cited in the following references
Into local visdom in static files, there is a index. The HTML files, the backup download reference (2) in the index. The HTML files, to replace the current folder has the backup index. The restart visdom HTML files, open the page, the question remains, to be the next step will be the backup of the original index. The HTML to replace the current index. The HTML restart visdom, problem solving
 
reference
https://blog.csdn.net/AnthongDai/article/details/79117472https://github.com/chenyuntc/pytorch-book/blob/2c8366137b691aaa8fbeeea478cc1611c09e15f5/README.md#visdom%E6%89%93%E4%B8%8D%E5%BC%80%E5%8F%8A%E5 %85%B6%E8%A7%A3%E5%86%B3%E6%96%B9%E6%A1%88
 
This article is the author’s original, reproduced need to indicate the source!

The pychar / pytorch page file is too small to complete the operation

Possible reasons for
If baidu, most can tell you is because virtual memory is insufficient, let you increase virtual memory. On Windows, PyTorch may have a problem with its num_workers, which needs to be set to 0.
The solution
The lack of virtual memory may not be a setup problem, but it may simply be a lack of disk space. I cleaned up the disk and the problem was solved successfully.

CONDA install torch error

If you can’t find the corresponding version, you need to add another PyTorch source

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/

Conda toggle Tsinghua source complete command:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge 
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes

Installation of Python on MAC

First, install Anaconda
Installation environment:

Although I’m on a MAC and have Python shipped, I have Anaconda installed first. Because it integrates many third-party Python libraries, and it is easy to manage different versions of Python, switching between different versions of Python. Anaconda is also a scientific computing environment. After Anaconda is installed on your computer, you will have some common libraries installed as well as Python installed.

The author installed Python version 2.7 Anaconda, and after installing Anaconda, Python and some common libraries are already installed. In addition, the Spyder was installed automatically.
2. Establish, activate and install Pytorch
Open the terminal and type:

conda create -n [name] python=3.5

[

n

a

m

e

]

[name]

Replace [name] with the name of the environment you want, without typing []. Depending on your needs, you can choose between different versions of Python. Just change 3.5 to 3.6 or 2.7
Then, after the completion of the execution, the execution:

source activate [name]

At this point, the runtime environment is activated.
Then execute PIP install torch torchvision to perform Pytorch installation.
When finished, the installation is complete
If you need to use the GPU version, install it using the source code. Download or visit the page making, others have been compiled Pytorch GPU version of https://github.com/TomHeaven/pytorch-osx-build

CUDA error:out of memory

Today, when I was running the program, I kept reporting this error, saying that I was out of CUDA memory. After a long time of debugging, it turned out to be
 
At first I suspected that the graphics card on the server was being used, but when I got to mvidia-SMi I found that none of the three Gpus were used. That question is obviously impossible. So why is that?
 
Others say the TensorFlow and Pytorch versions conflict. ?????I didn’t get TensorFlow
 
The last reference the post: http://www.cnblogs.com/jisongxie/p/10276742.html
 

Yes, Like the blogger, I’m also using a No. 0 GPU, so I don’t know why my Pytorch process works. I can only see a no. 2 GPU physically, I don’t have a no. 3 GPU. So something went wrong?
 
So I changed the code so that PyTorch could see all the Gpus on the server:

OS. Environ [‘ CUDA_VISIBLE_DEVICES] = ‘0’
 
Then on the physics of no. 0 GPU happily run up ~~~