Tag Archives: Deep learning

RuntimeError: CUDA error: device-side assert triggered

Cause

The error is raised when computing the loss in PyTorch. The label tensor has shape (batch, height, width), and every value in it must be a valid class index. With 10 classes the values must lie in 0 ~ 9, that is:
0 <= value <= C-1, where C is the number of channels (classes)

Solution

I have 10 classes but my labels ran from 1 ~ 10, so subtracting 1 is enough, as shown below.

c_loss = nn.CrossEntropyLoss()
labels_v = labels_v-1 
loss0 = c_loss(d0, labels_v.long())

Summary

In general, this error means that some label value in the training data falls outside the number of classes set in the configuration file, or that the validation set contains labels that do not occur in the training set.
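When the raw labels are not simply off by one (for example, they skip values), subtracting 1 is not enough. The following is a minimal sketch of remapping them to contiguous indices. It assumes the label values are available as plain Python integers (for instance via labels_v.flatten().tolist()); the helper name build_label_map and the sample values are made up for illustration.

```python
def build_label_map(raw_labels):
    """Map each distinct raw label to a contiguous index in 0 .. C-1."""
    classes = sorted(set(raw_labels))
    return {raw: idx for idx, raw in enumerate(classes)}


raw = [1, 10, 3, 3, 7, 1]           # hypothetical raw label values
label_map = build_label_map(raw)    # {1: 0, 3: 1, 7: 2, 10: 3}
remapped = [label_map[v] for v in raw]

num_classes = len(label_map)
# Every remapped label now satisfies 0 <= value <= C-1.
assert all(0 <= v < num_classes for v in remapped)
print(remapped)  # [0, 3, 1, 1, 2, 0]
```

The map should be built once from the whole dataset rather than per batch, and the resulting number of classes has to match the number of output channels of the model.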

[Solved] d2lzh_pytorch Import Error: ImportError: DLL load failed while importing _torchtext

Importing d2lzh_pytorch fails with: ImportError: DLL load failed while importing _torchtext: The specified program could not be found.

Imports

import torch
import torch.nn as nn
import numpy as np
import pandas as pd
import sys
sys.path.append("..") 
import d2lzh_pytorch as d2l

The error is as follows:

ImportError: DLL load failed while importing _torchtext: The specified program could not be found.

The solution is as follows:

# Check the torch version
import torch
print(torch.__version__)  # 1.7.0

# Install the torchtext release that matches your torch version (run this in a shell, not in Python)
pip install torchtext==0.6

This solved the problem.
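Before reinstalling anything, it can help to see which versions are actually installed side by side. A small sketch using only the standard library (Python 3.8+); the helper name installed_version is an assumption for illustration, not part of torch or torchtext.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print both versions so they can be compared against the torchtext release notes.
for name in ("torch", "torchtext"):
    print(name, installed_version(name))
```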

PyTorch: Error When Creating a Tensor Directly on the GPU [How to Solve]

Creating a tensor directly on the GPU in PyTorch reports an error: legacy constructor expects device type: cpu but device type: cuda was passed

General tensor creation method:

torch.Tensor(x)

However, by default the tensor is placed on the CPU (in main memory). If we want to train the model on the GPU, we must then copy the tensor to the GPU, which obviously costs extra time.
I had seen other articles saying that tensors can be created directly on the GPU, so I gave it a try:

MyDevice=torch.device('cuda:0')
x = torch.LongTensor(x, device=MyDevice)

An error is reported when running the program:

legacy constructor expects device type: cpu but device type: cuda was passed

The error message suggests that 'cuda' cannot be passed as the device parameter here. After checking, I found the explanation given by the PyTorch developers: torch.LongTensor is a legacy type-specific constructor of the Tensor class and does not accept a CUDA device argument, whereas building the tensor with the torch.tensor() function works without problems.
⭐ Therefore, the code is changed as follows:

MyDevice = torch.device('cuda:0')
x = torch.tensor(x, device=MyDevice)
x = x.long()

Now, there will be no more errors.
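The dtype can also be passed to torch.tensor() directly, so the separate .long() call is not needed. A minimal sketch, assuming made-up input values and falling back to the CPU when no GPU is present:

```python
import torch

# Fall back to the CPU so the sketch also runs on machines without a GPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

data = [1, 2, 3]  # hypothetical input values
# dtype and device are both accepted by the torch.tensor() factory function.
x = torch.tensor(data, dtype=torch.long, device=device)

print(x.dtype, x.device)
```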

[ONNXRuntimeError] : 10 : INVALID_GRAPH Error When Loading a Model

Project scenario:

A PyTorch model is converted to an ONNX model. The export succeeds, but the following error occurs when the model is loaded with onnxruntime:

InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : Load model from T.onnx failed:This is an invalid model. Type Error: Type 'tensor(bool)' of input parameter (8) of operator (ScatterND) in node (ScatterND_15) is invalid.

Problem Description:

import torch
import torch.nn as nn
import onnxruntime
from torch.onnx import export

class Preprocess(nn.Module):
    def __init__(self):
        super().__init__()
        self.max = 1000
        self.min = -44

    def forward(self, inputs):
        inputs[inputs>self.max] = self.max
        inputs[inputs<self.min] = self.min
        return inputs
        
x = torch.randint(-1024,3071,(1,1,28,28))
model = Preprocess()
model.eval()

export(
    model,
    x,
    "test.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=11,
)

session = onnxruntime.InferenceSession("test.onnx")

Cause analysis:

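The same problem is reported in PyTorch GitHub issue #34054. Judging from the error message, the in-place index assignment with a boolean mask is exported as a ScatterND node that receives a tensor(bool) input, which onnxruntime rejects as an invalid type.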

Solution:

The fix is to generate a mask first and then apply torch.masked_fill(), instead of assigning to the input tensor directly by index:

class MaskHigh(nn.Module):
    """Replace every value above `val` with `val`."""

    def __init__(self, val):
        super().__init__()
        self.val = val

    def forward(self, inputs):
        # Build a boolean mask, then fill out of place instead of assigning by index.
        mask = inputs > self.val
        return torch.masked_fill(inputs, mask, self.val)


class MaskLow(nn.Module):
    """Replace every value below `val` with `val`."""

    def __init__(self, val):
        super().__init__()
        self.val = val

    def forward(self, inputs):
        mask = inputs < self.val
        return torch.masked_fill(inputs, mask, self.val)


class Clip(nn.Module):
    def __init__(self):
        super().__init__()
        self.high = MaskHigh(1300)  # note: Preprocess above used 1000 as its upper bound
        self.low = MaskLow(-44)

    def forward(self, inputs):
        output = self.high(inputs)
        output = self.low(output)
        return output

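Netron can be used to visualize and compare the computation graphs exported by the two approaches (index assignment before the change, masked_fill after it).

[Figure: computation graph of the index-assignment version]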
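Before re-exporting, it is worth confirming that the masked_fill version produces the same values as the index-assignment version. The sketch below only checks this in plain PyTorch and does not touch the ONNX export; the function names and the bounds -44 and 1000 are taken from the Preprocess module above or made up for illustration.

```python
import torch


def clip_by_index(x, low, high):
    """Original approach: in-place assignment through a boolean index."""
    out = x.clone()
    out[out > high] = high
    out[out < low] = low
    return out


def clip_by_masked_fill(x, low, high):
    """Replacement approach: build a mask, then fill out of place."""
    out = torch.masked_fill(x, x > high, high)
    out = torch.masked_fill(out, out < low, low)
    return out


torch.manual_seed(0)
x = torch.randint(-1024, 3071, (1, 1, 28, 28))
a = clip_by_index(x, -44, 1000)
b = clip_by_masked_fill(x, -44, 1000)

print(torch.equal(a, b))
```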

[Solved] YOLOv4 Error: Layer before convolutional layer must output image.: No error

Recently, while learning YOLOv4 and training it on my own dataset, I ran into this error during training:
Layer before convolutional layer must output image.: No error

1. Solution

Check your custom cfg file, in particular the input image size (width and height in the [net] section).

If both height and width are set to a multiple of 32, this problem does not occur. I set them to 416 and 416.
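The multiple-of-32 check can be automated before starting a training run. A minimal sketch, assuming the cfg text follows the usual Darknet layout with width= and height= lines in the [net] section; the helper names and the sample cfg are hypothetical. A hand-written parser is used because Darknet cfg files repeat section names such as [convolutional], which configparser rejects by default.

```python
def read_net_size(cfg_text):
    """Return (width, height) from the [net] section of a Darknet cfg."""
    section = None
    size = {}
    for line in cfg_text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "net" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in ("width", "height"):
                size[key] = int(value)
    return size["width"], size["height"]


def is_valid_size(width, height, stride=32):
    """True when both dimensions are multiples of the stride."""
    return width % stride == 0 and height % stride == 0


sample_cfg = """
[net]
batch=64
width=416
height=416

[convolutional]
filters=32
"""

w, h = read_net_size(sample_cfg)
print(w, h, is_valid_size(w, h))   # 416 416 True
print(is_valid_size(500, 500))     # False
```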

2. Follow-up issue

Also make sure the size matches the size of the pictures in your dataset; otherwise, the pictures may fail to open, with the following error:

Can't load image xxxxxxxxxxxxxxxxxx

[Solved] PyTorch Error: AttributeError: 'Tensor' object has no attribute 'backword'

PyTorch raises AttributeError: 'Tensor' object has no attribute 'backword'.

As the message says, the tensor has no attribute named backword.

Incorrect code

loss.backword() # backpropagation

Correct code

loss.backward() # backpropagation

I mistyped one letter. The Jupyter notebook editor shows no warning for this, so the typo is hard to spot 😂
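When the editor gives no hint, the standard library can suggest the intended name. The sketch below uses difflib on a stand-in class so that it runs without PyTorch; FakeLoss and suggest_attribute are hypothetical names for illustration only.

```python
import difflib


class FakeLoss:
    """Stand-in for a tensor; only the method name matters here."""

    def backward(self):
        return "gradients computed"


def suggest_attribute(obj, name):
    """Return the closest existing attribute name, or None."""
    matches = difflib.get_close_matches(name, dir(obj), n=1)
    return matches[0] if matches else None


loss = FakeLoss()
try:
    loss.backword()  # same typo as in the post
except AttributeError:
    print("did you mean:", suggest_attribute(loss, "backword"))
```

Python 3.10 and later include a similar suggestion in the AttributeError message itself.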

nvidia-settings: ERROR: nvidia-settings could not find the registry key file

Problem

I downloaded and installed the official NVIDIA driver that matches the machine. After installation, nvidia-smi confirms that the driver works normally, but nvidia-settings reports an error.

Error message

ERROR: nvidia-settings could not find the registry key file or the X server is
       not accessible. This file should have been installed along with this
       driver at /usr/share/nvidia/nvidia-application-profiles-key-documentation. The
       application profiles will continue to work, but values cannot be
       prepopulated or validated, and will not be listed in the help text.
       Please see the README for possible values and descriptions.

Solution

After the official NVIDIA driver is installed, /usr/share/nvidia contains versioned copies of these files. I installed NVIDIA-Linux-x86_64-460.80.run, so copying the versioned file to the name that nvidia-settings expects solves the problem:

cd /usr/share/nvidia
sudo cp nvidia-application-profiles-460.80-key-documentation nvidia-application-profiles-key-documentation

RuntimeError: cudnn RNN backward can only be called in training mode

1. .train() was not called. Some operations are not allowed while the model is in .eval() mode.

2. If the error is raised while using an LSTM, add the following line of code:

torch.backends.cudnn.enabled = False
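For the first cause, the usual pattern is to switch back to training mode before any forward pass whose result will be backpropagated. A minimal sketch with a small made-up LSTM; on the CPU the cuDNN error does not appear, so this only illustrates the order of calls.

```python
import torch
import torch.nn as nn

model = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)
x = torch.randn(2, 5, 4)  # (batch, sequence, features), hypothetical data

# Validation: eval mode, and no gradients are needed.
model.eval()
with torch.no_grad():
    val_out, _ = model(x)

# Training: switch back to train mode BEFORE the forward pass used for backward().
model.train()
out, _ = model(x)
loss = out.sum()
loss.backward()

print(model.training)
```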

RuntimeError: nvrtc: error: failed to open libnvrtc-builtins.so.11.1.

RuntimeError: nvrtc: error: failed to open libnvrtc-builtins.so.11.1.
  Make sure that libnvrtc-builtins.so.11.1 is installed correctly.

In my case the cause was the Python version: the error occurred with Python 3.9 and Python 3.6, but not with Python 3.7.

Maskrcnn-benchmark Error: KeyError “Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS“

When trying to extract visual features with VQA maskrcnn benchmark (Vedanuj Goswami's repository on GitLab), I compiled maskrcnn benchmark following the install instructions and then ran:

python script/extract_features.py ...

An error occurred:

KeyError "Non-existent config key: MODEL.BACKBONE.OUT_CHANNELS"

The cause: I had compiled the original maskrcnn benchmark. Instead, build with the setup.py located in the VQA maskrcnn benchmark repository.

PS: the author of VQA maskrcnn benchmark made small adjustments to the network structure and code, so the structure in the original maskrcnn library does not match the config.

M1 MacBook cannot use pip to install pandas [How to Solve]

Running pip install pandas==1.1.5 reports an error:

WARNING: Discarding https://files.pythonhosted.org/packages/fb/e4/828bb9c2474ff6016e5ce96a78220d485436d5468c23068f4f6c2eb9cff8/pandas-1.1.5.tar.gz#sha256=f10fc41ee3c75a474d3bdf68d396f10782d013d7f67db99c0efbfd0acb99701b (from https://pypi.org/simple/pandas/) (requires-python:>=3.6.1). Command errored out with exit status 1: /Users/jeremy/miniforge3/envs/py38/bin/python3.8 /private/var/folders/sv/j650zk9s7wjf8yw7k6m2052h0000gn/T/pip-standalone-pip-ihhdno55/__env_pip__.zip/pip install --ignore-installed --no-user --prefix /private/var/folders/sv/j650zk9s7wjf8yw7k6m2052h0000gn/T/pip-build-env-20_ww9oi/overlay --no-warn-script-location --no-binary :none: --only-binary :none: -i https://pypi.org/simple -- setuptools wheel 'Cython>=0.29.21,<3' 'numpy==1.15.4; python_version=='"'"'3.6'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.15.4; python_version=='"'"'3.7'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.17.3; python_version=='"'"'3.8'"'"' and platform_system!='"'"'AIX'"'"'' 'numpy==1.16.0; python_version=='"'"'3.6'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy==1.16.0; python_version=='"'"'3.7'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy==1.17.3; python_version=='"'"'3.8'"'"' and platform_system=='"'"'AIX'"'"'' 'numpy; python_version>='"'"'3.9'"'"'' Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement pandas==1.1.5. (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0, 0.19.1, 0.19.2, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0, 0.21.1, 0.22.0, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0, 0.24.1, 0.24.2, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5, 1.2.0, 1.2.1, 1.2.2, 1.2.3, 1.2.4, 1.2.5, 1.3.0, 1.3.1, 1.3.2)
ERROR: No matching distribution found for pandas==1.1.5

Solution: install pandas with conda instead of pip:
conda install pandas==1.1.5

RuntimeError: An attempt has been made to start a new process before the current process has finished its bootstrapping phase

The key part of the error is as follows:

RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

The full error output is as follows:

Traceback (most recent call last):
Traceback (most recent call last):
  File "main.py", line 19, in <module>
  File "<string>", line 1, in <module>
    t.train()
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\trainer.py", line 45, in train
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 105, in spawn_main
    for batch, (lr, hr, _, idx_scale) in enumerate(self.loader_train):
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\dataloader.py", line 144, in __iter__
    exitcode = _main(fd)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 225, in prepare
    return _MSDataLoaderIter(self)
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\dataloader.py", line 117, in __init__
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    w.start()
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\process.py", line 105, in start
    run_name="__mp_main__")
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\runpy.py", line 263, in run_path
    self._popen = self._Popen(self)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\context.py", line 223, in _Popen
    pkg_name=pkg_name, script_name=fname)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\runpy.py", line 85, in _run_code
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\context.py", line 322, in _Popen
    exec(code, run_globals)
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\main.py", line 19, in <module>
    t.train()
    return Popen(process_obj)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\trainer.py", line 45, in train
    for batch, (lr, hr, _, idx_scale) in enumerate(self.loader_train):
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\dataloader.py", line 144, in __iter__
    reduction.dump(process_obj, to_child)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\reduction.py", line 60, in dump
    return _MSDataLoaderIter(self)
  File "c:\Paper Code\RCAN-master-Real\RCAN_TrainCode\code\dataloader.py", line 117, in __init__
    w.start()
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\process.py", line 105, in start
    ForkingPickler(file, protocol).dump(obj)
    self._popen = self._Popen(self)
BrokenPipeError: [Errno 32] Broken pipe
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "C:\Anaconda3\envs\pytorch0.4.0\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Original code:


    torch.manual_seed(args.seed)
    checkpoint = utility.checkpoint(args)

    if checkpoint.ok:
        loader = data.Data(args)
        model = model.Model(args, checkpoint)
        loss = loss.Loss(args, checkpoint) if not args.test_only else None
        t = Trainer(args, loader, model, loss, checkpoint)
        while not t.terminate():
            t.train()
            t.test()

        checkpoint.done()

After modification:

if __name__ == '__main__':
    torch.manual_seed(args.seed)
    checkpoint = utility.checkpoint(args)

    if checkpoint.ok:
        loader = data.Data(args)
        model = model.Model(args, checkpoint)
        loss = loss.Loss(args, checkpoint) if not args.test_only else None
        t = Trainer(args, loader, model, loss, checkpoint)
        while not t.terminate():
            t.train()
            t.test()

        checkpoint.done()

After this change the program runs.
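The reason the guard helps: on Windows, new processes are started with the spawn method, which re-imports the main module in every child. Any unguarded code that itself starts worker processes (here, the data loader's workers) would then run again inside each child. A minimal standard-library sketch of the same idiom, with made-up function names and no connection to the RCAN code:

```python
import multiprocessing as mp


def square(n):
    return n * n


def main():
    # Worker processes are created here, so this must not run at import time.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, range(5))


if __name__ == "__main__":
    # Only the parent process reaches this line; re-imported children skip it.
    print(main())  # [0, 1, 4, 9, 16]
```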

Study notes.
