Tag Archives: Deep learning

RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Preface

Today, while training a model with YOLOv5 v6.0 and changing the batch size to 32, the following error occurred:

Starting training for 100 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
  0%|                                                                                                                                                                         | 0/483 [00:04<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 620, in <module>
    main(opt)
  File "train.py", line 517, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 315, in train
    pred = model(imgs)  # forward
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\yolo.py", line 126, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "D:\liufq\yolov5-6.0\models\yolo.py", line 149, in _forward_once
    x = m(x)  # run
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\common.py", line 137, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\liufq\yolov5-6.0\models\common.py", line 45, in forward
    return self.act(self.bn(self.conv(x)))
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\conv.py", line 443, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "E:\Anaconda3\envs\yolov550\lib\site-packages\torch\nn\modules\conv.py", line 440, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: Unable to find a valid cuDNN algorithm to run convolution

Solution

Reduce the batch size (for example, pass a smaller value such as --batch-size 16 to train.py).
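
In practice this error often indicates that cuDNN could not find a convolution algorithm that fits in the remaining GPU memory, which is why a smaller batch size helps. As an illustrative check (not from the original post; assumes a CUDA-enabled PyTorch build), you can print the current GPU memory usage before deciding how far to lower the batch size:

```python
import torch

# Illustrative sketch: report GPU memory usage (assumes CUDA is available).
if torch.cuda.is_available():
    dev = torch.cuda.current_device()
    total = torch.cuda.get_device_properties(dev).total_memory
    print(f"allocated: {torch.cuda.memory_allocated(dev) / 1e9:.2f} GB")
    print(f"reserved:  {torch.cuda.memory_reserved(dev) / 1e9:.2f} GB")
    print(f"total:     {total / 1e9:.2f} GB")
```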

[MMCV]RuntimeError: CUDA error: no kernel image is available for execution on the device

There are two possible causes of this problem:
First, the GPU's compute capability does not match the PyTorch version.
Second, the server mixes graphics cards with different compute capabilities.
On the first point, PyTorch versions after 1.3.0 no longer support graphics cards with compute capability below 3.7, so you can reinstall an older version of PyTorch. The corresponding versions can be found at the following link:
torch, torchvision historical version download
Common graphics card compute capabilities are listed below:

GPU	Compute Capability
NVIDIA TITAN RTX	7.5
Geforce RTX 2080 Ti	7.5
Geforce RTX 2080	7.5
Geforce RTX 2070	7.5
Geforce RTX 2060	7.5
NVIDIA TITAN V	7.0
NVIDIA TITAN Xp	6.1
NVIDIA TITAN X	6.1
GeForce GTX 1080 Ti	6.1
GeForce GTX 1080	6.1
GeForce GTX 1070	6.1
GeForce GTX 1060	6.1
GeForce GTX 1050	6.1
GeForce GTX TITAN X	5.2
GeForce GTX TITAN Z	3.5
GeForce GTX TITAN Black	3.5
GeForce GTX TITAN	3.5
GeForce GTX 980 Ti	5.2
GeForce GTX 980	5.2
GeForce GTX 970	5.2
GeForce GTX 960	5.2
GeForce GTX 950	5.2
GeForce GTX 780 Ti	3.5
GeForce GTX 780	3.5
GeForce GTX 770	3.0
GeForce GTX 760	3.0
GeForce GTX 750 Ti	5.0
GeForce GTX 750	5.0
GeForce GTX 690	3.0
GeForce GTX 680	3.0
GeForce GTX 670	3.0
GeForce GTX 660 Ti	3.0
GeForce GTX 660	3.0
GeForce GTX 650 Ti BOOST	3.0
GeForce GTX 650 Ti	3.0
GeForce GTX 650	3.0
GeForce GTX 560 Ti	2.1
GeForce GTX 550 Ti	2.1
GeForce GTX 460	2.1
GeForce GTS 450	2.1
GeForce GTS 450*	2.1
GeForce GTX 590	2.0
GeForce GTX 580	2.0
GeForce GTX 570	2.0
GeForce GTX 480	2.0
GeForce GTX 470	2.0
GeForce GTX 465	2.0
GeForce GT 740	3.0
GeForce GT 730	3.5
GeForce GT 730 DDR3,128bit	2.1
GeForce GT 720	3.5
GeForce GT 705*	3.5
GeForce GT 640 (GDDR5)	3.5
GeForce GT 640 (GDDR3)	2.1
GeForce GT 630	2.1
GeForce GT 620	2.1
GeForce GT 610	2.1
GeForce GT 520	2.1
GeForce GT 440	2.1
GeForce GT 440*	2.1
GeForce GT 430	2.1
GeForce GT 430*	2.1
GPU	Compute Capability
Tesla K80	3.7
Tesla K40	3.5
Tesla K20	3.5
Tesla C2075	2.0
Tesla C2050/C2070	2.0
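
You can also query a card's compute capability directly from PyTorch instead of looking it up in the table above; a minimal sketch, assuming a CUDA-enabled PyTorch build:

```python
import torch

# Print the compute capability of every visible GPU (assumes CUDA is available).
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"{torch.cuda.get_device_name(i)}: compute capability {major}.{minor}")
```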

On the second point, if the error occurs in the MMCV framework, recompile MMCV for the compute capabilities of your graphics cards. Taking two cards with compute capabilities 6.1 and 7.5 as an example, the command is as follows:

TORCH_CUDA_ARCH_LIST="6.1;7.5" pip install mmcv-full=={mmcv_version} -f https://download.openmmlab.com/mmcv/dist/{cuda_version}/{torch_version}/index.html

Here {cuda_version} and {torch_version} should be replaced with your own versions, for example cu101 and torch1.7.0.
For the exact correspondence, refer to the MMCV GitHub repository.

RuntimeError: Found dtype Double but expected Float

The error occurred while computing the loss function: the target tensor was double precision (float64) where float32 was expected.

Solution:

Cast the tensor to float, e.g. target.float(). A full example:

import numpy as np
import torch

a = np.array([[1, 2], [3, 4]])
b = np.array([[2, 3], [4, 4]])

loss_fn = torch.nn.MSELoss(reduce=True, size_average=True)

input = torch.autograd.Variable(torch.from_numpy(a))
target = torch.autograd.Variable(torch.from_numpy(b))

# cast both tensors to float32 before computing the loss
loss = loss_fn(input.float(), target.float())

print(loss)
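
For context, here is an illustrative sketch (not part of the original fix) of why the error appears: NumPy floating-point arrays default to float64, so torch.from_numpy() yields double-precision tensors while model weights and predictions are float32.

```python
import numpy as np
import torch

x = np.array([[1.0, 2.0], [3.0, 4.0]])   # NumPy floats default to float64
t = torch.from_numpy(x)
print(t.dtype)           # torch.float64 -> triggers "Found dtype Double but expected Float"
print(t.float().dtype)   # torch.float32 -> safe to pass to the loss function
```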

cv2.error: OpenCV(4.5.1) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-buil

Contents

The following operations are carried out under Windows.

1. Change single backslashes to double backslashes

2. Change the path to an all-English path

3. Delete cv2.imshow()


When I used cv2 to save images into a new folder, the following error was reported: cv2.error: OpenCV(4.5.1) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-buil

The solution is as follows:

The following operations are carried out under Windows.

1. Change single backslashes to double backslashes

Some people online say the path-reading error is caused by the single backslashes in the path being treated as escape characters. Correction method: change the single backslashes in the path into double backslashes.

For example:

cv2.imwrite(save_dir + '\\' + img_name, img)

However, the same error was reported again later, so based on the comments I moved on to step 2:

2. Change the path to an all-English path

The error here was reported because the referenced image path contains Chinese characters. If you change the image path to an all-English path, the problem goes away.

import cv2

# ori_imgs_single: list of source image paths; save_dir: all-English output folder
for i in ori_imgs_single:
    img_name = i.split('\\')[-1]
    img = cv2.imread(i)
    cv2.imwrite(save_dir + '\\' + img_name, img)
print('save OK!')
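
As an extra safeguard (a hypothetical helper of mine, not from the original post), you can reject save paths that contain non-English characters before calling cv2.imwrite, so the problem is reported with a clearer message:

```python
# Hypothetical helper (not from the original post): refuse non-ASCII save paths
# so the OpenCV error above is caught early with a clearer message.
def check_ascii_path(path):
    try:
        path.encode('ascii')
    except UnicodeEncodeError:
        raise ValueError(f"path contains non-English characters: {path!r}; "
                         f"move the images to an all-English path")
    return path

# usage: cv2.imwrite(check_ascii_path(save_dir + '\\' + img_name), img)
```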


3. Delete cv2.imshow()

Maybe there is something wrong with my environment: if the error persists, try deleting the cv2.imshow() call. After I deleted it, everything worked.

 


 

[Solved] Grid Search Error (GridSearchCV): 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Grid Search Error: UnicodeEncodeError: ‘ascii’ codec can’t encode characters in position 18-20: ordinal not in range(128)

E:\DLstudy\Scripts\python.exe E:/PycharmProjects/DLstudy/run/train_model.py
[INFO] tuning hyperparameters...
Traceback (most recent call last):
  File "E:\PycharmProjects\DLstudy\run\train_model.py", line 22, in <module>
    model.fit(trainX, trainY)
  File "E:\DLstudy\lib\site-packages\sklearn\model_selection\_search.py", line 820, in fit
    with parallel:
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 725, in __enter__
    self._initialize_backend()
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 735, in _initialize_backend
    n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
  File "E:\DLstudy\lib\site-packages\joblib\_parallel_backends.py", line 494, in configure
    self._workers = get_memmapping_executor(
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 20, in get_memmapping_executor
    return MemmappingExecutor.get_memmapping_executor(n_jobs, **kwargs)
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 42, in get_memmapping_executor
    manager = TemporaryResourcesManager(temp_folder)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 531, in __init__
    self.set_current_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 535, in set_current_context
    self.register_new_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 560, in register_new_context
    self.register_folder_finalizer(new_folder_path, context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 590, in register_folder_finalizer
    resource_tracker.register(pool_subfolder, "folder")
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 191, in register
    self._send('REGISTER', name, rtype)
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Process finished with exit code 1

Solution:

Original error code:

model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=-1)

The n_jobs=-1 parameter can be deleted, changing the call to:

model = GridSearchCV(LogisticRegression(), params, cv=3)

Looking at the documentation, this parameter specifies how many jobs run in parallel:

 

n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

If n_jobs=-1 is specified, joblib encodes a message with ASCII at a lower level, and this encoding fails every time (here, most likely because the temporary-folder path contains non-ASCII characters).
Therefore, if we do not specify this parameter, a single process is used by default and the error disappears.

If you really want to use multiple processes,

then you need to modify the code at the path shown in the error message.
For example, in our error message:

  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Note that the error is located in E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py, line 204, in the _send method.
Source code of the _send function:

  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)

Change
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
to
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
That is, the encoding is changed to utf-8, and the changed code is as follows.

```python
  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)
```

Then run the code again.
Don't worry, you will still get an error, because we only modified the encoding side, not the decoding side.
The error message is as follows:

     .............(...)
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
Traceback (most recent call last):
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 253, in main
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

Similarly, find the path in the error message: E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py.

The error is at line 253 of that file. We find the corresponding location:

......(...)
       with open(fd, 'rb') as f:
            while True:
                line = f.readline()
                if line == b'':  # EOF
                    break
                try:
                    splitted = line.strip().decode('ascii').split(':')
                    # name can potentially contain separator symbols (for
                    # instance folders on Windows)
                    cmd, name, rtype = (
                        splitted[0], ':'.join(splitted[1:-1]), splitted[-1])
......(...)

We just need to replace
line.strip().decode('ascii').split(':')
with
line.strip().decode('utf8').split(':')
Run the file again and it succeeds.

[Solved] conda env create -f environment.yml Error under Win10

The error descriptions below correspond to the solutions by number. Everyone runs into different problems when installing, so read what you need. (Of course, I stumbled all the way through, so if this is your first install I suggest reading the error descriptions first and then the solutions as needed.)

Error descriptions

1. The CMD console keeps printing warning messages.

2. The version Matplotlib==2.2.2 cannot be found (other packages whose exact version cannot be found can be handled the same way); failed to build pandas/numpy.

Solution

1. The pip entry may be missing from the environment.yml file; just add pip in the corresponding place in the file.

2. Delete the exact version number after the package name.

3. For "failed to build pandas/numpy": to be honest, I ignored this error, and it does not seem to affect my subsequent work. For example, I can still activate the gluon environment and open Jupyter Notebook inside it. (If it causes problems later, I will keep solving it and update this post.)


Reference link:

1. After installing Miniconda3, running conda env create -f environment.yml reports an error: miniconda installs numpy but Python can't import it

A small extra tip: the command to delete the gluon environment is as follows:

conda remove -n gluon --all  

Resolve the error: raise ImportError, str(msg) + ', please install the python-tk package' (personally tested and valid)

This solves error reports similar to the following:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/lib-tk/Tkinter.py", line 42, in <module>
    raise ImportError, str(msg) + ', please install the python-tk package'
ImportError: /usr/lib/libtk8.5.so.0: invalid ELF header, please install the python-tk package

Solution:

It appears that the library is corrupted.
Try sudo apt-get remove python-tk,
then sudo apt-get clean, download the package again with
sudo apt-get install python-tk, and then try importing again.
This resolves the problem.

Another possibility is that you somehow messed up your apt sources.list and installed a library built for the wrong platform.

Reference link:
https://stackoverflow.com/questions/11043844/python-tk-package-not-recognized-in-python-2-7-3

Using next(iter(data.DataLoader())) reports a StopIteration error

The StopIteration error is raised when calling next() on an iterator that has already been fully iterated. After the DataLoader has gone through all of its data once, calling next() again finds no data left; the iterable is exhausted, StopIteration is raised, and the loop exits.

Solution:

Since there is no data left when we fetch again, we simply re-create the data loader iterator when this happens.

In train.py, change

inps, targets = next(self.batch_iterator)

Change to:

try:
    inps, targets = next(self.batch_iterator)
except StopIteration:
    self.batch_iterator = iter(data.DataLoader(self.train_dataset, self.args.batch_size, shuffle=True, num_workers=self.args.num_workers, collate_fn=detection_collate))
    inps, targets = next(self.batch_iterator)

Problem solved.
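
For illustration, here is a self-contained sketch with a toy dataset (not the author's code) showing why the exhausted iterator raises StopIteration and how re-creating it starts a new pass over the data:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(4).float())
loader = DataLoader(dataset, batch_size=2)

batch_iterator = iter(loader)
next(batch_iterator)            # first batch
next(batch_iterator)            # second (last) batch
try:
    next(batch_iterator)        # iterator exhausted -> StopIteration
except StopIteration:
    batch_iterator = iter(loader)   # re-create the iterator for a new pass
    batch = next(batch_iterator)
print(batch)
```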

MMDetection reports an error when training on your own COCO dataset: does not match the length of `CLASSES` (80) in CocoDataset

Training MMDetection on your own dataset reports the error:

AssertionError: The `num_classes` (3) in Shared2FCBBoxHead of MMDataParallel does not match the length of `CLASSES` (80) in CocoDataset

This means that the number of classes you specified (3) does not match the number of classes (80) in CocoDataset.

You may have already modified the following files, yet the error persists:

mmdetection-master\mmdet\core\evaluation\class_names.py

 def coco_classes():
     return [
         # 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train',
         # 'truck', 'boat', 'traffic_light', 'fire_hydrant', 'stop_sign',
         # 'parking_meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep',
         # 'cow', 'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella',
         # 'handbag', 'tie', 'suitcase', 'frisbee', 'skis', 'snowboard',
         # 'sports_ball', 'kite', 'baseball_bat', 'baseball_glove', 'skateboard',
         # 'surfboard', 'tennis_racket', 'bottle', 'wine_glass', 'cup', 'fork',
         # 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange',
         # 'broccoli', 'carrot', 'hot_dog', 'pizza', 'donut', 'cake', 'chair',
         # 'couch', 'potted_plant', 'bed', 'dining_table', 'toilet', 'tv',
         # 'laptop', 'mouse', 'remote', 'keyboard', 'cell_phone', 'microwave',
         # 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase',
         # 'scissors', 'teddy_bear', 'hair_drier', 'toothbrush'
         'lm','ls'
     ]

mmdetection-master\mmdet\datasets\coco.py

 class CocoDataset(CustomDataset):
     # CLASSES = (
     #     'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
     #            'train', 'truck', 'boat', 'traffic light', 'fire hydrant',
     #            'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog',
     #            'horse', 'sheep', 'cow', 'elephant', 'bear', 'zebra', 'giraffe',
     #            'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
     #            'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
     #            'baseball glove', 'skateboard', 'surfboard', 'tennis racket',
     #            'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl',
     #            'banana', 'apple', 'sandwich', 'orange', 'broccoli', 'carrot',
     #            'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
     #            'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop',
     #            'mouse', 'remote', 'keyboard', 'cell phone', 'microwave',
     #            'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock',
     #            'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
     #     )
     CLASSES = ('lm', 'ls')

Without further ado, here are the methods:

1. If you only have two classes, you can replace the first two class names in the two places above with your own classes. This method is simple, but it may cause hidden problems.

2. The second method is to modify class_names.py and voc.py, after which the code must be recompiled (run python setup.py install) before training again.

I tried this, but I still got the same error; maybe I did it wrong.

References:

New mmdetection v2.3.0 training test notes – it610.com

MMDetection v2.x trains its own VOC dataset - Peach jam Momo blog - CSDN blog

3. The third method, which I used, has the same effect as recompiling. The reason recompilation is needed in the first place is that the error comes from the source files installed in your environment, which have not been modified; the files under the mmdetection-master directory are only the repository copies, and what actually runs is the installed package in the environment. So we modify the installed source files in the environment directly.

Suppose my conda environment is called conda_env_name; go to the following directory and modify the two files:

\anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\core\evaluation\class_names.py

\anaconda3\envs\conda_env_name\lib\python3.7\site-packages\mmdet\datasets\coco.py

Modify the class lists in these two files in the conda environment.
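
To confirm which mmdet installation Python actually imports, and therefore which class_names.py and coco.py need editing, a quick check such as the following can help (the printed path is only an example):

```python
import mmdet

# Shows the installed package that actually runs, e.g.
# ...\anaconda3\envs\<conda_env_name>\lib\python3.7\site-packages\mmdet\__init__.py
print(mmdet.__file__)
```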

⭐ In the end, I did my best to solve this bug and wrote this post to help you avoid the same detours.

An error occurred when installing the PyTorch 1.7 GPU version

Error when installing the 1.7 GPU version of PyTorch: torch has an invalid wheel, .dist-info directory not found

The reason is that the CUDA versions are inconsistent (the wheel does not match the local CUDA version).

Solution 1:

Install torch for CPU version

pip install torch==1.7.0+cpu torchvision==0.8.1+cpu torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

Solution 2:

conda install pytorch==1.7.0 torchvision==0.8.0 torchaudio==0.7.0 cudatoolkit=11.0 -c pytorch
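
After installation, a quick sanity check (a minimal sketch) shows whether the installed build matches your CUDA setup:

```python
import torch

print(torch.__version__)          # e.g. 1.7.0 or 1.7.0+cpu
print(torch.version.cuda)         # CUDA version the wheel was built with (None for CPU builds)
print(torch.cuda.is_available())  # True only if the GPU build can actually see a GPU
```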

Error: SaveRasterFile failed: IDLnaMetadata Error:naGetMetadata-GetMetadataJob failed

Today, while testing an image in ENVI, I wanted to back up (save) the image, but the following error was reported:
SaveRasterFile failed: IDLnaMetadata Error: naGetMetadata-GetMetadataJob failed


The reason is that the storage path must not contain Chinese characters; otherwise this error is reported.

When exporting raster data, ENVI needs to read the metadata of the input file. If the path contains Chinese characters, the metadata cannot be read and this error is raised; in that case, change the storage path.
The path from which I read the image happened to contain Chinese folder names, so I changed it to an all-English path.