Python code training neural network: “ImportError: DLL load failed: page file is too small to complete operation.”
This problem can arise in two ways.
(1) Another Python program is already running, either in this project or in a different one. Close it.
(2) On Windows, each DataLoader worker is a separate process that is started from scratch and loads the PyTorch DLLs again, which can exhaust the page file. The neural network uses multiple processes for loading the data set, so set the num_workers parameter of the DataLoader to 0.

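    train_loader = torch.utils.data.DataLoader(
        train_dataset,  # placeholder for your own dataset; keep your other arguments unchanged
        num_workers=0,  # 0 = load data in the main process, no worker processes
    )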
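
As a minimal sketch of this advice, the worker count can be chosen per platform, so that the same script still uses worker processes on other systems. The helper name choose_num_workers and the request for 4 workers are assumptions made for illustration; neither comes from PyTorch.

```python
import sys


def choose_num_workers(requested: int, platform: str = sys.platform) -> int:
    """Return 0 on Windows, otherwise the requested number of workers.

    choose_num_workers is a hypothetical helper, not a PyTorch function.
    """
    if platform.startswith("win"):
        # Avoid extra worker processes, each of which loads the DLLs again.
        return 0
    return max(0, requested)


# 4 is an arbitrary example value for non-Windows systems.
num_workers = choose_num_workers(4)
print(num_workers)
```

The result can then be passed as the num_workers argument of the DataLoader.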

tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.

This error came up while working through Chapter 5 of Deep Learning with Python (deep learning for computer vision),
Section 5.4.3, visualizing heatmaps of class activation,
when running the following line of the book's code in a TensorFlow 2.0 environment:

grads = K.gradients(african_elephant_output, last_conv_layer.output)[0]

Replacing it with

grads = tf.keras.backend.gradients(african_elephant_output, last_conv_layer.output)[0]

does not help; the same error still occurs:

tf.gradients is not supported when eager execution is enabled. Use tf.GradientTape instead.


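The fix is to compute the gradient with tf.GradientTape. The forward pass must run inside the tape so that it is recorded. Here grad_model = tf.keras.Model(model.inputs, [last_conv_layer.output, model.output]), where model and the preprocessed image x are assumed to be the ones from the book's listing:

with tf.GradientTape() as gtape:
    conv_output, preds = grad_model(x)
    african_elephant_output = preds[:, 386]  # 386: the "African elephant" class index used in the book
grads = gtape.gradient(african_elephant_output, conv_output)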



RuntimeError: CUDA out of memory when training a model with PyTorch
Training: GPU video memory is limited, so the training batch size must not be too large; otherwise an out-of-memory error occurs.
Solution: reduce the batch size, down to 1 if necessary.
Testing: run the test code inside with torch.no_grad(): so that no intermediate results are kept for backpropagation.
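
The advice to shrink the batch size can be sketched as a simple back-off loop. This is only an illustration under stated assumptions: find_fitting_batch_size and fake_training_step are hypothetical names, and the training step is simulated so the sketch runs without a GPU. It relies on PyTorch reporting this condition as a RuntimeError whose message contains "out of memory".

```python
def find_fitting_batch_size(run_step, start_batch_size):
    """Halve the batch size until run_step succeeds (hypothetical helper)."""
    batch_size = start_batch_size
    while True:
        try:
            run_step(batch_size)
            return batch_size
        except RuntimeError as err:
            # Re-raise anything that is not an out-of-memory error,
            # and give up if even a batch size of 1 does not fit.
            if "out of memory" not in str(err) or batch_size == 1:
                raise
            batch_size = max(1, batch_size // 2)


def fake_training_step(batch_size, limit=5):
    """Simulated training step: pretends that only `limit` samples fit in memory."""
    if batch_size > limit:
        raise RuntimeError("CUDA out of memory (simulated)")


print(find_fitting_batch_size(fake_training_step, 32))
```

In a real script, run_step would execute one forward and backward pass on a batch of the given size.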

ImportError: DLL load failed: The page file is too small to complete operation.

Cause analysis:

2> Other programs are running. Solution: wait for them to finish or close them, and shut down every program you do not need. Also, python.exe should not be used by two programs at the same time. For example, if PyDev and Anaconda are both running, close one of them.

ECCV 2020 panoptic segmentation papers (2 papers)

Computer Vision Daily has published a series of roundups of ECCV 2020 papers.
Earlier roundups:
ECCV 2020 object detection papers (49 papers)
ECCV 2020 semantic segmentation papers (37 papers)
ECCV 2020 instance segmentation papers (12 papers): https://blog.csdn.net/amusi1994/article/details/108999316
This post covers panoptic segmentation. Two papers have been collected, and the PDFs of all the papers have been packaged. The Baidu Cloud link is as follows:

Link: https://pan.baidu.com/s/12WBsFFJKelcS7Fvrqiv3HQ
extraction code: t7nr


Panoptic segmentation

Joint Semantic Instance Segmentation on Graphs with the Semantic Mutex Watershed
Author affiliation: Heidelberg University
Paper: https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/5393_ECCV_2020_paper.php
Code: none
Chinese-language walkthrough: none

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Author affiliations: Johns Hopkins University, Google
Paper: https://www.ecva.net/papers/eccv_2020/papers_ECCV/html/1564_ECCV_2020_paper.php
Code: https://github.com/csrhddlam/axial-deeplab
Chinese-language walkthrough: none

Fixing the red wavy underline on imported modules in PyCharm
In PyCharm, a red wavy underline can appear under the import of a module you wrote yourself, even though the program runs normally. The cause is the directory layout: PyCharm cannot find the path of a module that is imported with a plain import statement.

If the red underline bothers you, it can be removed in two steps.
Step 1:
Open Settings, go to Python Console under Console, check the option “Add source roots to PYTHONPATH”, and click OK.
Step 2:
Right-click the directory that contains the module, choose Mark Directory as from the context menu, then choose Sources Root. The red wavy underline in the code disappears immediately.
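
Outside the IDE, the rough counterpart of a Sources Root is a directory that sits on Python's module search path. The sketch below illustrates that idea only; it does not show what PyCharm does internally. The directory layout and the module name pycharm_demo_helpers are made up for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as project_dir:
    # Hypothetical project layout: <project>/src/pycharm_demo_helpers.py
    src_root = Path(project_dir) / "src"
    src_root.mkdir()
    (src_root / "pycharm_demo_helpers.py").write_text(
        "def greet():\n    return 'hello'\n"
    )

    # Before src/ is on the search path, the module cannot be resolved.
    try:
        importlib.import_module("pycharm_demo_helpers")
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Adding src/ to sys.path plays the role of a source root on PYTHONPATH.
    sys.path.insert(0, str(src_root))
    importlib.invalidate_caches()
    helpers = importlib.import_module("pycharm_demo_helpers")
    greeting = helpers.greet()
    sys.path.remove(str(src_root))

print(found_before, greeting)
```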

Solution to unbalanced load across multiple cards (GPU 0 uses too much memory) in PyTorch model training (simple and effective)

This post addresses the problem that, during multi-GPU training in PyTorch, card 0 uses more video memory than the other cards. In the author's setup, the local GPUs are TITAN RTX cards with 24220 MB of video memory each, batch_size = 9, and three cards are used. Card 0 already occupies 24207 MB right after training starts, when only a small amount of data has been transferred to the cards. Once more data arrives, card 0 is bound to run out of memory. The reason card 0 uses more video memory: during backpropagation, the gradient of the loss is computed on card 0 by default, so it needs more memory than the other cards. How much more depends mainly on the structure of the network.

To keep training from being interrupted by an out-of-memory error, the crude option is to set batch_size to 6, that is, 2 samples per card.
This is the run with batch_size = 6 and everything else unchanged.

See the problem? Cards 1 and 2 each use less than 16 GB of video memory. The batch size has been sacrificed only because card 0 might exceed its memory by a small amount.
Is there a more elegant way? Yes: borrow the BalancedDataParallel class used in Transformer-XL. The code is as follows:

import torch
from torch.nn.parallel.data_parallel import DataParallel
from torch.nn.parallel.parallel_apply import parallel_apply
from torch.nn.parallel._functions import Scatter

def scatter(inputs, target_gpus, chunk_sizes, dim=0):
    """
    Slices tensors into approximately equal chunks and
    distributes them across given GPUs. Duplicates
    references to objects that are not tensors.
    """

    def scatter_map(obj):
        if isinstance(obj, torch.Tensor):
            try:
                return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
            except Exception:
                # Print the offending shapes, then re-raise instead of continuing silently.
                print('obj', obj.size())
                print('dim', dim)
                print('chunk_sizes', chunk_sizes)
                raise
        if isinstance(obj, tuple) and len(obj) > 0:
            return list(zip(*map(scatter_map, obj)))
        if isinstance(obj, list) and len(obj) > 0:
            return list(map(list, zip(*map(scatter_map, obj))))
        if isinstance(obj, dict) and len(obj) > 0:
            return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
        return [obj for targets in target_gpus]

    # After scatter_map is called, a scatter_map cell will exist. This cell
    # has a reference to the actual function scatter_map, which has references
    # to a closure that has a reference to the scatter_map cell (because the
    # fn is recursive). To avoid this reference cycle, we set the function to
    # None, clearing the cell
    try:
        return scatter_map(inputs)
    finally:
        scatter_map = None

def scatter_kwargs(inputs, kwargs, target_gpus, chunk_sizes, dim=0):
    """Scatter with support for kwargs dictionary"""
    inputs = scatter(inputs, target_gpus, chunk_sizes, dim) if inputs else []
    kwargs = scatter(kwargs, target_gpus, chunk_sizes, dim) if kwargs else []
    if len(inputs) < len(kwargs):
        inputs.extend([() for _ in range(len(kwargs) - len(inputs))])
    elif len(kwargs) < len(inputs):
        kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))])
    inputs = tuple(inputs)
    kwargs = tuple(kwargs)
    return inputs, kwargs

class BalancedDataParallel(DataParallel):

    def __init__(self, gpu0_bsz, *args, **kwargs):
        self.gpu0_bsz = gpu0_bsz
        super().__init__(*args, **kwargs)

    def forward(self, *inputs, **kwargs):
        if not self.device_ids:
            return self.module(*inputs, **kwargs)
        if self.gpu0_bsz == 0:
            # Card 0 takes no samples, so leave it out of the forward pass.
            device_ids = self.device_ids[1:]
        else:
            device_ids = self.device_ids
        inputs, kwargs = self.scatter(inputs, kwargs, device_ids)
        if len(self.device_ids) == 1:
            return self.module(*inputs[0], **kwargs[0])
        replicas = self.replicate(self.module, self.device_ids)
        if self.gpu0_bsz == 0:
            replicas = replicas[1:]
        outputs = self.parallel_apply(replicas, device_ids, inputs, kwargs)
        return self.gather(outputs, self.output_device)

    def parallel_apply(self, replicas, device_ids, inputs, kwargs):
        return parallel_apply(replicas, inputs, kwargs, device_ids)

    def scatter(self, inputs, kwargs, device_ids):
        bsz = inputs[0].size(self.dim)
        num_dev = len(self.device_ids)
        gpu0_bsz = self.gpu0_bsz
        bsz_unit = (bsz - gpu0_bsz) // (num_dev - 1)
        if gpu0_bsz < bsz_unit:
            chunk_sizes = [gpu0_bsz] + [bsz_unit] * (num_dev - 1)
            # Hand any leftover samples to the cards after card 0, one each.
            delta = bsz - sum(chunk_sizes)
            for i in range(delta):
                chunk_sizes[i + 1] += 1
            if gpu0_bsz == 0:
                chunk_sizes = chunk_sizes[1:]
        else:
            # Card 0 would not get fewer samples than the others: use the default even split.
            return super().scatter(inputs, kwargs, device_ids)
        return scatter_kwargs(inputs, kwargs, device_ids, chunk_sizes, dim=self.dim)

As you can see, BalancedDataParallel inherits from torch.nn.DataParallel and adds a custom batch size for card 0, gpu0_bsz, so that card 0 receives slightly less data. This balances the memory usage of card 0 against the other cards. The invocation code is as follows:

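from data_parallel import BalancedDataParallel  # assumes the code above is saved as data_parallel.py

if n_gpu > 1:
    # gpu0_bsz is the first positional argument, followed by the model
    model = BalancedDataParallel(2, model, dim=0).to(device)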
    # model = torch.nn.DataParallel(model)

gpu0_bsz: the batch size assigned to card 0;
model: the model;
dim: the batch dimension.

With this in place, we can set batch_size to 8 and try gpu0_bsz = 2.

Raising batch_size from 6 to 8 succeeds. Because card 0 gets a smaller share of the batch, it uses less video memory than the other cards. Trading a little of one card's memory for a larger overall batch_size is still worthwhile, and the advantage of this method becomes more obvious as the number of cards grows.
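
To see how a batch is divided, the chunk-size arithmetic from BalancedDataParallel.scatter can be reproduced in plain Python. This is a sketch for illustration: the function name balanced_chunk_sizes is made up here, and returning None stands in for the case where the class falls back to the default even split of DataParallel.

```python
def balanced_chunk_sizes(bsz, gpu0_bsz, num_dev):
    """Mirror the chunk-size computation in BalancedDataParallel.scatter.

    Returns the per-card sample counts, or None when the class would fall
    back to the default even split. Requires num_dev >= 2.
    """
    bsz_unit = (bsz - gpu0_bsz) // (num_dev - 1)
    if gpu0_bsz >= bsz_unit:
        return None
    chunk_sizes = [gpu0_bsz] + [bsz_unit] * (num_dev - 1)
    # Leftover samples go to the cards after card 0, one each.
    delta = bsz - sum(chunk_sizes)
    for i in range(delta):
        chunk_sizes[i + 1] += 1
    if gpu0_bsz == 0:
        # Card 0 is skipped entirely.
        chunk_sizes = chunk_sizes[1:]
    return chunk_sizes


# The setting from the text: batch_size = 8, gpu0_bsz = 2, three cards.
print(balanced_chunk_sizes(8, 2, 3))  # [2, 3, 3]
```

With batch_size = 6 and gpu0_bsz = 2 the function returns None, because card 0 would not receive fewer samples than the others and the even split of 2 per card applies.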