Tag Archives: pytorch

PyTorch — nn.Sequential() module

In short, nn.Sequential() packs a series of operations, such as Conv2d(), ReLU(), MaxPool2d(), etc., into a single module. The packaged module can be invoked at any point; it behaves like a black box and is typically called inside forward().

Here is part of the AlexNet code, extracted to illustrate Sequential:

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000, init_weights=False):
        super(AlexNet, self).__init__()
        
        self.features = nn.Sequential(
            nn.Conv2d(3, 48, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2), 
            nn.Conv2d(48, 128, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(128, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 192, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(192, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # ...
        
    def forward(self, x):
        x = self.features(x)
        # ...
        return x

In __init__, self.features = nn.Sequential(…) packs the feature-extraction layers into a single module.

In forward(), simply calling self.features(x) runs the whole stack.
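As a minimal sketch (not from the original post; the layer sizes here are arbitrary), the same idea works for any small stack of layers, and calling the Sequential object runs each module in order:

import torch
import torch.nn as nn

# pack three operations into one module; calling block(x) runs them in order
block = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

x = torch.randn(1, 3, 32, 32)   # dummy input: one 3-channel 32x32 image
y = block(x)                    # same as applying the three layers one by one
print(y.shape)                  # torch.Size([1, 16, 16, 16])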

FCOS: No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-10.0'

When running FCOS, the message "No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda-10.0'" appears.

  • the error that appears
  • the cause of the error
  • checking the versions
  • the solution (CUDA 10.0 matches torch 1.2.0, not 1.3.1)

    The following error appears:

    AssertionError:
    The NVIDIA driver on your system is too old (found version 10000).
    Please update your GPU driver by downloading and installing a new
    version from the URL: http://www.nvidia.com/Download/index.aspx
    Alternatively, go to: https://pytorch.org to install
    a PyTorch version that has been compiled with your version
    of the CUDA driver.

    Cause of the error

    The CUDA version does not match the torch version. On my machine the PyTorch version is too new while the CUDA version is too old, so they do not match. Usually it is the torch version that does not fit.

    Check the versions

    nvcc -V

    pip3 list
    # or
    pip list

    CUDA 10.0 and torch 1.3.1 do not match.

    Solution (CUDA 10.0 matches only torch 1.2.0)

    Uninstall the original torch 1.3.1:
    pip3 uninstall torch
    # or
    pip uninstall torch

    Reinstall torch 1.2.0:
    pip3 install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html
    # or
    pip install torch==1.2.0 torchvision==0.4.0 -f https://download.pytorch.org/whl/torch_stable.html

    Then check whether torch 1.2.0 was installed successfully:
    pip3 list
    # or
    pip list

    Finally, which CUDA version matches which torch version can be checked on the PyTorch website:
    https://pytorch.org/get-started/previous-versions/
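    As a quick extra check (a sketch, not part of the original post), the CUDA version that the installed torch build expects can also be printed from Python:

    import torch

    print(torch.__version__)    # e.g. 1.3.1 before the fix, 1.2.0 after
    print(torch.version.cuda)   # the CUDA version this torch build was compiled against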

mmdet error: ModuleNotFoundError: No module named 'mmcv._ext'

Problem: when installing mmcv, the error ModuleNotFoundError: No module named 'mmcv._ext' often occurs.
Environment: Ubuntu 16.04 + Anaconda3 + Python 3.7.7 + CUDA 10.0 + cuDNN 7.6.4.3

Solution: when installing, do not use:

pip install mmcv

Instead use:

pip install mmcv-full

If you still run into odd problems, install the mmcv version that matches your environment, for example:

pip install mmcv-full==latest+torch1.5.0+cu101 -f https://openmmlab.oss-accelerate.aliyuncs.com/mmcv/dist/index.html

The details can be found in mmcv's GitHub README:

https://github.com/open-mmlab/mmcv

For more questions, see the official GitHub repository.
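As a hedged sanity check (a sketch, not from the original post), you can confirm that the compiled extension is present; with plain mmcv instead of mmcv-full, the second import raises the same ModuleNotFoundError:

import mmcv
print(mmcv.__version__)

import mmcv._ext  # only present in mmcv-full builds with compiled ops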

How to use torch.sum()

torch.sum() sums the input tensor over one or more dimensions. It comes in two forms:

1. torch.sum(input, dtype=None)
2. torch.sum(input, dim, keepdim=False, dtype=None) → Tensor

input: the input tensor
dim: the dimension(s) to sum over; it can be a list or tuple of dimensions
keepdim: after summing, the size of dim becomes 1 and is squeezed out by default; set keepdim=True to keep that dimension
# If keepdim is True, the output tensor is of the same size as input except in the dimension(s) dim where it is of size 1.

example:

a = torch.ones((2, 3))
print(a)
tensor([[1., 1., 1.],
        [1., 1., 1.]])

a1 =  torch.sum(a)
a2 =  torch.sum(a, dim=0)
a3 =  torch.sum(a, dim=1)

print(a1)
print(a2)
print(a3)

output:

tensor(6.)
tensor([2., 2., 2.])
tensor([3., 3.])

If you add keepdim=True, the summed dimension is kept instead of being squeezed out:

a1 =  torch.sum(a, dim=(0, 1), keepdim=True)
a2 =  torch.sum(a, dim=(0, ), keepdim=True)
a3 =  torch.sum(a, dim=(1, ), keepdim=True)

output:

tensor([[6.]])
tensor([[2., 2., 2.]])
tensor([[3., 3.]])


torch.cuda.is_available() returns False

1. Problem

After installing the GPU version of torch, torch.cuda.is_available() always returns False, yet torch.backends.cudnn.enabled returns True.

Running nvidia-smi works without error and shows the driver information.

A solution found online suggests executing the command:

sudo apt-get install nvidia-cuda-toolkit

but it still does not work.

2. Problem analysis

After trying various approaches it still returns False. If everything were installed correctly it would return True, so the problem is a version mismatch: either the graphics driver version does not match, or the installed packages do not match each other.

3. Solution

(1) Method 1: update the graphics card driver. This is risky and troublesome, so it is not recommended.

(2) Method 2: install the cudatoolkit version that matches your driver. Which CUDA versions each driver supports is listed here:

https://docs.nvidia.com/deploy/cuda-compatibility/#binary-compatibility

Installation:

 conda install pytorch torchvision cudatoolkit=xxx -c pytorch  # choose the cudatoolkit version matching your driver
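
A small diagnostic sketch (not from the original post) to confirm whether the fix worked:

import torch

print(torch.backends.cudnn.enabled)   # was already True even while the GPU was unusable
print(torch.cuda.is_available())      # should now return True
if torch.cuda.is_available():
    print(torch.cuda.device_count())      # number of visible GPUs
    print(torch.cuda.get_device_name(0))  # name of the first card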


A simple and effective solution to unbalanced multi-GPU load (GPU 0 memory too high) in PyTorch model training

This post mainly addresses the problem that, during PyTorch multi-GPU training, GPU 0 occupies much more video memory than the other cards. On my machine the GPU is a TITAN RTX with 24220 MB of memory; with batch_size = 9 on three cards, card 0 already occupies 24207 MB right after training starts, when only a small amount of data has been moved to the GPUs. Once more batches of data arrive, card 0's memory is bound to overflow. The reason card 0 uses more memory is that, during back-propagation, the loss gradients are computed on card 0 by default, so it needs somewhat more memory than the other cards; how much more depends mainly on the network structure.

To prevent training from being interrupted by out-of-memory errors, the brute-force option is to set batch_size to 6, i.e. 2 samples per card, with everything else unchanged.

See the problem? Cards 1 and 2 now use less than 16 GB of memory each, and batch_size is sacrificed just because card 0 might exceed its memory by a small amount.
Is there a more elegant way? Yes: borrow the BalancedDataParallel class used in Transformer-XL. The code is as follows (source):

import torch
from torch.nn.parallel.data_parallel import DataParallel
from torch.nn.parallel.parallel_apply import parallel_apply
from torch.nn.parallel._functions import Scatter


def scatter(inputs, target_gpus, chunk_sizes, dim=0):
    r"""
    Slices tensors into approximately equal chunks and
    distributes them across given GPUs. Duplicates
    references to objects that are not tensors.
    """

    def scatter_map(obj):
        if isinstance(obj, torch.Tensor):
            try:
                return Scatter.apply(target_gpus, chunk_sizes, dim, obj)
            except Exception:
                print('obj', obj.size())
                print('dim', dim)
                print('chunk_sizes', chunk_sizes)
                quit()
        if isinstance(obj, tuple) and len(obj) > 0:
            return list(zip(*map(scatter_map, obj)))
        if isinstance(obj, list) and len(obj) > 0:
            return list(map(list, zip(*map(scatter_map, obj))))
        if isinstance(obj, dict) and len(obj) > 0:
            return list(map(type(obj), zip(*map(scatter_map, obj.items()))))
        return [obj for targets in target_gpus]

    # After scatter_map is called, a scatter_map cell will exist. This cell
    # has a reference to the actual function scatter_map, which has references
    # to a closure that has a reference to the scatter_map cell (because the
    # fn is recursive). To avoid this reference cycle, we set the function to
    # None, clearing the cell
    try:
        return scatter_map(inputs)
    finally:
        scatter_map = None


def scatter_kwargs(inputs, kwargs, target_gpus, chunk_sizes, dim=0):
    """Scatter with support for kwargs dictionary"""
    inputs = scatter(inputs, target_gpus, chunk_sizes, dim) if inputs else []
    kwargs = scatter(kwargs, target_gpus, chunk_sizes, dim) if kwargs else []
    if len(inputs) < len(kwargs):
        inputs.extend([() for _ in range(len(kwargs) - len(inputs))])
    elif len(kwargs) < len(inputs):
        kwargs.extend([{} for _ in range(len(inputs) - len(kwargs))])
    inputs = tuple(inputs)
    kwargs = tuple(kwargs)
    return inputs, kwargs


class BalancedDataParallel(DataParallel):

    def __init__(self, gpu0_bsz, *args, **kwargs):
        self.gpu0_bsz = gpu0_bsz
        super().__init__(*args, **kwargs)

    def forward(self, *inputs, **kwargs):
        if not self.device_ids:
            return self.module(*inputs, **kwargs)
        if self.gpu0_bsz == 0:
            device_ids = self.device_ids[1:]
        else:
            device_ids = self.device_ids
        inputs, kwargs = self.scatter(inputs, kwargs, device_ids)
        if len(self.device_ids) == 1:
            return self.module(*inputs[0], **kwargs[0])
        replicas = self.replicate(self.module, self.device_ids)
        if self.gpu0_bsz == 0:
            replicas = replicas[1:]
        outputs = self.parallel_apply(replicas, device_ids, inputs, kwargs)
        return self.gather(outputs, self.output_device)

    def parallel_apply(self, replicas, device_ids, inputs, kwargs):
        return parallel_apply(replicas, inputs, kwargs, device_ids)

    def scatter(self, inputs, kwargs, device_ids):
        bsz = inputs[0].size(self.dim)
        num_dev = len(self.device_ids)
        gpu0_bsz = self.gpu0_bsz
        bsz_unit = (bsz - gpu0_bsz) // (num_dev - 1)
        if gpu0_bsz < bsz_unit:
            chunk_sizes = [gpu0_bsz] + [bsz_unit] * (num_dev - 1)
            delta = bsz - sum(chunk_sizes)
            for i in range(delta):
                chunk_sizes[i + 1] += 1
            if gpu0_bsz == 0:
                chunk_sizes = chunk_sizes[1:]
        else:
            return super().scatter(inputs, kwargs, device_ids)
        return scatter_kwargs(inputs, kwargs, device_ids, chunk_sizes, dim=self.dim)

As you can see, BalancedDataParallel inherits from torch.nn.DataParallel and lets you set card 0's batch size through gpu0_bsz, i.e. give card 0 a bit less data, which balances its memory usage against the other cards. The invocation code is as follows:

# import BalancedDataParallel from wherever you saved the class above
# (the module name below is a placeholder)
from balanced_data_parallel import BalancedDataParallel

if n_gpu > 1:
    model = BalancedDataParallel(2, model, dim=0).to(device)  # gpu0_bsz=2
    # model = torch.nn.DataParallel(model)

gpu0_bsz: the batch size on GPU 0;
model: the model;
dim: the batch dimension.

So we might as well set batch_size to 8, i.e. gpu0_bsz = 2, and try; the results are as follows:

The batch_size was successfully increased from 6 to 8. Because card 0 holds one sample fewer per batch, its memory usage ends up lower than that of the other cards. By trading a little of one card's memory for the others', the overall batch_size can still be increased. The advantage of this method is even more obvious when the number of cards is large.
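
As a minimal illustration of the split (assuming, as above, 3 GPUs, a total batch of 8 and gpu0_bsz = 2), the chunk-size computation in BalancedDataParallel.scatter works out as follows:

# mirror of the chunk-size logic in BalancedDataParallel.scatter
bsz, num_dev, gpu0_bsz = 8, 3, 2
bsz_unit = (bsz - gpu0_bsz) // (num_dev - 1)            # (8 - 2) // 2 = 3
chunk_sizes = [gpu0_bsz] + [bsz_unit] * (num_dev - 1)   # [2, 3, 3]
delta = bsz - sum(chunk_sizes)                          # 0, nothing left to distribute
print(chunk_sizes)  # GPU 0 gets 2 samples, GPUs 1 and 2 get 3 each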

EOFError: compressed file ended before the end-of-stream marker was reached

One of the reasons for this error: when the data was first downloaded, the resource ended up incomplete because of network or other problems.

Solution:

  • Delete the downloaded resources and download them again.
  • Or find the resources elsewhere and put the complete files into the dataset folder (if you do not know where the dataset folder is, the error message shows the path, so you can go to it directly). It is usually under your home directory.


Solution to the PyTorch "RuntimeError: Error(s) in loading state_dict for DataParallel" reported by submit.py

After today's image-retrieval training run, the following error appeared when running subs.py:

RuntimeError: Error(s) in loading state_dict for DataParallel:
Missing key(s) in state_dict: "module.backbone.0.weight", "module.backbone.1.weight", "module.backbone.1.bias", "module.backbone.1.running_mean", "module.backbone.1.running_var", "module.backbone.4.0.conv1.weight", "module.backbone.4.0.bn1.weight", "module.backbone.4.0.bn1.bias", "module.backbone.4.0.bn1.running_mean", "module.backbone.4.0.bn1.running_var", "module.backbone.4.0.conv2.weight", "module.backbone.4.0.bn2.weight", "module.backbone.4.0.bn2.bias", "module.backbone.4.0.bn2.running_mean", "module.backbone.4.0.bn2.running_var", "module.backbone.4.0.conv3.weight", "module.backbone.4.0.bn3.weight", "module.backbone.4.0.bn3.bias", "module.backbone.4.0.bn3.running_mean", "module.backbone.4.0.bn3.running_var", "module.backbone.4.0.downsample.0.weight", "module.backbone.4.0.downsample.1.weight", "module.backbone.4.0.downsample.1.bias", "module.backbone.4.0.downsample.1.running_mean", "module.backbone.4.0.downsample.1.running_var", "module.backbone.4.1.conv1.weight", "module.backbone.4.1.bn1.weight", ..., "layer4.2.bn3.running_var", "layer4.2.bn3.weight", "layer4.2.bn3.bias", "fc.weight", "fc.bias", and so on.

The error was traced to line 95, which leads back to a .pth file path that was not produced by this training run.

Solution:

Point it at the .pth file produced by your own training run and the error goes away.


The difference and connection between view(), transpose(), and permute() in PyTorch

Having recently been tripped up by several of PyTorch's tensor dimension transformations, I dug into them; the process and results are summarized below.

Note: torch.__version__ == '1.2.0'

torch.transpose() and torch.permute()

Both are used to swap the contents of different dimensions. The difference is that torch.transpose() exchanges exactly two specified dimensions, while permute() can rearrange any number of dimensions at once. The code is below.

transpose(): exchanges two dimensions

 >>> a = torch.Tensor([[[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15]], 
                  [[-1,-2,-3,-4,-5], [-6,-7,-8,-9,-10], [-11,-12,-13,-14,-15]]])
 >>> a.shape
 torch.Size([2, 3, 5])
 >>> print(a)
 tensor([[[  1.,   2.,   3.,   4.,   5.],
         [  6.,   7.,   8.,   9.,  10.],
         [ 11.,  12.,  13.,  14.,  15.]],

        [[ -1.,  -2.,  -3.,  -4.,  -5.],
         [ -6.,  -7.,  -8.,  -9., -10.],
         [-11., -12., -13., -14., -15.]]])
 >>> b = a.transpose(1,2)  # use transpose to swap dimensions 1 and 2; easy to understand. The transformed tensor and its shape are below
 >>> print(b, b.shape)
 (tensor([[[  1.,   6.,  11.],
         [  2.,   7.,  12.],
         [  3.,   8.,  13.],
         [  4.,   9.,  14.],
         [  5.,  10.,  15.]],

        [[ -1.,  -6., -11.],
         [ -2.,  -7., -12.],
         [ -3.,  -8., -13.],
         [ -4.,  -9., -14.],
         [ -5., -10., -15.]]]),
torch.Size([2, 5, 3]))

permute(): rearranges an arbitrary number of dimensions at once

 >>> c = a.permute(2, 0, 1)
 >>> print(c, c.shape)  # this changes the original dimension order 0,1,2 to 2,0,1, so the shape changes accordingly
 (tensor([[[  1.,   6.,  11.],
          [ -1.,  -6., -11.]],
 
         [[  2.,   7.,  12.],
          [ -2.,  -7., -12.]],
 
         [[  3.,   8.,  13.],
          [ -3.,  -8., -13.]],
 
         [[  4.,   9.,  14.],
          [ -4.,  -9., -14.]],
 
         [[  5.,  10.,  15.],
          [ -5., -10., -15.]]]),
 torch.Size([5, 2, 3]))

The transformation relationship between transpose() and permute():

>>> b = a.permute(2,0,1)
>>> c = a.transpose(1,2).transpose(0,1)
>>> print(b == c, b.shape)
(tensor([[[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]]]),
 torch.Size([5, 2, 3]))

As the code shows, if you first swap dimensions 1 and 2 of tensor a and then swap dimensions 0 and 1, the result is the same as a single permute(2, 0, 1).


transpose() and view()

view() is a very common function in PyTorch. It also changes a tensor's dimensions, but in a completely different way from transpose()/permute(). Whereas transpose() faithfully exchanges the specified dimensions, view() is much more direct and simple: conceptually it first flattens all of the tensor's elements into one dimension and then rebuilds a tensor with the requested shape. Code as follows:

# still the same tensor a as above
 >>> print(a.shape)
 torch.Size([2, 3, 5])
 >>> print(a.view(2,5,3))
 tensor([[[  1.,   2.,   3.],
         [  4.,   5.,   6.],
         [  7.,   8.,   9.],
         [ 10.,  11.,  12.],
         [ 13.,  14.,  15.]],

        [[ -1.,  -2.,  -3.],
         [ -4.,  -5.,  -6.],
         [ -7.,  -8.,  -9.],
         [-10., -11., -12.],
         [-13., -14., -15.]]])
  >>> c = a.transpose(1,2)
 >>> print(c, c.shape)
(tensor([[[  1.,   6.,  11.],
          [  2.,   7.,  12.],
          [  3.,   8.,  13.],
          [  4.,   9.,  14.],
          [  5.,  10.,  15.]],
 
         [[ -1.,  -6., -11.],
          [ -2.,  -7., -12.],
          [ -3.,  -8., -13.],
          [ -4.,  -9., -14.],
          [ -5., -10., -15.]]]),
 torch.Size([2, 5, 3]))

As the code shows, even though view(2, 5, 3) and transpose(1, 2) both produce a tensor of shape (2, 5, 3), their contents are not the same. view() just refills the elements, in their original order, into the new (2, 5, 3) shape, whereas transpose() really does exchange dimensions 1 and 2 and reorders the elements accordingly.


Moreover, there are cases where a tensor produced by transpose() cannot call view(), because after transpose() the tensor is no longer "contiguous" in memory. The same question about contiguous arrays exists in NumPy as well, and there are good explanations of it elsewhere.
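
A minimal sketch of that contiguity issue (continuing with the same tensor a from above; the exact error text may differ across PyTorch versions):

b = a.transpose(1, 2)         # shape [2, 5, 3], but no longer contiguous in memory
print(b.is_contiguous())      # False
# b.view(-1)                  # raises a RuntimeError because of the non-contiguous layout
c = b.contiguous().view(-1)   # make a contiguous copy first, then view() works
d = b.reshape(-1)             # or use reshape(), which copies only when necessary
print(c.shape, d.shape)       # torch.Size([30]) torch.Size([30])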