Tag Archives: Pytorch error

[Solved] PyTorch Error: TypeError: exceptions must derive from BaseException

Project scenario:

PyTorch reports an error: TypeError: exceptions must derive from BaseException


Problem description

In base_options.py, the --netG argument is restricted to the following choices:

self.parser.add_argument('--netG', type=str, default='p2hed', choices=['p2hed', 'refineD', 'p2hed_att'], help='selects model to use for netG')

However, when selecting netG, the code is written as follows:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, 
             n_blocks_local=3, norm='instance', gpu_ids=[]):    
    norm_layer = get_norm_layer(norm_type=norm)     
    if netG == 'p2hed':    
        netG = DDNet_p2hED(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)
    elif netG == 'refineDepth':
        netG = DDNet_RefineDepth(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    elif netG == 'p2h_noatt':        
        netG = DDNet_p2hed_noatt(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    else:
        raise('generator not implemented!')
    #print(netG)
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())   
        netG.cuda(gpu_ids[0])
    netG.apply(weights_init)
    return netG

Cause analysis:

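Note that define_G has no branch for 'refineD' (it checks for 'refineDepth' instead), so when --netG refineD is selected the program cannot find the network it should build and falls through to the else branch. There, raise('generator not implemented!') tries to raise a plain string. A string is not an exception, so Python reports TypeError: exceptions must derive from BaseException instead of the intended message.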


Solution:

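Change "elif netG == 'refineDepth':" to "elif netG == 'refineD':" and the error goes away. Raising a real exception in the else branch, for example raise NotImplementedError('generator not implemented!'), also makes any future mismatch report the intended message.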
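The following is a minimal sketch of why the original else branch produces this TypeError, and of one way to report an unknown name clearly. The select_generator function and its builders table are hypothetical stand-ins for define_G, not code from the project.

```python
def select_generator(name):
    # Hypothetical stand-in for define_G: maps an option name to a builder name.
    builders = {"p2hed": "DDNet_p2hED", "refineD": "DDNet_RefineDepth"}
    if name in builders:
        return builders[name]
    # Raise a real exception class so the message reaches the user.
    raise NotImplementedError(f"generator {name!r} not implemented")


# Raising a plain string, as the original else branch does, fails at runtime.
try:
    raise ("generator not implemented!")
except TypeError as err:
    string_raise_message = str(err)

print(string_raise_message)
print(select_generator("refineD"))
```

With a real exception class, a mistyped option name fails with a message that names the option rather than with an unrelated TypeError.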

[Solved] Pytorch Error: RuntimeError: expected scalar type Double but found Float

Problem description:

This error occurred while training an LSTM, after converting the NumPy data directly to PyTorch tensors:

RuntimeError: expected scalar type Double but found Float

Cause analysis:

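The data type of the tensor is incorrect. NumPy floating-point arrays default to float64, and torch.from_numpy keeps that dtype, so the tensors below are Double while the model's parameters are Float (float32) by default: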

x_train_tensor = torch.from_numpy(x_train)
y_train_tensor = torch.from_numpy(y_train)

Solution:

Convert the original tensor to the torch.float32 type

x_train_tensor = torch.from_numpy(x_train).to(torch.float32)
y_train_tensor = torch.from_numpy(y_train).to(torch.float32)
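As a small sketch of the dtype difference, the snippet below builds a random array (an assumption standing in for the real training data) and passes the converted tensor through a Linear layer chosen only for illustration.

```python
import numpy as np
import torch

# Stand-in training data; NumPy creates float64 arrays by default.
x_train = np.random.rand(4, 3)

as_double = torch.from_numpy(x_train)                   # keeps float64 (Double)
as_float = torch.from_numpy(x_train).to(torch.float32)  # converted to Float

# A layer's parameters are float32 by default, so it accepts the converted tensor.
layer = torch.nn.Linear(3, 2)
output = layer(as_float)

print(as_double.dtype, as_float.dtype, output.shape)
```

Printing the dtype of a tensor right after conversion is a quick way to confirm which side of the mismatch it is on.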

[Solved] TensorFlow Error: UnknownError (see above for traceback): Failed to get convolution algorithm.


Question

TensorFlow reports an error: UnknownError (see above for traceback): Failed to get convolution algorithm.

Analysis

The cause is that the graphics card does not have enough free memory. Running on a graphics card with enough free memory solves the problem.

Solution:
1. Run gpustat to check the memory usage of each graphics card.
2. Select a graphics card with enough free memory.
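One common way to select a card is the CUDA_VISIBLE_DEVICES environment variable, set before the framework is imported. The sketch below assumes the free-memory figures were read off gpustat by hand; pick_gpu and the numbers are hypothetical.

```python
import os


def pick_gpu(free_memory_mib, required_mib):
    """Return the index of the GPU with the most free memory that meets
    the requirement, or None if no GPU has enough."""
    candidates = [i for i, free in enumerate(free_memory_mib) if free >= required_mib]
    if not candidates:
        return None
    return max(candidates, key=lambda i: free_memory_mib[i])


# Hypothetical free memory per GPU in MiB, as read from gpustat.
free_memory_mib = [512, 10240, 4096]
chosen = pick_gpu(free_memory_mib, required_mib=6000)

if chosen is not None:
    # Must be set before TensorFlow (or PyTorch) initializes CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(chosen)
```

After this, the framework sees only the selected card, which it addresses as device 0.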

[Solved] Pytorch Error: RuntimeError: Error(s) in loading state_dict for Network: size mismatch

Problem background

GitHub open source project: https://github.com/zhang-tao-whu/e2ec

python train_net.py coco_finetune --bs 12 \
--type finetune --checkpoint data/model/model_coco.pth

The error is reported as follows:

loading annotations into memory...
Done (t=0.09s)
creating index...
index created!
load model: data/model/model_coco.pth
Traceback (most recent call last):
  File "train_net.py", line 67, in <module>
    main()
  File "train_net.py", line 64, in main
    train(network, cfg)
  File "train_net.py", line 40, in train
    begin_epoch = load_network(network, model_dir=args.checkpoint, strict=False)
  File "/root/autodl-tmp/e2ec/train/model_utils/utils.py", line 66, in load_network
    net.load_state_dict(net_weight, strict=strict)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Network:
        size mismatch for dla.ct_hm.2.weight: copying a param with shape torch.Size([80, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([1, 256, 1, 1]).
        size mismatch for dla.ct_hm.2.bias: copying a param with shape torch.Size([80]) from checkpoint, the shape in current model is torch.Size([1]).

My own dataset has only 1 category, while the COCO dataset has 80, so the shape of the dla.ct_hm.2 parameters in the pretrained model does not match my model. The pretrained weights for these parameters need to be discarded.

Solution:

Modify in e2ec/train/model_utils/utils.py:

def load_network(net, model_dir, strict=True, map_location=None):

    if not os.path.exists(model_dir):
        print(colored('WARNING: NO MODEL LOADED !!!', 'red'))
        return 0

    print('load model: {}'.format(model_dir))
    if map_location is None:
        pretrained_model = torch.load(model_dir, map_location={'cuda:0': 'cpu', 'cuda:1': 'cpu',
                                                               'cuda:2': 'cpu', 'cuda:3': 'cpu'})
    else:
        pretrained_model = torch.load(model_dir, map_location=map_location)
    if 'epoch' in pretrained_model.keys():
        epoch = pretrained_model['epoch'] + 1
    else:
        epoch = 0
    pretrained_model = pretrained_model['net']

    net_weight = net.state_dict()
    for key in net_weight.keys():
        net_weight.update({key: pretrained_model[key]})
    # Discard some parameters (the mismatched dla.ct_hm.2 weights)
    net_weight.pop("dla.ct_hm.2.weight")
    net_weight.pop("dla.ct_hm.2.bias")
    
    net.load_state_dict(net_weight, strict=strict)
    return epoch

Note: setting strict=False in load_state_dict only tolerates missing or unexpected keys, which helps when layers have been added or removed. It does not help when a parameter that exists on both sides has a different shape.
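A more general variant, sketched below under assumptions, keeps only the checkpoint entries whose names and shapes match the current model instead of listing the keys by hand. The filter_matching helper and the two small Sequential models are hypothetical and stand in for the e2ec network and its COCO checkpoint.

```python
import torch
from torch import nn


def filter_matching(checkpoint_state, model):
    """Keep only entries whose key exists in the model with the same shape."""
    model_state = model.state_dict()
    return {
        key: value
        for key, value in checkpoint_state.items()
        if key in model_state and value.shape == model_state[key].shape
    }


# Stand-ins: an 80-class pretrained head versus a 1-class head.
pretrained = nn.Sequential(nn.Conv2d(4, 8, 1), nn.Conv2d(8, 80, 1))
current = nn.Sequential(nn.Conv2d(4, 8, 1), nn.Conv2d(8, 1, 1))

kept = filter_matching(pretrained.state_dict(), current)
result = current.load_state_dict(kept, strict=False)

# Parameters left at their fresh initialization because they were filtered out.
skipped = sorted(result.missing_keys)
print(skipped)
```

The missing_keys list returned by load_state_dict shows which layers were skipped and therefore still need to be trained.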

[Solved] Pytorch Error: PytorchStreamReader failed reading zip archive failed finding central directory

PyTorch reports an error: PytorchStreamReader failed reading zip archive: failed finding central directory

Where the error occurs

The error is raised when the pretrained model has not been downloaded properly, for example when an interrupted download leaves an incomplete file:

resnet101 = torchvision.models.resnet101(pretrained=True)

Solution:

Download the weights file manually from the URL shown in the output, and put it at the path printed after that URL, replacing the weights that were not downloaded.
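To check whether a cached weights file is intact before loading it, one option is the standard-library zipfile module. This is a sketch that assumes the checkpoint uses the zip-based format (the default for torch.save since PyTorch 1.6); the helper name and the demo files are hypothetical.

```python
import os
import tempfile
import zipfile


def looks_like_complete_zip_checkpoint(path):
    """Return True if the file exists and ends with a readable zip directory."""
    return os.path.isfile(path) and zipfile.is_zipfile(path)


with tempfile.TemporaryDirectory() as tmp:
    # A complete archive standing in for a fully downloaded checkpoint.
    complete_path = os.path.join(tmp, "complete.pth")
    with zipfile.ZipFile(complete_path, "w") as archive:
        archive.writestr("data.pkl", b"x" * 1000)

    # The same file cut in half, standing in for an interrupted download.
    truncated_path = os.path.join(tmp, "truncated.pth")
    with open(complete_path, "rb") as src, open(truncated_path, "wb") as dst:
        data = src.read()
        dst.write(data[: len(data) // 2])

    complete_ok = looks_like_complete_zip_checkpoint(complete_path)
    truncated_ok = looks_like_complete_zip_checkpoint(truncated_path)

print(complete_ok, truncated_ok)
```

A file that fails this check can be deleted and downloaded again before the model is constructed.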

[Solved] Pytorch Error: RuntimeError: Trying to backward through the graph a second time

When training a GAN, we often run into this error: RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed

The error means that the buffers saved for backpropagation had already been freed by an earlier backward pass. Many suggested fixes change loss.backward() to loss.backward(retain_graph=True). In most code that is not where the problem lies, and retaining the computation graph can make the cached buffers accumulate quickly until GPU memory runs out. The real cause is that the discriminator update and the generator update share the same variables. Just add detach() where those variables are first used.

#### Update Discriminator ###
real_preds = netD(real_gt)
fake_preds = netD(fake_coarse)


#### Update Generator #####
real_preds = netD(real_gt)
fake_preds = netD(fake_coarse)

Change to:

#### Update Discriminator ###
real_preds = netD(real_gt.detach())
fake_preds = netD(fake_coarse.detach())


#### Update Generator #####
real_preds = netD(real_gt)
fake_preds = netD(fake_coarse)
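For a complete picture, here is a minimal sketch of one training step that uses detach() in the discriminator update. The tiny Linear networks, the SGD optimizers, and the loss expressions are assumptions chosen only to keep the example short.

```python
import torch
from torch import nn

torch.manual_seed(0)
netG = nn.Linear(4, 4)  # toy generator
netD = nn.Linear(4, 1)  # toy discriminator
opt_d = torch.optim.SGD(netD.parameters(), lr=0.1)
opt_g = torch.optim.SGD(netG.parameters(), lr=0.1)

real_gt = torch.randn(8, 4)
fake_coarse = netG(torch.randn(8, 4))  # carries the generator's graph

#### Update Discriminator ###
opt_d.zero_grad()
d_loss = netD(fake_coarse.detach()).mean() - netD(real_gt).mean()
d_loss.backward()  # stops at detach(), so the generator's graph is untouched
generator_untouched = all(p.grad is None for p in netG.parameters())
opt_d.step()

#### Update Generator #####
opt_g.zero_grad()
g_loss = -netD(fake_coarse).mean()
g_loss.backward()  # first and only backward pass through the generator's graph
generator_got_grads = all(p.grad is not None for p in netG.parameters())
opt_g.step()
```

Because the discriminator's backward pass never enters the generator's graph, the generator update can backpropagate through it without retain_graph=True.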

It’s not easy to find the method. Thank you for your support

Pytorch error: `module ‘torch‘ has no attribute ‘__version___‘`

Today I set up a PyTorch environment on Windows. After installing PyTorch, I checked whether the installation had succeeded. import torch ran without error, but printing the version failed with the error above. After a careful look, I found that two underscores were missing from the attribute name; print(torch.__version__), with two underscores on each side, runs without error. It was a false alarm.

raise ValueError(‘Expected input batch_size ({}) to match target batch_size ({}).‘


Remember to print the shape of the images before the forward pass. I had not noticed that the images this time were all three-channel RGB data. Before calling .view, check the shape first. I had computed the flattened size as if each image had a single channel, so the batch dimension of the input no longer matched that of the target. In general, this error means the sizes do not match in this way.
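The effect is easy to reproduce. In this sketch the batch of 8 RGB images of 28x28 pixels is an assumed example size, not taken from the original code.

```python
import torch

# A batch of 8 RGB images, 28x28 pixels each.
images = torch.zeros(8, 3, 28, 28)
print(images.shape)  # check the shape before flattening

# Flattening as if each image had one channel silently changes the batch size.
wrong = images.view(-1, 28 * 28)

# Keeping the batch dimension fixed lets view infer the per-image size.
right = images.view(images.size(0), -1)

print(wrong.shape, right.shape)
```

The first form turns 8 samples into 24, which is what later triggers the batch_size mismatch against 8 targets.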

[Solved] volatile was removed and now has no effect. Use `with torch.no_grad():` instead.

PyTorch emits the warning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.

Source code

self.priors = Variable(self.priorbox.forward(), volatile=True)

 

Cause

The volatile flag has been removed from PyTorch.
Before PyTorch 0.4.0, input = Variable(input, volatile=True) marked a variable as volatile. As long as one input was volatile, the output was volatile too, which guaranteed that no intermediate state was kept. From PyTorch 0.4.0 onward, the volatile mechanism is replaced by functions such as torch.no_grad() and torch.set_grad_enabled(grad_mode).
torch.no_grad() is a context manager.
Not every operation in PyTorch needs a computation graph (the record of the computation that makes gradient backpropagation possible). For tensors that require gradients, operations build the graph by default. In that case, wrapping the code in with torch.no_grad(): prevents the graph from being built for everything inside the block.
torch.no_grad() affects PyTorch's backpropagation mechanism. During testing, where backpropagation is known not to be used, this mode saves memory. The same is true for torch.set_grad_enabled(grad_mode).

Change to:

with torch.no_grad():
    self.priors = Variable(self.priorbox.forward())

or

self.priors = Variable(self.priorbox.forward())
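A short sketch of what the context manager changes, using a made-up weights tensor rather than the priorbox code above:

```python
import torch

weights = torch.ones(3, requires_grad=True)

# Outside no_grad, the result is tracked and carries a grad_fn.
tracked = weights * 2

# Inside no_grad, no graph is built, so the result is not tracked.
with torch.no_grad():
    untracked = weights * 2

print(tracked.requires_grad, untracked.requires_grad)
```

The values are identical in both cases; only the graph bookkeeping differs.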

Pytorch: How to Handle error warning conda.gateways.disk.delete:unlink_or_rename_to_trash(140)

I wanted to roll PyTorch back to version 1.6.0, so I needed to install it again.
With the Tsinghua mirror configured as the source, I ran the following in the conda environment:
conda install pytorch==1.6.0 torchvision==0.7.0

However, the alarm during installation is as follows:

WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename D:\anaconda\pkgs\pytorch-1.6.0-py3.7_cuda101_cudnn7_0.tar.bz2. Please remove this file manually (you may need to reboot to free file handles)
WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename D:\anaconda\pkgs\pytorch-1.6.0-py3.7_cuda101_cudnn7_0\Lib\site-packages\torch\lib\torch_cuda.dll. Please remove this file manually (you may need to reboot to free file handles)

I tried the solution referred to here, including granting the permissions, but it did not solve the problem.
Later, I simply followed the prompts and manually deleted the two files named in the warnings: D:\anaconda\pkgs\pytorch-1.6.0-py3.7_cuda101_cudnn7_0.tar.bz2 and D:\anaconda\pkgs\pytorch-1.6.0-py3.7_cuda101_cudnn7_0\Lib\site-packages\torch\lib\torch_cuda.dll. After that, no error was reported.
Finally, the installation is successful and the version number is displayed

import torch
print(torch.__version__)  #Note the double underscore