Tag Archives: pytorch

[Solved] PyTorch Error: TypeError: exceptions must derive from BaseException

Project scenario:

PyTorch reports an error: TypeError: exceptions must derive from BaseException


Problem description

In base_options.py, the --netG option is restricted to the following choices:

self.parser.add_argument('--netG', type=str, default='p2hed', choices=['p2hed', 'refineD', 'p2hed_att'], help='selects model to use for netG')

However, when selecting netG, the code is written as follows:

def define_G(input_nc, output_nc, ngf, netG, n_downsample_global=3, n_blocks_global=9, n_local_enhancers=1, 
             n_blocks_local=3, norm='instance', gpu_ids=[]):    
    norm_layer = get_norm_layer(norm_type=norm)     
    if netG == 'p2hed':    
        netG = DDNet_p2hED(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, norm_layer)
    elif netG == 'refineDepth':
        netG = DDNet_RefineDepth(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    elif netG == 'p2h_noatt':        
        netG = DDNet_p2hed_noatt(input_nc, output_nc, ngf, n_downsample_global, n_blocks_global, n_local_enhancers, n_blocks_local, norm_layer)
    else:
        raise('generator not implemented!')
    #print(netG)
    if len(gpu_ids) > 0:
        assert(torch.cuda.is_available())   
        netG.cuda(gpu_ids[0])
    netG.apply(weights_init)
    return netG

Cause analysis:

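Note that define_G has no branch for 'refineD' (it checks for 'refineDepth' instead), so when --netG refineD is selected the code falls through to the else branch. There, raise('generator not implemented!') tries to raise a plain string, which Python 3 rejects with TypeError: exceptions must derive from BaseException, hiding the intended message.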


Solution:

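Change "elif netG == 'refineDepth':" to "elif netG == 'refineD':" so that the branch matches the option name. The choice 'p2hed_att' has the same mismatch (define_G checks for 'p2h_noatt'). Raising a real exception in the else branch, for example NotImplementedError, also makes any remaining mismatch report the intended message instead of the TypeError.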
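The difference between raising a string and raising an exception instance can be seen in a minimal sketch. The names pick_generator and raise_plain_string are hypothetical stand-ins, not part of the original project; only the dispatch logic of define_G is imitated here.

```python
def pick_generator(name):
    # Hypothetical stand-in for define_G: only the name dispatch is shown.
    known = {"p2hed", "refineD", "p2hed_att"}
    if name in known:
        return name
    # Raising an exception instance keeps the intended message visible.
    raise NotImplementedError(f"generator not implemented: {name!r}")


def raise_plain_string():
    # Mirrors the original `raise('generator not implemented!')`.
    # Python 3 refuses to raise a str and raises TypeError instead.
    raise "generator not implemented!"


if __name__ == "__main__":
    try:
        raise_plain_string()
    except TypeError as err:
        print("TypeError:", err)

    try:
        pick_generator("refineDepth")
    except NotImplementedError as err:
        print("NotImplementedError:", err)
```

Keeping the set of accepted names in one place, shared with the argparse choices, is one way to stop the option list and the dispatch code from drifting apart.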

torchvision.dataset Failed to Download CIFAR10 Error [How to Solve]

An error occurred while using torchvision.datasets to download the CIFAR10 dataset:

urllib.error.URLError:urlopen error unknown url type:https 

The script did not import ssl, so add the following lines (note that the second one disables HTTPS certificate verification):

import ssl
ssl._create_default_https_context = ssl._create_unverified_context

Running the script again, the import ssl line itself now fails with a "DLL load failed" error.

Solution:

First, configure the environment variables: find the current Python installation directory and add the following three paths to the system PATH variable:

E:\Anaconda3\envs\pytorch;      # directory containing python.exe
E:\Anaconda3\envs\pytorch\Scripts;
E:\Anaconda3\envs\pytorch\Library\bin

Then find the files libcrypto-1_1.dll and libssl-1_1.dll in the Library\bin folder and copy them to the DLLs folder of the same environment.

This solves the download problem.
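Before retrying the download, it can help to confirm that the interpreter can load ssl at all, since urllib appears to offer HTTPS support only when that import succeeds. The following is a minimal sketch using the standard library only; the helper name https_support_available is hypothetical.

```python
import importlib


def https_support_available():
    """Return True if the ssl module can be imported in this interpreter."""
    try:
        importlib.import_module("ssl")
    except ImportError:
        # On Windows, a missing libssl/libcrypto DLL surfaces as ImportError.
        return False
    return True


if __name__ == "__main__":
    print("ssl importable:", https_support_available())
```

If this prints False inside the conda environment, the PATH and DLL steps above are the place to look before touching the dataset code.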

[Solved] ByteTrack Error: ModuleNotFoundError: No module named ‘yolox’

1. Error Message:

File "tools/demo_track.py", line 10, in <module>
from yolox.data.data_augment import preproc
ModuleNotFoundError: No module named 'yolox'

2. Reason

Although the yolox folder exists in the project directory, it cannot be imported until the yolox package is installed.

3. Solution
3.1 Answer from the original author

First off, make sure you decide on one version of CUDA and use it consistently; I am using 11.3 here.
I fixed this and many other installation and compilation errors by uninstalling and re-installing the following programs in this exact order:

  1. Clone the yolox repo and unzip it
  2. Install Visual Studio 2019 Community (https://visualstudio.microsoft.com/downloads/)
  3. Download CUDA https://developer.nvidia.com/cuda-11.3.0-download-archive (I just did the express installation)
  4. Get Miniconda (https://docs.conda.io/en/latest/miniconda.html) for your version of Python
  5. Install pytorch with CUDA enabled: conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
  6. In the conda prompt, navigate to the download directory of yolox (cd yolox_path) and type in:
    pip install -r requirements.txt
    pip install pycocotools # this should get added to requirements.txt @FateScript
    pip install -v -e . # or python setup.py develop
  7. Congratulations, you fixed the error; now you'll be able to run yolox as described in Quick Start > Demo (example: python tools/demo.py video -n yolox-s -c /path/to/your/yolox_s.pth --path /path/to/your/video --conf 0.25 --nms 0.45 --tsize 640 --save_result --device [cpu/gpu] )

A couple of notes:

  • At the time of writing, you cannot install a CUDA version above 11.3, because conda does not provide a higher version in its sources to compile with pytorch.
  • You cannot install a newer version of Visual Studio, because of an incompatibility with CUDA (the devs have not added support for MSVS22 yet).
  • You are forced to install MSVS, because this repo depends on it to be able to compile as written in step 6.
  • You cannot simply uninstall conda, because that removes its CUDA-compiled pytorch version, which in turn breaks yolox. But I think you could most likely avoid this.

In short, you kept getting this error because yolox was not compiled properly, or not compiled at all.

3.2 Summary

Add pycocotools to requirements.txt, then run:

pip install -r requirements.txt

pip install -v -e .    (or: python setup.py develop)

After these commands finish, the demo runs successfully.
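A quick way to confirm that the editable install worked is to ask Python whether the module can be located on the import path. This is a minimal sketch using only the standard library; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found on the current import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Run this from outside the repository root to check the installed package
    # rather than the local folder of the same name.
    print("yolox importable:", is_installed("yolox"))
```

If this still reports False after pip install -v -e ., the install most likely ran in a different environment from the one used to start tools/demo_track.py.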

 

Reference:

    ModuleNotFoundError: No module named ‘yolox’ ??how can i resolve it ?please!

[Solved] PyTorch Lightning Error: KeyError: ‘hidden_states‘

Problem description: PyTorch Lightning raises KeyError: 'hidden_states' when the model is loaded as follows:

model = BertModel.from_pretrained('bert-base-uncased')

Solution: pass a config created with output_hidden_states=True to from_pretrained, so that the model output includes the hidden_states entry:

model = BertModel.from_pretrained('bert-base-uncased', config=BertConfig.from_pretrained('bert-base-uncased',output_hidden_states=True))

[Solved] RuntimeError: NCCL error in: XXX, unhandled system error, NCCL version 2.7.8

Project scenario:

This problem is encountered in distributed training.


Problem description

The likely cause is that the script was not started in parallel mode through the distributed launcher.


Solution:

(1) First, check the server's GPU information. Start a Python interpreter in the PyTorch environment and run:

python
import torch
torch.cuda.is_available()  # whether CUDA is available
torch.cuda.device_count()  # number of GPUs
torch.cuda.get_device_name(0)  # name of the GPU; device indices start at 0
torch.cuda.current_device()  # index of the current device

Press Ctrl+D to exit (Ctrl+Z followed by Enter on Windows).
(2) cd into the parent directory of the script to be run and start it through the distributed launcher:

 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6  # start parallel execution

Then append the script to run and its arguments:

 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5 python -m torch.distributed.launch --nproc_per_node=6 src_nq/create_examples.py \
     --vocab_file ./bert-base-uncased-vocab.txt \
     --input_pattern "./natural_questions/v1.0/train/nq-train-*.jsonl.gz" \
     --output_dir ./natural_questions/nq_0.03/ \
     --do_lower_case \
     --num_threads 24 --include_unknowns 0.03 --max_seq_length 512 --doc_stride 128

Problem solved!
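In the commands above, the number of ids in CUDA_VISIBLE_DEVICES and the value of --nproc_per_node are both 6. A small helper can derive both from one list so they cannot disagree. This is only a sketch with a hypothetical function name, build_launch_command; it builds the command line and does not run anything.

```python
import shlex


def build_launch_command(gpu_ids, script, script_args=()):
    """Return (env_prefix, argv) for a torch.distributed.launch invocation."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    # One process per visible GPU, as in the command shown above.
    env_prefix = "CUDA_VISIBLE_DEVICES=" + ",".join(str(g) for g in gpu_ids)
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",
        script,
        *script_args,
    ]
    return env_prefix, argv


env_prefix, argv = build_launch_command(
    [0, 1, 2, 3, 4, 5],
    "src_nq/create_examples.py",
    ["--max_seq_length", "512", "--doc_stride", "128"],
)
print(env_prefix, shlex.join(argv))
```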

[Solved] Pytorch Error: RuntimeError: expected scalar type Double but found Float

Problem description:

This error occurred while training an LSTM, after converting the NumPy data directly to torch tensors:

RuntimeError: expected scalar type Double but found Float

Cause analysis:

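The data type of the tensors is incorrect. NumPy arrays default to float64, so torch.from_numpy produces Double tensors, while the LSTM weights are float32 (Float) by default: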

x_train_tensor = torch.from_numpy(x_train)
y_train_tensor = torch.from_numpy(y_train)

Solution:

Convert the tensors to the torch.float32 type:

x_train_tensor = torch.from_numpy(x_train).to(torch.float32)
y_train_tensor = torch.from_numpy(y_train).to(torch.float32)

[Solved] AttributeError: module ‘distutils‘ has no attribute ‘version‘

Starting mmyolo with TensorBoard fails with the following error:

File "D:\Anaconda3\envs\mmyo\lib\site-packages\mmengine\visualization\vis_backend.py", line 495, in _init_env
    from torch.utils.tensorboard import SummaryWriter
  File "D:\Anaconda3\envs\mmyo\lib\site-packages\torch\utils\tensorboard\__init__.py", line 4, in <module>
    LooseVersion = distutils.version.LooseVersion
AttributeError: module 'distutils' has no attribute 'version'

Reason:

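The installed version of setuptools is too new. As the traceback shows, this torch version accesses distutils.version as an attribute, which newer setuptools releases no longer expose automatically.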

Solution:
Install an older version of setuptools with the following command:

pip install setuptools==56.1.0
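To see which setuptools version is currently active before and after the downgrade, the installed version can be read from package metadata. This is a minimal sketch using the standard library (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("setuptools:", installed_version("setuptools"))
```

Run it inside the same environment that raised the error (mmyo in the traceback above), since each conda environment has its own setuptools.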

[Solved] AttributeError: ‘HTMLWriter‘ object has no attribute ‘_temp_names‘

Error Message (Error 1):

TypeError: render() got an unexpected keyword argument ‘mode‘

Solution for Error 1:

Pin gym and pyglet to the following versions:

  • gym: 0.17.1
  • pyglet: 1.5.0

This resolves Error 1, but a new error (Error 2) is then reported:

AttributeError: ‘HTMLWriter’ object has no attribute ‘_temp_names’

Solution for Error 2:

  • Open the .py file that contains your code.
  • Find your animate_frames method. If you do not have one (I did not), simply put the code from the next step at the top of the file.
  • Add the following code before the animate_frames method, with the imports at the top of the file:
import matplotlib.pyplot as plt
from IPython.display import HTML

def display_animation(anim):
    plt.close(anim._fig)
    return HTML(anim.to_jshtml())

Find the following code:

display(display_animation(anim, default_mode='XXX'))

Change it to:

display(display_animation(anim))

The following code can be deleted or ignored:

from JSAnimation.IPython_display import display_animation

[Solved] RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place

yolov5 Error: RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place

Solution:
In the model/yolo.py file, find the following loop:

        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8/(640/s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6/(m.nc - 0.99)) if cf is None else torch.log(cf/cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

Wrap the two in-place updates in with torch.no_grad(): as follows:

        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            with torch.no_grad():
                b[:, 4] += math.log(8/(640/s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6/(m.nc - 0.99)) if cf is None else torch.log(cf/cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

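Here b is a view of mi.bias, a leaf tensor that requires grad, and autograd rejects in-place updates to such a view while gradient tracking is on. The key change is therefore to add: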

with torch.no_grad():

[Solved] RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation

Original code

	def anim(i):
		# update SMBLD
		cur_beta_idx, cur_step = i // num_steps, i % num_steps
		val = shape_range[cur_step]
		mesh.multi_betas[0, cur_beta_idx] = val  # Update betas
		fig.suptitle(f"{name.title()}\nS{cur_beta_idx} : {val:+.2f}", fontsize=50)  # update text

		return dict(mesh=mesh.get_meshes(), equalize=False)


Modified code

Wrapping the in-place assignment in with torch.no_grad(): fixes the error:

	def anim(i):
		# update SMBLD
		cur_beta_idx, cur_step = i // num_steps, i % num_steps
		val = shape_range[cur_step]
		#print("\ncur_beta_idx:",cur_beta_idx,mesh.multi_betas[0, cur_beta_idx])
		with torch.no_grad():  # added
			mesh.multi_betas[0, cur_beta_idx] = val  # Update betas
		fig.suptitle(f"{name.title()}\nS{cur_beta_idx} : {val:+.2f}", fontsize=50)  # update text

		return dict(mesh=mesh.get_meshes(), equalize=False)

How to Solve Pytorch eval Stuck Error

Question

Training on a single GPU is fast, but during eval the process hangs after running one batch, with no error message.

Things I tried that did not help:

1. Change pin_memory of valid_loader to False. (When it is True, data is loaded into pinned memory automatically, which speeds up the transfer to the GPU.)
2. Change num_workers to 1. Some people say that too many workers may cause the worker processes to deadlock, so reducing the number is worth a try.

Final Solution:

valid_loader:
pin_memory=True  # This is the important setting. Advice online suggested that switching to False might solve the problem; in my experiments it did not, and switching back to True let eval run normally.
num_workers=4
batch_size=8

train_loader (same parameters as valid_loader):
pin_memory=True
num_workers=4
batch_size=8

In summary: first, keep pin_memory=True for valid_loader. The reason is straightforward: data is loaded into pinned memory automatically, which speeds up the transfer to the GPU and therefore the inference process. Second, reduce num_workers and batch_size for both valid_loader and train_loader. Keep pin_memory=True for train_loader as well.