Tag Archives: pytorch

[Solved] RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation

When training yolov5, the following error occurs: RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation

Solution:
In the model/yolo.py file, the original code is:

        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            b[:, 4] += math.log(8/(640/s) ** 2)  # obj (8 objects per 640 image)
            b[:, 5:] += math.log(0.6/(m.nc - 0.99)) if cf is None else torch.log(cf/cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

Wrap the in-place bias updates in with torch.no_grad(): as follows:

        for mi, s in zip(m.m, m.stride):  # from
            b = mi.bias.view(m.na, -1)  # conv.bias(255) to (3,85)
            with torch.no_grad():
                b[:, 4] += math.log(8/(640/s) ** 2)  # obj (8 objects per 640 image)
                b[:, 5:] += math.log(0.6/(m.nc - 0.99)) if cf is None else torch.log(cf/cf.sum())  # cls
            mi.bias = torch.nn.Parameter(b.view(-1), requires_grad=True)

In short: b is a view of a leaf parameter that requires grad, so modifying it in place while autograd is tracking is not allowed. The fix is to perform the in-place updates inside with torch.no_grad():
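
To see why this works, here is a minimal sketch of my own (not taken from yolov5) that reproduces the error and shows that torch.no_grad() avoids it:

    import torch

    w = torch.zeros(6, requires_grad=True)   # leaf tensor that requires grad
    v = w.view(2, 3)                         # a view of that leaf
    try:
        v[:, 0] += 1.0                       # in-place op on the view -> RuntimeError
    except RuntimeError as e:
        print(e)

    with torch.no_grad():                    # autograd stops tracking here
        v[:, 0] += 1.0                       # the in-place update is now allowed
    print(w)                                 # the underlying parameter was modified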

[Solved] RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation

Source code:

	def anim(i):
		# update SMBLD
		cur_beta_idx, cur_step = i // num_steps, i % num_steps
		val = shape_range[cur_step]
		mesh.multi_betas[0, cur_beta_idx] = val  # Update betas
		fig.suptitle(f"{name.title()}\nS{cur_beta_idx} : {val:+.2f}", fontsize=50)  # update text

		return dict(mesh=mesh.get_meshes(), equalize=False)


Modified code

Wrapping the assignment in with torch.no_grad(): fixes the error:

	def anim(i):
		# update SMBLD
		cur_beta_idx, cur_step = i // num_steps, i % num_steps
		val = shape_range[cur_step]
		#print("\ncur_beta_idx:",cur_beta_idx,mesh.multi_betas[0, cur_beta_idx])
		with torch.no_grad():  ### added
			mesh.multi_betas[0, cur_beta_idx] = val  # Update betas
		fig.suptitle(f"{name.title()}\nS{cur_beta_idx} : {val:+.2f}", fontsize=50)  # update text

		return dict(mesh=mesh.get_meshes(), equalize=False)

How to Solve Pytorch eval Stuck Error

Question

Single-GPU training runs fast, but during eval the process hangs after running one batch, with no error message.

Things I tried that did not help:

1. Change pin_memory of valid_loader to False. When it is True, data is automatically loaded into pinned memory, which speeds up transfer to the GPU.
2. Change num_workers to 1. Some people say that too many workers can cause a multi-process deadlock, so it may help to reduce them or turn them off.

 

Final Solution:

valid_loader:
pin_memory=True  # Very important. Posts online suggest that changing this to False might solve the problem, but in my experiment it did not; keeping it True runs normally.
num_workers=4
batch_size=8

train_loader:
pin_memory=True
num_workers=4
batch_size = 8
(the same parameters as valid_loader)

In short: keep pin_memory of valid_loader set to True, which is easy to understand, since the data is loaded into pinned memory automatically and transferred to the GPU faster, speeding up inference. Then reduce num_workers and batch_size for both valid_loader and train_loader, and keep pin_memory of train_loader set to True as well.
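
As a concrete sketch of these settings (the dataset names below are placeholders, not from the original post), the loaders could be built like this:

    from torch.utils.data import DataLoader

    train_loader = DataLoader(
        train_dataset,        # placeholder dataset object
        batch_size=8,
        num_workers=4,
        pin_memory=True,      # keep True for faster host-to-GPU transfer
        shuffle=True,
    )
    valid_loader = DataLoader(
        valid_dataset,        # placeholder dataset object
        batch_size=8,
        num_workers=4,
        pin_memory=True,      # keep True here as well
        shuffle=False,
    )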

[Solved] RuntimeError: DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Question

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:76] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Today, while running yolov7 on my own computer, I used the CPU to run the test model because I do not have a GPU. Predicting a single, independent image works without any problem, which is very nice! However, when predicting a video (many images in sequence), it reports that memory allocation failed:

DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.,

Notably, the error does not appear after the second image; it shows up when the 17th image is being processed, which suggests the memory is not being released between frames.

Analysis

In PyTorch, every tensor has a requires_grad attribute. If it is True, gradients for that tensor are computed during backpropagation. requires_grad defaults to False; if a leaf variable (a tensor created directly by the user) has requires_grad=True, then every tensor that depends on it also has requires_grad=True, even if its other inputs have requires_grad=False.
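
A tiny example of this propagation (my own illustration, not from the post):

    import torch

    a = torch.randn(3, requires_grad=True)    # leaf tensor that requires grad
    b = torch.randn(3)                        # leaf tensor, requires_grad defaults to False
    c = a * b                                 # depends on `a`
    print(c.requires_grad)                    # True: the requirement propagates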


Note:

requires_grad is an attribute of PyTorch's Tensor type that indicates whether gradient information should be retained for that tensor during computation. Taking linear regression as an example, the weight w and the bias b are the quantities to be trained; to find suitable values for them we define a loss function and train by backpropagating its gradient.

When requires_grad is False, no gradient is computed for the tensor during backpropagation, which saves RAM or GPU memory.

The solution follows directly: make the model skip gradient tracking during testing, since the gradients are never used there.

 

Solution:

Use with torch.no_grad(): so that the model does not build a graph or store gradients during testing:

with torch.no_grad():
    output, _ = model(image) # Add before the image calculation

This way, no gradients are computed or stored while the model processes each image, and memory no longer accumulates frame by frame.
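
For a video, the same pattern can wrap the whole per-frame loop. A rough sketch (model, frames and the (output, _) return shape are assumptions based on the snippet above, not the exact yolov7 code):

    import torch

    model.eval()                          # assumed: an already-loaded detection model
    with torch.no_grad():                 # no autograd graph is built or kept
        for image in frames:              # `frames` is a placeholder iterable of image tensors
            output, _ = model(image)      # same call as in the snippet above
            # ... post-process `output` for this frame ...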

Perfect solution!

[ncclUnhandledCudaError] unhandled cuda error, NCCL version xx.x.x

Problem description

Problems encountered during distributed training

RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:47, unhandled cuda error, NCCL version 21.0.3
ncclUnhandledCudaError: Call to CUDA function failed.


 

Troubleshooting

From the error message, the failure happens while distributed training is being initialized, not during training itself, so the problem lies in the initialization of distributed training.

Run the following command to list the GPUs on the current server:

nvidia-smi -L

The listing shows that one card (GPU 1) is an RTX 3070, while all the others are RTX 2080 Ti:

GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 1: NVIDIA GeForce RTX 3070 (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 4: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 5: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 6: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 7: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)

Therefore, I simply use cards 2-7 for training, so the mismatched RTX 3070 is excluded.

Correct solution!
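
One simple way to restrict training to those cards (a sketch; how you select devices ultimately depends on your launch script) is to hide the RTX 3070 with CUDA_VISIBLE_DEVICES before CUDA is initialized:

    import os

    # Expose only GPUs 2-7 to this process; set this before CUDA is initialized.
    os.environ["CUDA_VISIBLE_DEVICES"] = "2,3,4,5,6,7"

    import torch
    print(torch.cuda.device_count())   # should now report 6 visible devices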

RuntimeError: stack expects each tensor to be equal size, but got [x] at entry 0 and [x] at entry 1


Problem description: when building the dataloaders, the training set runs fine, but the test set raises this error: RuntimeError: stack expects each tensor to be equal size, but got [200] at entry 0 and [116] at entry 1.

How to solve: the dataloader is built on top of a dataset, and the error means that some samples in a minibatch have a different size from the others, so the default collate function cannot stack them. I went into my custom dataset class to check, and through print debugging found that the problem was in the dataset labels.

Solution: go into the dataset, print what it returns for each sample, and make sure every sample (including its labels) has a consistent size.
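
To illustrate the failure and one possible workaround (my own sketch, not the poster's code): the default collate function calls torch.stack, which needs equal shapes, so either fix the dataset labels or pad them in a custom collate_fn:

    import torch
    from torch.nn.utils.rnn import pad_sequence

    a, b = torch.zeros(200), torch.zeros(116)
    try:
        torch.stack([a, b])                # this is what the default collate does
    except RuntimeError as e:
        print(e)                           # stack expects each tensor to be equal size ...

    def pad_collate(batch):
        # assumes each sample is a 1-D label tensor of varying length
        return pad_sequence(batch, batch_first=True, padding_value=0)

    # loader = DataLoader(dataset, batch_size=..., collate_fn=pad_collate)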

[Solved] RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Background:

Reproducing r2c on an Ubuntu 18.04 system with a GeForce RTX 3090 graphics card.


Problem

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Cause analysis:

The GeForce RTX 3090 only supports CUDA 11 and above.


Solution:

Update the PyTorch and CUDA versions:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
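
After reinstalling, a quick sanity check with standard PyTorch calls confirms which CUDA/cuDNN the build was compiled against and that the 3090 is usable:

    import torch

    print(torch.__version__)               # expect 1.11.0
    print(torch.version.cuda)              # expect 11.3
    print(torch.backends.cudnn.version())  # cuDNN version bundled with this build
    print(torch.cuda.is_available())       # True if the RTX 3090 is detected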

[Solved] Yolov5 Deep Learning Error: RuntimeError: DataLoader worker (pid(s) 2516, 1768) exited unexpectedly

Project scenario:

A problem occurred while training yolov5 for deep learning on the GPU.


Problem description

An error is reported right at the start of training: RuntimeError: DataLoader worker (pid(s) 2516, 1768) exited unexpectedly.


Cause analysis:

Since training runs on the GPU and enough virtual memory is allocated to the Anaconda environment, the problem is most likely the number of CPU worker threads used by the dataloader. Before this, I tried adjusting the batch size, but it did not help.


Solution:

There is a parameter named --workers in the train.py file. Set it to 0.

The following is my setting, for reference:

def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT/'yolov5x.pt', help='initial weights path') # initial weights
    parser.add_argument('--cfg', type=str, default='yolov5_Scan_FDDI/PLC_model.yaml', help='model.yaml path') # model structure file
    parser.add_argument('--data', type=str, default=ROOT/'yolov5_Scan_FDDI/PLC_parameter.yaml', help='dataset.yaml path') # dataset config file
    parser.add_argument('--hyp', type=str, default=ROOT/'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path') # hyperparameter settings
    parser.add_argument('--epochs', type=int, default=100) # number of training epochs
    parser.add_argument('--batch-size', type=int, default=4, help='total batch size for all GPUs, -1 for autobatch') # batch size
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=320, help='train, val image size (pixels)') # image size
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training') # resume training
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') #GPU
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=0, help='max dataloader workers (per RANK in DDP mode)') # number of dataloader workers (set to 0)
    parser.add_argument('--project', default=ROOT/'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')

    # Weights & Biases arguments
    parser.add_argument('--entity', default=None, help='W&B: Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')

    opt = parser.parse_known_args()[0] if known else parser.parse_args()
    return opt

[Solved] Pytorch Error: PytorchStreamReader failed reading zip archive failed finding central directory

PyTorch reports an error:

PytorchStreamReader failed reading zip archive: failed finding central directory

Where the error occurs

The error is reported when the pre-trained model was not fully downloaded (the cached checkpoint is incomplete or corrupted), for example when loading:

resnet101 = torchvision.models.resnet101(pretrained=True)

Solution:

Delete the partially downloaded .pth file under C:\Users\Username\.cache\torch\hub\checkpoints\ so that it will be downloaded again.

[Solved] RuntimeError: cuda runtime error (801) : operation not supported at


Error:
RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245 #85

Reason:
My guess is that Windows does not support this kind of multi-process data loading (sharing tensor storage between worker processes).

Solution:

    layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False, num_workers=12)

Take the code above as an example: simply delete the num_workers argument (so it falls back to the default of 0):

layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False)

 

[Solved] ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory.

1. Question

Using the PyTorch dataloader inside Docker may raise the error shown above.

2. Solution

Check the mounted filesystems with df -h inside the container:

It shows that /dev/shm is only 64M, but the data_loader has several num_workers set, and the workers cooperate through shared memory, so it runs out.

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that the container runs with is not enough, and you should increase the shared memory size either with the --ipc=host or --shm-size command line options to nvidia-docker run.

Solution:
(1) num_workers=0 (note that setting it to 1 does not work)
(2) give the Docker container more shared memory:

--ipc=host  or --shm-size 8G
where --ipc=host lets the container use the host's shared memory limit; this method is recommended.

After restarting the container with these options, the problem is resolved.