Tag Archives: Deep learning

[Solved] Keras Error: KeyError: ‘accuracy‘, KeyError: ‘val_acc‘, KeyError: ‘acc‘

Problem:
KeyError: 'acc' is reported when using Keras.

Reason:
This is a Keras version issue. 'acc' and 'accuracy' are meant to be the same metric, but different Keras versions use different key names, so the key used in your code must match your version. The same goes for 'val_acc' vs. 'val_accuracy'.

Solution:
Print the keys of the history object:
print(history.history.keys())
Then replace the failing key in your code with the corresponding key that is actually printed.
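A minimal sketch of the whole workflow, assuming a model compiled with metrics=['accuracy'] and training data already defined:

# Sketch: pick whichever accuracy key this Keras version actually records.
history = model.fit(x_train, y_train, validation_split=0.2, epochs=10)
print(history.history.keys())
# Newer Keras: dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])
# Older Keras: dict_keys(['loss', 'acc', 'val_loss', 'val_acc'])

acc_key = 'accuracy' if 'accuracy' in history.history else 'acc'
train_acc = history.history[acc_key]
val_acc = history.history['val_' + acc_key]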

AttributeError: ‘NoneType‘ object has no attribute ‘shape‘

When training poly-yolo, this bug appeared during training even though the environment had been debugged. It generally means the image paths are missing or set incorrectly.

Solution:

Prepend the absolute path to each image line in the simulator_dataset/simulator-train.txt file, for example:

/home/arl/lc/poly-yolo-master/simulator_dataset/imgs/img_r_12.png

Do the same for the validation set file.
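If many lines need fixing, a small script can do the prepending in one pass; a quick sketch (the prefix and file name mirror the example above; adjust them to your layout):

# Sketch: prepend an absolute prefix to every image line in the annotation file.
prefix = '/home/arl/lc/poly-yolo-master/'
path = 'simulator_dataset/simulator-train.txt'

with open(path) as f:
    lines = f.readlines()

with open(path, 'w') as f:
    for line in lines:
        # Leave lines that are already absolute untouched
        f.write(line if line.startswith('/') else prefix + line)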

[Solved] Tensorflow Error: failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED


Preface

After many days of deep learning I finally got it running on the GPU. I was very happy, but after chatting with classmates I learned that my 1660 Ti is nothing special for deep learning, so I no longer hold out much hope for it. A laptop is fine for learning; for serious deep-learning runs you have to use the lab computers. Alas, still no money.

Problem description

An error occurred while using the GPU:

2021-11-09 20:43:26.114720: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_100.dll
2021-11-09 20:43:26.386261: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-11-09 20:43:26.386617: E tensorflow/stream_executor/cuda/cuda_blas.cc:238] failed to create cublas handle: CUBLAS_STATUS_ALLOC_FAILED
2021-11-09 20:43:26.386735: W tensorflow/stream_executor/stream.cc:1919] attempting to perform BLAS operation using StreamExecutor without BLAS support
Traceback (most recent call last):
  File "first.py", line 30, in <module>
    gpu_time = timeit.timeit(gpu_run,number=10)
  File "D:\Anaconda\Anaconda3\envs\tensorflow2_0_0_gpu\lib\timeit.py", line 233, in timeit
    return Timer(stmt, setup, timer, globals).timeit(number)
  File "D:\Anaconda\Anaconda3\envs\tensorflow2_0_0_gpu\lib\timeit.py", line 177, in timeit
    timing = self.inner(it, self.timer)
  File "<timeit-src>", line 6, in inner
  File "first.py", line 21, in gpu_run
    c = tf.matmul(gpu_a,gpu_b)
  File "D:\Anaconda\Anaconda3\envs\tensorflow2_0_0_gpu\lib\site-packages\tensorflow_core\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "D:\Anaconda\Anaconda3\envs\tensorflow2_0_0_gpu\lib\site-packages\tensorflow_core\python\ops\math_ops.py", line 2765, in matmul
    a, b, transpose_a=transpose_a, transpose_b=transpose_b, name=name)
  File "D:\Anaconda\Anaconda3\envs\tensorflow2_0_0_gpu\lib\site-packages\tensorflow_core\python\ops\gen_math_ops.py", line 6126, in mat_mul
    _six.raise_from(_core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(10000, 1000), b.shape=(1000, 2000), m=10000, n=2000, k=1000 [Op:MatMul] name: MatMul/

I rushed to find the cause: the GPU memory was exhausted, even though the GPU itself was not running at full utilization.

Solution:

There are two main possible causes:
1. The cuDNN, CUDA, and TensorFlow versions are incompatible. Mine were installed following the tutorial and double-checked several times, so this was ruled out.
2. A shortage of GPU memory. This can be solved with the method from the official docs, because TensorFlow 2.0 supports two GPU memory strategies:
(1) dynamically allocate memory on demand
(2) set a hard memory cap (e.g., only 1 GB may be used, leaving the rest free; a sketch of this mode follows the code below)
To set mode (1), dynamic allocation, the code is:

import tensorflow as tf

# List the physical GPUs and enable on-demand (dynamic) memory allocation on the first one
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_memory_growth(gpus[0], True)
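And a minimal sketch of mode (2), a hard memory cap, using the 1 GB figure from above (API as of TensorFlow 2.0; adjust the limit as needed):

import tensorflow as tf

# Cap TensorFlow at a fixed amount of GPU memory (1024 MB = the 1 GB example above)
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=1024)])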

[Solved] TF2.4 Error: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize

First, check whether your CUDA and cuDNN versions match your TensorFlow version.

Check the version numbers against the official compatibility table. Note that the CUDA requirement is a minimum: for example, TensorFlow 2.4 works with CUDA 11.0 and above. Mine is 11.5 and there is no problem.

In my case the error was caused by insufficient GPU memory.

For insufficient GPU memory, add the following code:

import tensorflow as tf

# Let GPU memory grow on demand instead of being pre-allocated all at once
config = tf.compat.v1.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.compat.v1.InteractiveSession(config=config)

[Solved] pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle

An "invalid resource handle" error occurs in PyCUDA code.

When running the CUDA code, the following error appears:

File "/mnt/lustre/demo/extract_disp_newtopo/face_registration-master/code/poisson.py", line 147, in blend_gs_cuda
    block = (1024, 1, 1))
  File "/mnt/lustre/miniconda3/envs/pycuda/lib/python3.6/site-packages/pycuda/driver.py", line 436, in function_call
    func._set_block_shape(*block)
pycuda._driver.LogicError: cuFuncSetBlockShape failed: invalid resource handle

Solution:
Find the problematic CUDA kernel definition:

mod = SourceModule("""
			#include <stdint.h>
			__global__ void construct_b(
				const uint8_t* src, const int16_t* u, const int16_t* v,
				float* b,
				int pix_num, int size
			)

Add either of the following lines anywhere before the SourceModule declaration:

src = torch.cuda.ByteTensor(8)  # The size is arbitrary (a matrix also works), but the dtype must match the corresponding C type in the kernel (here uint8_t)
b   = torch.cuda.FloatTensor(9)

The complete code is as follows:

src = torch.cuda.ByteTensor(8)
mod = SourceModule("""
			#include <stdint.h>
			__global__ void construct_b(
				const uint8_t* src, const int16_t* u, const int16_t* v,
				float* b,
				int pix_num, int size
			)

The exact root cause is unknown, since the error only occurs occasionally. (Presumably, creating a torch.cuda tensor first forces the CUDA context to be initialized before PyCUDA registers the kernel, avoiding a context conflict.)

tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed

When running image-stylization code with TensorFlow 2.4.0, the following error occurred:

tensorflow.python.framework.errors_impl.InternalError: Blas xGEMM launch failed : a.shape=[1,480000,64], b.shape=[1,480000,64], m=64, n=64, k=480000 [Op:Einsum]

Searching around turned up two solutions:
1. Add the following code to the program (this hides the GPU so TensorFlow falls back to the CPU):

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'  # hide all GPUs; note '/gpu:0' is not a valid value for this variable

The program then runs normally, but on the CPU, so it is much slower.
2. Change the cuDNN version. Generally not recommended; it is too much trouble.
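3. A third option, borrowed from the CUBLAS section above: this error often stems from GPU memory exhaustion, so enabling on-demand memory growth may fix it while keeping the GPU. A minimal sketch:

import tensorflow as tf

# Enable on-demand GPU memory growth (same remedy as in the CUBLAS section above)
for gpu in tf.config.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)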

[Solved] TensorFlow or Keras/tf.keras Error: OOM (insufficient GPU memory)

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Problem description

I ran into this today during 5-fold cross-validation and grid search; it occurs when the dataset is too large or batch_size is too large.
On Linux, use the command watch -n 0.1 nvidia-smi to monitor GPU usage.

Reason

The GPU is not actually out of memory; rather, TensorFlow grabs all of the memory up front without using it effectively. The fix is to let TensorFlow allocate only the memory it needs. (This also applies to Keras running on TensorFlow.)

Solution:

1. Reduce batch_size. This works, but it treats the symptom rather than the cause.
2. Configure the GPU manually in train.py:

(1) In TensorFlow:
import tensorflow as tf
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # Specify which GPU to use
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # Allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # Use at most 40% of GPU memory
session = tf.Session(config=config)  # Create the TensorFlow session
...
(2) In Keras:
import tensorflow as tf
from keras.models import Sequential
import os
from keras.backend.tensorflow_backend import set_session  # Note: different from tf.keras

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # Allocate GPU memory on demand
set_session(tf.Session(config=config))  # Pass the settings to Keras

model = Sequential()
...
(3) In tf.keras:
import tensorflow as tf
from tensorflow.keras.models import Sequential

import os
from tensorflow_core.python.keras.backend import set_session  # Note: different from plain Keras

os.environ["CUDA_VISIBLE_DEVICES"] = "0"
config = tf.compat.v1.ConfigProto()  # tf.ConfigProto was removed from the top level in TF 2.x
config.gpu_options.allow_growth = True  # Allocate GPU memory on demand
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # Use at most 40% of GPU memory
set_session(tf.compat.v1.Session(config=config))  # Pass the settings to tf.keras

model = Sequential()
...

Supplement:
tf.keras can speed up data loading with multiple workers (these arguments take effect when the input is a generator or keras.utils.Sequence):

model.fit(x_train, y_train, use_multiprocessing=True, workers=4)  # Enable multiprocessing with 4 workers

Clear the session:

from tensorflow import keras
keras.backend.clear_session()

After clearing, you can create a new session and continue.

TensorRT model quantization error: Error Code 1: Cuda Runtime (an illegal memory access was encountered)

When using TensorRT for model quantization on an A10 graphics card, the following error is reported:

[W] [TRT] Calibration Profile is not defined. Running calibration with Profile 0
[I] calib data processed : 0/4680batch
[E] [TRT] 1: [calibrator.cpp::add::779] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] [TRT] [executionContext.cpp::commonEmitDebugTensor::1258] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[E] [TRT] [executionContext.cpp::executeInternal::610] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
[F] [TRT] [defaultAllocator.cpp::free::85] Error Code 1: Cuda Runtime (an illegal memory access was encountered)
terminate called after throwing an instance of 'nvinfer1::CudaRuntimeError'
  what():  an illegal memory access was encountered
Aborted

This is an architecture mismatch: the A10 has compute capability 8.6, so SM 86 must be added to the build.

Therefore I modified my Makefile. Here is the fragment; the line was originally SMS ?= 60 61 62 70 72 75, and I added 86:

# Gencode arguments
SMS ?= 60 61 62 70 72 75 86

ifeq ($(GENCODE_FLAGS),)
# Generate SASS code for each SM architecture listed in $(SMS)
$(foreach sm,$(SMS),$(eval GENCODE_FLAGS += -gencode arch=compute_$(sm),code=sm_$(sm)))
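To confirm which SM value your own GPU needs before editing SMS, one quick check (assuming PyTorch with CUDA is installed; not part of the original fix) is:

# Sketch: query the GPU's compute capability to know which SM value to add
import torch

print(torch.cuda.get_device_capability(0))  # e.g. (8, 6) on an A10 -> add SM 86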

urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

When training a model, pretrained weights such as VGG are often loaded. The code is as follows:

model = torchvision.models.vgg19(pretrained=True)

Training first prints:

Downloading: "https://download.pytorch.org/models/vgg19-dcbb9e9d.pth" to /root/.cache/torch/checkpoints/vgg19-dcbb9e9d.pth

Then an error occurred:

socket.gaierror: [Errno -3] Temporary failure in name resolution
and
urllib.error.URLError: <urlopen error [Errno -3] Temporary failure in name resolution>

This happens because the pretrained model cannot be downloaded from this machine, so it has to be fetched manually:
first, on a machine with internet access, download the model from https://download.pytorch.org/models/vgg19-dcbb9e9d.pth
Then put the downloaded .pth model file under a fixed path, such as

/home/team/torch/models/pre_model/vgg19-dcbb9e9d.pth

(Alternatively, copy it to the cache path shown in the log, /root/.cache/torch/checkpoints/, and keep pretrained=True; the file will then be found locally and the download skipped.)

Finally, change the code to:

import torch
import torchvision

model = torchvision.models.vgg19(pretrained=False)
pthfile = r'/home/team/torch/models/pre_model/vgg19-dcbb9e9d.pth'
model.load_state_dict(torch.load(pthfile))

[Solved] RuntimeError: CUDA error: device-side assert triggered CUDA kernel errors

Reason:

Most likely, an array index goes out of bounds somewhere in the code. A common culprit is the cross-entropy loss function (e.g., a label outside the range [0, num_classes-1]); that was the case for me.

There is a small chance it is something else, but the solution below is generic either way.

Solution:
Run with device = "cpu" first. The CPU backend reports exactly where the index goes out of bounds, so you can locate and fix the code. (Alternatively, set the environment variable CUDA_LAUNCH_BLOCKING=1 so the CUDA error surfaces at the offending line.) Make sure it runs correctly before switching back to the GPU.
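A minimal, self-contained sketch of what the CPU run surfaces (dummy data, purely illustrative):

# Sketch: the CPU backend pinpoints the out-of-bounds index that CUDA only
# reports as a device-side assert.
import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)           # batch of 4, 3 classes
labels = torch.tensor([0, 1, 2, 3])  # 3 is out of range for 3 classes

# On CPU this raises "IndexError: Target 3 is out of bounds." at this line;
# on CUDA the same bug surfaces later as "device-side assert triggered".
loss = F.cross_entropy(logits, labels)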

[Solved] RuntimeError during DCGAN training: Found dtype Long but expected Float

When training a DCGAN, the following error occurs:

RuntimeError: Found dtype Long but expected Float

The code snippet for this error is as follows:

label = torch.full((b_size,), real_label, device=device)
# Run the batch of real samples through the discriminator and store the result in output
output = netD(real_cpu).view(-1)

# Compute the loss
errD_real = criterion(output, label)

The reason is that the dtype of the output and label tensors passed into the loss function does not match what it expects: the loss requires float data, but long data was passed in (torch.full infers a long dtype from the integer real_label in recent PyTorch versions).
We therefore need to convert the incoming tensors to float.
The modified code is as follows:

label = torch.full((b_size,), real_label, device=device)
# Run the batch of real samples through the discriminator and store the result in output
output = netD(real_cpu).view(-1)
# Convert the tensors to float
output = output.to(torch.float32)
label = label.to(torch.float32)
# Compute the loss
errD_real = criterion(output, label)
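Equivalently (a sketch, not from the original post), the label can be created as float in the first place, replacing the first line of the snippet:

# Equivalent fix: create the label with a float dtype so no later conversion is needed
label = torch.full((b_size,), real_label, dtype=torch.float32, device=device)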

Problem solved!

[Solved] RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size xxx



An error is reported when running UNet3+ for multi-class training:

RuntimeError: 1only batches of spatial targets supported (non-empty 3D tensors) but got targets of size xxx

After much consulting of references and trial and error, the following solution was obtained:

The UNet3+ source code has a small bug for multi-class tasks, which needs the slight modifications below. (The UNet3+ code comes from GitHub: avbuffer/UNet3plus_pth - UNet3+/UNet++/UNet, used in Deep Automatic Portrait Matting, in PyTorch.)

1. Modify train.py

# line 56
if net.n_classes > 1:
    criterion = nn.CrossEntropyLoss()
else:
    criterion = nn.BCEWithLogitsLoss()

# line 79
loss = criterion(masks_pred, torch.squeeze(true_masks))

# line 153
net = UNet(n_channels=3, n_classes=3)

The torch.squeeze() call is needed because, with cross-entropy as the loss function, the output of output = net(input) has shape [batch_size, n_classes, height, width], while the label must have shape [batch_size, height, width], i.e., a single-channel grayscale map. Both BCELoss and cross-entropy loss are used for classification problems; BCELoss is the special case of cross-entropy loss for binary classification only, whereas cross-entropy loss works for both binary and multi-class classification.

An example of computing cross-entropy loss with nn.CrossEntropyLoss():

import torch
import torch.nn as nn

# output is the network output, size=[batch_size, n_classes]
# e.g., with batch size 128 and 10 classes, size=[128, 10]
output = torch.randn(128, 10)

# target holds the true class indices, size=[batch_size]
# e.g., with batch size 128, size=[128]
target = torch.randint(0, 10, (128,))

criterion = nn.CrossEntropyLoss()
crossentropyloss_output = criterion(output, target)
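For the segmentation case described above, only the spatial dimensions differ; a minimal sketch with illustrative sizes showing why the squeeze is needed:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

# Segmentation: prediction is [batch_size, n_classes, H, W], label is [batch_size, H, W]
masks_pred = torch.randn(2, 3, 64, 64)            # batch 2, 3 classes, 64x64 (illustrative)
true_masks = torch.randint(0, 3, (2, 1, 64, 64))  # single-channel label map
loss = criterion(masks_pred, torch.squeeze(true_masks, dim=1))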

2. Modify predict.py

os.environ['CUDA_VISIBLE_DEVICES'] = str(gpu_id)
if unet_type == 'v2':
    net = UNet2Plus(n_channels=3, n_classes=1)
elif unet_type == 'v3':
    # line 93
    net = UNet3Plus(n_channels=3, n_classes=20)
    #net = UNet3Plus_DeepSup(n_channels=3, n_classes=1)
    #net = UNet3Plus_DeepSup_CGM(n_channels=3, n_classes=1)
else:
    net = UNet(n_channels=3, n_classes=1)

3. Modify eval.py

for true_mask, pred in zip(true_masks, mask_pred):
    pred = (pred > 0.5).float()
    if net.n_classes > 1:
        # line 26
        tot += F.cross_entropy(pred.unsqueeze(dim=0), true_mask).item()
        # tot += F.cross_entropy(pred.unsqueeze(dim=0), true_mask.unsqueeze(dim=0)).item()
    else:
        tot += dice_coeff(pred, true_mask.squeeze(dim=1)).item()

Then run the following statement to start the training:

python train.py -g 0 -u v3 -e 400 -b 2 -l 0.1 -s 0.5 -v 15.0