Tag Archives: Deep learning

How to Solve the Keras plot_model Error

1. Error information

When building a neural network model, you can call the plot_model function in Keras to draw a schematic diagram of the model, which makes it easier to adjust the model structure:

from tensorflow.keras.models import Model
from tensorflow.keras.utils import plot_model
model = Model(dense_inputs+sparse_inputs, output_layer)
plot_model(model, "fm_model.png", show_shapes=True)

As a result, the following error messages appear:

('Failed to import pydot. You must pip install pydot and install graphviz (https://graphviz.gitlab.io/download/), ', 'for pydotprint to work.')

The error message means that the pydot and graphviz packages are not installed.

2. Solutions

2.1 Install the graphviz package

pip install graphviz

2.2 Download and install the Graphviz EXE file

On Windows:

Download address: https://graphviz.gitlab.io/download/

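2.3 Configure environment variables for Graphviz

Add the bin directory of the Graphviz installation to the PATH environment variable so that the dot executable can be found.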

2.4 Install the pydot package

pip install pydot-ng

2.5 Restart the development tools

Restart the IDE or other development tool (for example, Jupyter Notebook) so that the changes take effect.

3. Summary

1. Installing only the pydot and graphviz pip packages, as the error message suggests, is not enough.

2. You also need to download the Graphviz EXE or ZIP file from the website and, after installation, set the environment variables.

3. Don't forget to restart your IDE or other development tools.
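Before calling plot_model again, it can help to confirm that both prerequisites are visible from Python. The following is a minimal sketch, not part of Keras: the function name check_plot_model_requirements and the list of candidate module names are assumptions chosen for illustration (pydot-ng installs a module named pydot_ng).

```python
import importlib.util
import shutil

# Module names that may provide the Python binding (assumed list; which one
# Keras actually imports depends on the installed Keras version).
CANDIDATE_MODULES = ("pydot", "pydot_ng", "pydotplus")


def check_plot_model_requirements():
    """Report which plot_model prerequisites are visible to this interpreter."""
    binding = next(
        (name for name in CANDIDATE_MODULES
         if importlib.util.find_spec(name) is not None),
        None,
    )
    return {
        # Name of the first importable binding, or None if none is installed.
        "python_binding": binding,
        # True if the Graphviz "dot" executable can be found on PATH.
        "dot_executable": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for item, status in check_plot_model_requirements().items():
        print(f"{item}: {status}")
```

If dot_executable is False after installing Graphviz, the PATH change has probably not reached the running process yet, which is why the restart in step 2.5 matters.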

[Solved] Python 2 tensorflow Import Error: class DescriptorBase(metaclass=DescriptorMetaclass), SyntaxError: invalid syntax

After installing tensorflow under Python 2, test the installation:

import tensorflow as tf

This reports an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/__init__.py", line 28, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 52, in <module>
    from tensorflow.core.framework.graph_pb2 import *
  File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 7, in <module>
    from google.protobuf import descriptor as _descriptor
  File "/home/zhaokai/.local/lib/python2.7/site-packages/google/protobuf/descriptor.py", line 113
    class DescriptorBase(metaclass=DescriptorMetaclass):
                                  ^
SyntaxError: invalid syntax

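The metaclass= keyword in a class definition is Python 3 syntax, so the installed protobuf release no longer supports Python 2. The solution is to reinstall an older protobuf version: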

pip install protobuf==3.17.3

Then import tensorflow again.

[Solved] ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE

Installing tensorflow with pip reports an error:

ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them. tensorflow<1.14,>=1.13 from https://www.piwheels.org/simple/tensorflow/tensorflow-1.13.1-cp35-none-linux_armv7l.whl#sha256=6c00dd13db0791e83cb08d532f007cc7fd44c8d7b52662a4a0065ac4fe7ca18a (from mycroft-precise==0.3.0): Expected sha256 6c00dd13db0791e83cb08d532f007cc7fd44c8d7b52662a4a0065ac4fe7ca18a Got f679035a7cd96d24f826463bef208cd04f1eee50eb6023a158c05b529e17a71b

The error shows that the hash of the downloaded package does not match the expected hash. The package was probably corrupted during the pip installation, which may be caused by a network problem or by a version compatibility issue with the Python package.
Solution: add --no-cache-dir when installing the package with pip, as follows:

pip install tensorflow --no-cache-dir
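To see whether a file on disk really is corrupted, its SHA-256 digest can be compared with the "Expected sha256" value from the error message. This is a minimal sketch under the assumption that the wheel has been saved locally (for example with pip download); the helper names sha256_of_file and matches_expected are hypothetical and not part of pip.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with the hash quoted in pip's error message."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Demonstration on a throwaway file; replace the path with the real wheel.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
    try:
        print(sha256_of_file(tmp.name))
    finally:
        os.remove(tmp.name)
```

A mismatch on a freshly downloaded file points to the download itself rather than to pip's cache.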

Error when submitting a job to a cluster from a Win10 remote connection: sbatch: error: Batch script contains DOS line breaks (\r\n)

Description:

A Win10 laptop is remotely connected to a Win10 workstation in the office, and the workstation is used to submit jobs to the cluster server. In this setup, the bash.sh file cannot be run normally in the Linux environment of the cluster server. Opening the file with Vim and editing it again does not help; the following error is reported:

sbatch: error: Batch script contains DOS line breaks (\r\n)
sbatch: error: instead of expected UNIX line breaks (\n)

In this case, use VS Code to change the line endings of the bash.sh file from CRLF to LF to solve the problem.
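The same conversion can be done without an editor. The sketch below rewrites CRLF line endings to LF in place; the function names crlf_to_lf and convert_file are chosen here for illustration, and the file name bash.sh is taken from the description above.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace DOS line breaks (\\r\\n) with UNIX line breaks (\\n)."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Convert a file in place; return True if anything was changed."""
    with open(path, "rb") as f:
        original = f.read()
    converted = crlf_to_lf(original)
    if converted != original:
        with open(path, "wb") as f:
            f.write(converted)
    return converted != original


if __name__ == "__main__":
    sample = b"#!/bin/bash\r\necho hello\r\n"
    print(crlf_to_lf(sample))
```

Working on bytes rather than text avoids any automatic newline translation by Python on Windows.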

[Solved] PyTorch Object Detection Error: RuntimeError: CUDA error: device-side assert triggered

When training torchvision's Mask R-CNN on your own data, the following error is reported:

Traceback (most recent call last):
  File "main_train_detection.py", line 232, in <module>
    main(params)
  File "main_train_detection.py", line 201, in main
    train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
  File "/raid/huaqing/tyler/suzhou/code/utils/engine.py", line 37, in train_one_epoch
    loss_dict = model(images, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torchvision/models/detection/generalized_rcnn.py", line 97, in forward
    detections, detector_losses = self.roi_heads(features, proposals, images.image_sizes, targets)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/torchvision/models/detection/roi_heads.py", line 760, in forward
    loss_classifier, loss_box_reg = fastrcnn_loss(
  File "/usr/local/lib/python3.8/dist-packages/torchvision/models/detection/roi_heads.py", line 40, in fastrcnn_loss
    sampled_pos_inds_subset = torch.where(labels > 0)[0]
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

The root cause is that the category labels are not numbered from 0. There are three categories of targets to identify, so the total number of categories was set to 3, and the mapping between category names and labels was defined as follows:

cls_dict = {'holes':1, 'marker':2, 'band':3}

Category labels are actually numbered from 0: with 3 categories in total, the valid labels are 0, 1 and 2, and there is no category with label == 3. With the cls_dict above, the label of the band class is therefore out of range. It should be corrected as follows:

cls_dict = {'holes':0, 'marker':1, 'band':2}
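Because the CUDA assert gives no hint about which label is wrong, a plain Python check before training can catch the problem early. This is a minimal sketch; find_out_of_range_labels is a hypothetical helper, not a torchvision function.

```python
def find_out_of_range_labels(cls_dict, num_classes):
    """Return the entries whose label is outside the range 0 .. num_classes - 1."""
    return {
        name: label
        for name, label in cls_dict.items()
        if not 0 <= label < num_classes
    }


if __name__ == "__main__":
    broken = {'holes': 1, 'marker': 2, 'band': 3}
    fixed = {'holes': 0, 'marker': 1, 'band': 2}
    print(find_out_of_range_labels(broken, num_classes=3))
    print(find_out_of_range_labels(fixed, num_classes=3))
```

One caveat worth checking against the torchvision documentation: its detection models treat label 0 as background (the traceback line torch.where(labels > 0) selects the positive samples), so keeping the labels 1 to 3 and setting the number of categories to 4 may be the more appropriate fix.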

[Solved] Python Error: An attempt has been made to start a new process before the current process has finished …

This error usually occurs on Windows systems when multiple processes are used. For example, run the following code in PyCharm:

import torch
import torch.utils.data as Data
import numpy as np
from sklearn.datasets import load_iris

iris_x, irisy = load_iris(return_X_y=True)
print("iris_x.dtype:", iris_x.dtype)
print("irisy:", irisy.dtype)

## transform the training set x into a tensor, and the training set y into a tensor
train_xt = torch.from_numpy(iris_x.astype(np.float32))
train_yt = torch.from_numpy(irisy.astype(np.int64))
print("train_xt.dtype:", train_xt.dtype)
print("train_yt.dtype:", train_yt.dtype)

## After converting the training set into a tensor, use TensorDataset to collate X and Y together
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader to batch the training dataset
train_loader = Data.DataLoader(
    dataset=train_data,  # the dataset to use
    batch_size=10,  # batch size
    shuffle=True,  # shuffle the data before each iteration
    num_workers=2,  # [Note: 2 processes are used here]
)

## Check if the dimensionality of the samples of a batch of the training dataset is correct
for step, (b_x, b_y) in enumerate(train_loader):
    if step > 0:
        break
## Output the dimensions of the training image and the labels, and the data type
print("b_x.shape:", b_x.shape)
print("b_y.shape:", b_y.shape)
print("b_x.dtype:", b_x.dtype)
print("b_y.dtype:", b_y.dtype)


## ---------- The correct result is as follows ----------

# iris_x.dtype: float64
# irisy: int32
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.int64
# b_x.shape: torch.Size([10, 4])
# b_y.shape: torch.Size([10])
# b_x.dtype: torch.float32
# b_y.dtype: torch.int64

Running it reports the following error. (No error is reported when running in Jupyter Notebook in the same environment. I don't know why…)

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

 

Solution 1:

Remove the statement that sets up multiple processes. In this example, comment out or delete the following line.

num_workers=2,  # [Note: 2 processes are used here]

Solution 2:

Move the code that uses multiple processes under [if __name__ == '__main__':].

if __name__ == '__main__':
    ##  Check if the dimensionality of the samples of a batch of the training dataset is correct
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    ## Output the dimensions of the training image and the dimensions of the labels, and the data type
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)

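However, in PyCharm, the part before [for step, (b_x, b_y) in enumerate(train_loader):] is executed twice. A likely reason is that on Windows the worker processes are started with the spawn method, which imports the main module again, so the top-level code outside the guard runs once more in a worker.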

## ---------- The result of running in PyCharm is as follows ----------
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
b_x.shape: torch.Size([10, 4])
b_y.shape: torch.Size([10])
b_x.dtype: torch.float32
b_y.dtype: torch.int64
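One way to keep top-level code from running again in the workers is to leave only imports and definitions at module level and put all work behind the guard. The sketch below illustrates the idiom with the standard library's multiprocessing.Pool standing in for the DataLoader workers; it is not the DataLoader code itself, and the names square and run are hypothetical. As with DataLoader, num_workers=0 here means "do the work in the main process".

```python
import multiprocessing as mp


def square(x):
    """Work done by a worker; must live at module level so workers can import it."""
    return x * x


def run(values, num_workers=0):
    """Process values in the main process (0 workers) or in a process pool."""
    if num_workers == 0:
        return [square(v) for v in values]
    with mp.Pool(num_workers) as pool:
        return pool.map(square, values)


if __name__ == "__main__":
    # Only the original process enters this block. A worker that re-imports
    # this file sees a different __name__ and skips it, so nothing here is
    # executed a second time.
    print(run([1, 2, 3], num_workers=2))
```

Applied to the example above, this would mean moving the dataset loading, the print statements and the DataLoader construction into a function that is called only from inside the guard.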

RuntimeError: CUDA error: an illegal memory access was encountered

Question:

I ran into this problem while writing a model. Search results on Baidu attributed it either to the PyTorch version or to a category index out of range, but neither helped, because the line that raised the error was a very simple assignment:

scores[:, 0] = -float("inf") 
#RuntimeError: CUDA error: an illegal memory access was encountered

While debugging, I also found that a warning appeared after one layer of the model had been executed:

lm_logits = self.linear(outputs) + self.bias
#warning: THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered

At first glance, both lines are simple, yet both produced strange errors.

Solution:

While debugging, I noticed something abnormal: for the data output by the PyTorch network, the debugger did not show the actual output values, only the object addresses:

T:torch.Tensor object at 0x7fb27e7c8f30
data:torch.Tensor object at 0x7fb27e7c8f30

It turned out that the self.linear layer was on 'cpu' while the other layers were on 'cuda', so 'cuda' data was being propagated forward through a 'cpu' layer. Moving that layer to 'cuda' fixes the problem.

[Solved] Runtime error: expected scalar type Float but found Double

Error: RuntimeError: expected scalar type Float but found Double

import numpy as np
import torch

num_input, num_example = 2, 1000  # assumed values; not given in the original
w_true=torch.tensor([2,-3.4]).T
b_true=4.2
feature = torch.from_numpy(np.random.normal(0,1,(num_input,num_example)))
#feature = torch.float32(feature)
labels = torch.matmul(w_true.T,feature)+b_true

Reason: the data generated by np.random.normal() is float64, while torch defaults to float32, so torch.matmul receives operands of different dtypes.
Solution: convert the NumPy array to float32 before creating the tensor:

feature = torch.from_numpy(np.float32(np.random.normal(0,1,(num_input,num_example))))

[Solved] ERROR: pip's dependency resolver does not currently take into account all the packages that are inst

When installing wrapt, the following error is reported:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.7.0 requires h5py>=2.9.0, which is not installed.
tensorflow 2.7.0 requires typing-extensions>=3.6.6, which is not installed.
tensorflow 2.7.0 requires wheel<1.0,>=0.32.0, which is not installed.

Just follow the prompts and install the missing packages listed in the message:

pip install h5py
pip install typing-extensions
pip install wheel
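When the list of missing packages is long, the names can be pulled out of pip's output instead of being copied by hand. This is a minimal sketch that assumes the message format shown above; missing_packages is a hypothetical helper and the regular expression has only been matched against these sample lines.

```python
import re

# Sample output copied from the error above.
PIP_OUTPUT = """\
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.7.0 requires h5py>=2.9.0, which is not installed.
tensorflow 2.7.0 requires typing-extensions>=3.6.6, which is not installed.
tensorflow 2.7.0 requires wheel<1.0,>=0.32.0, which is not installed.
"""

# Capture the package name that follows "requires", ignoring the version specifier.
_MISSING = re.compile(
    r"requires ([A-Za-z0-9][A-Za-z0-9._-]*).*?, which is not installed"
)


def missing_packages(pip_output):
    """Return the missing package names in order of appearance, without duplicates."""
    return list(dict.fromkeys(_MISSING.findall(pip_output)))


if __name__ == "__main__":
    print("pip install " + " ".join(missing_packages(PIP_OUTPUT)))
```

This drops the version constraints, just like the commands above; to honor them, the full requirement strings would have to be passed to pip instead.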

[Solved] Docker error when using GPU: "Unknown runtime specified nvidia"

Reproducing problem 1

System: Ubuntu 18.04, Docker version: 20.10.7
I started a container with the following command:

docker run -itd \
   --runtime=nvidia --gpus=all \
   -e NVIDIA_DRIVER_CAPABILITIES=compute,utility,video,graphics \
   image_name
   

The following error is reported:

docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

Solution 1

This may be because the current user has not joined the docker group. Add your own user to the docker group:

sudo usermod -a -G docker $USER

Reproducing problem 2

docker: Error response from daemon: Unknown runtime specified nvidia.
See 'docker run --help'.

Solution 2

The nvidia-docker2 package needs to be installed:

sudo apt-get install -y nvidia-docker2

Then restart Docker:

sudo systemctl daemon-reload
sudo systemctl restart docker
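After the restart, one way to confirm that the runtime is registered is to look for an nvidia entry under runtimes in the Docker daemon configuration. The sketch below assumes the configuration lives at /etc/docker/daemon.json, which is where the nvidia-docker2 package usually writes it; the helper has_nvidia_runtime is hypothetical.

```python
import json

DAEMON_JSON = "/etc/docker/daemon.json"  # assumed default location


def has_nvidia_runtime(daemon_json_text):
    """Return True if the daemon configuration registers a runtime named nvidia."""
    config = json.loads(daemon_json_text)
    return "nvidia" in config.get("runtimes", {})


if __name__ == "__main__":
    try:
        with open(DAEMON_JSON) as f:
            print("nvidia runtime registered:", has_nvidia_runtime(f.read()))
    except (OSError, ValueError) as exc:
        # Missing or unreadable file, or invalid JSON.
        print("could not check", DAEMON_JSON, "-", exc)
```

If the entry is missing, the --runtime=nvidia flag has nothing to refer to, which matches the "Unknown runtime specified nvidia" message.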