[Solved] RuntimeError: Error(s) in loading state_dict for Model: Missing key(s) in state_dict

Error message:

When loading network weights trained with FCN, UNet, and DeepLab, the following error is reported:

RuntimeError: Error(s) in loading state_dict for Model: Missing key(s) in state_dict....

Training environment:

CPU: Intel E5
GPU: 3090 × 2
PyTorch 1.10

Solution:

Resolve the key mismatch.
Method 1 (ineffective): the keys in the checkpoint do not strictly match the keys of the model. When loading the model, pass strict=False to suppress the mismatch error:

net.load_state_dict(torch.load(ckpt_path), strict=False)

Note: this disables strict key matching, so the error is no longer reported. However, the parameters whose keys do not match are never loaded, so a model loaded this way segments poorly in practice. Use it with caution!

Method 2: the network is the same; the only difference is that nn.DataParallel() was used during training to run on two GPUs, which prefixes every key in the saved state_dict with "module.". Wrapping the model in nn.DataParallel as well before loading the weights resolves the error.

net = nn.DataParallel(net)
net = net.to(device)
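
If you would rather load the weights into a plain (unwrapped) model, a common alternative is to strip the "module." prefix from the checkpoint keys first. The following is a minimal sketch of that idea, not the original author's method; the helper name strip_module_prefix is hypothetical, and plain integers stand in for tensors so the example runs without PyTorch.

```python
from collections import OrderedDict


def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of state_dict with a leading prefix removed from each key."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Only strip the prefix at the start of the key; leave other keys alone.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned


# Stand-in for torch.load(ckpt_path): plain values instead of tensors.
checkpoint = OrderedDict([("module.conv1.weight", 1), ("module.conv1.bias", 2)])
print(list(strip_module_prefix(checkpoint)))
```

With a real checkpoint, the cleaned dictionary would be passed on as net.load_state_dict(strip_module_prefix(torch.load(ckpt_path))), assuming the checkpoint file holds a state_dict rather than a whole model.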

[Solved] ROS fatal error: NvInferRuntimeCommon.h: No such file or directory

The TensorRT header file was not found during compilation.

Solution:

Add the path to the TensorRT package in CMakeLists.txt.

First, find where the header is located to get the TensorRT install path:

locate NvInferRuntimeCommon.h

Then add the TensorRT include path to CMakeLists.txt. Here I used the absolute path:

include_directories("/home/b502/tensorrt/TensorRT-7.2.1.6/include")

[Solved] Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), v

Error Messages:
> library(ggplot2)
Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):

Loaded namespace ‘ellipsis’ 0.3.1, but what is needed is >= 0.3.2

Solution:
In RStudio, open the Packages pane, remove the package named in the error (here ellipsis, whose loaded version 0.3.1 is older than the required 0.3.2), and then reinstall it with install.packages().

[Solved] No module named 'pywt' or ModuleNotFoundError: No module named 'skimage.metrics'

Solution:

pip install pywavelets
or
pip install scikit-image
Installing scikit-image pulls in the PyWavelets dependency automatically.

If this error is reported afterwards:
ModuleNotFoundError: No module named 'skimage.metrics'

the installed scikit-image version is too old. Upgrade to the latest version (or 0.18+):

pip install scikit-image --upgrade
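
To confirm which of the two modules is actually missing before or after installing, a small check like the following can help. This is a sketch using only the standard library; the function name missing_modules is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that cannot be found for import."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised for dotted names when the parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


print(missing_modules(["pywt", "skimage.metrics"]))
```

An empty list means both imports should succeed. If only skimage.metrics is listed, scikit-image is installed but lacks that submodule, which points to the version problem described above.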

[Solved] error: this statement may fall through [-Werror=implicit-fallthrough=]

/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:681:5: note: here
     case 'e':
     ^~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:686:15: error: this statement may fall through [-Werror=implicit-fallthrough=]
       out.setf(std::ios::uppercase);
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:687:5: note: here
     case 'f':
     ^~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:691:15: error: this statement may fall through [-Werror=implicit-fallthrough=]
       out.setf(std::ios::uppercase);
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:692:5: note: here
     case 'g':
     ^~~~
In file included from /home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/printf.h:76:0,
                 from /home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/platform/enforce.h:40,
                 from /home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/framework/threadpool.h:25,
                 from /home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/framework/threadpool.cc:15:
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h: In function ‘const char* paddle::string::tinyformat::detail::streamStateFromFormat(std::ostream&, bool&, int&, const char*, const paddle::string::tinyformat::detail::FormatArg*, int&, int)’:
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:673:15: error: this statement may fall through [-Werror=implicit-fallthrough=]
       out.setf(std::ios::uppercase);
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:674:5: note: here
     case 'x':
     ^~~~
/home/liuc/git/AnyQ/build/third_party/paddle/src/extern_paddle/paddle/fluid/string/tinyformat/tinyformat.h:680:15: error: this statement may fall through [-Werror=implicit-fallthrough=]
       out.setf(std::ios::uppercase);
       ~~~~~~~~^~~~~~~~~~~~~~~~~~~~~

 

Solution:

Add the following line to the CMakeLists.txt file:
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-implicit-fallthrough")

How to Solve kaldi Gstreamer worker Run Error

INTEL MKL ERROR: /opt/intel/mkl/lib/intel64/libmkl_avx2.so: undefined symbol: mkl_sparse_optimize_bsr_trsm_i8.
Intel MKL FATAL ERROR: Cannot load libmkl_avx2.so or libmkl_def.so.
Solution: set LD_PRELOAD before running the worker:

export LD_PRELOAD=~/anaconda3/lib/libmkl_core.so:~/anaconda3/lib/libmkl_sequential.so

ERROR: Couldn’t create the kaldinnet2onlinedecoder element!
Couldn’t find kaldinnet2onlinedecoder element at /home/cs_hsl/kaldi/src/gst-plugin. If it’s not the right path, try to set GST_PLUGIN_PATH to the right one, and retry. You can also try to run the following command: ‘GST_PLUGIN_PATH=/home/cs_hsl/kaldi/src/gst-plugin gst-inspect-1.0 kaldinnet2onlinedecoder’.
Solution: set GST_PLUGIN_PATH to the src directory of your gst-kaldi-nnet2-online installation:

export GST_PLUGIN_PATH=/home/cs_hsl/kaldi/tools/gst-kaldi-nnet2-online/src

pytorch model.load_state_dict Error [How to Solve]

When a PyTorch model creates some of its layers conditionally (for example, inside an if statement), the layers created under that condition are saved in the checkpoint. If the model is later built with the condition not met, those layers do not exist, and load_state_dict reports an error because some keys cannot be matched. For example:

if backbone == "mobilenet":
    self.backbone = mobilenet()
    flat_shape = 1024
elif backbone == "inception_resnetv1":
    # flat_shape for this backbone is not shown in the original snippet
    self.backbone = inception_resnet()
else:
    raise ValueError('Unsupported backbone - `{}`, Use mobilenet, inception_resnetv1.'.format(backbone))
self.avg = nn.AdaptiveAvgPool2d((1,1))
self.Bottleneck = nn.Linear(flat_shape, embedding_size,bias=False)
self.last_bn = nn.BatchNorm1d(embedding_size, eps=0.001, momentum=0.1, affine=True)
if mode == "train":  # Conditional: the fully connected classifier is only created in training mode
    self.classifier = nn.Linear(embedding_size, num_classes)

Pass strict=False so that checkpoint keys for layers that do not exist in the current network are ignored:

model2.load_state_dict(state_dict2, strict=False)
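
Because strict=False also hides genuine mismatches, it is worth checking which keys were skipped. In recent PyTorch versions, load_state_dict returns an object with missing_keys and unexpected_keys fields. The sketch below reproduces that comparison in plain Python so it runs without PyTorch; the function name diff_state_dict_keys and the key names are hypothetical.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Compare two collections of parameter names.

    Returns (missing, unexpected): names the model expects but the checkpoint
    lacks, and names the checkpoint has but the model does not define.
    """
    model_set, ckpt_set = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_set - ckpt_set)
    unexpected = sorted(ckpt_set - model_set)
    return missing, unexpected


# Hypothetical key names: checkpoint saved in "train" mode, model built for testing.
checkpoint_keys = ["Bottleneck.weight", "classifier.weight", "classifier.bias"]
model_keys = ["Bottleneck.weight"]
missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With a real model, the inputs would be model2.state_dict().keys() and state_dict2.keys(). If only the conditionally created layers (here the classifier) show up as unexpected, ignoring them is likely safe; anything listed as missing was left at its initial values.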

[Solved] Python2 Install tensorflow Error: class DescriptorBase(metaclass=DescriptorMetaclass), SyntaxError: invalid syntax

After installing TensorFlow under Python 2, test the installation:

import tensorflow as tf

This reports an error:

Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/__init__.py", line 28, in <module>
from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/python/__init__.py", line 52, in <module>
from tensorflow.core.framework.graph_pb2 import *
File "/home/zhaokai/.local/lib/python2.7/site-packages/tensorflow/core/framework/graph_pb2.py", line 7, in <module>
from google.protobuf import descriptor as _descriptor
File "/home/zhaokai/.local/lib/python2.7/site-packages/google/protobuf/descriptor.py", line 113
class DescriptorBase(metaclass=DescriptorMetaclass):
^
SyntaxError: invalid syntax

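The installed protobuf uses Python 3-only syntax (the metaclass= keyword in the class definition), which Python 2 cannot parse. The solution is to reinstall a protobuf version that still supports Python 2.7: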

pip install protobuf==3.17.3

Then import tensorflow again.

visdom Install and Run Error: raise Connectionerror [How to Solve]

 

Install visdom

Activate the conda environment you want to use. Running conda install visdom there reported an error and the installation failed. After some searching, I found that installing with pip works:

pip install visdom

Run visdom

To use visdom from Python code, you must first start the visdom server in the conda environment where visdom is installed:

python -m visdom.server

After the service is started, the following prompt will be given:

39: DeprecationWarning: zmq.eventloop.ioloop is deprecated in pyzmq 17. pyzmq now works with default tornado and asyncio eventloops.
  ioloop.install()  # Needs to happen before any tornado imports!
Checking for scripts.
Downloading scripts, this may take a little while
It's Alive!
INFO:root:Application Started
You can navigate to http://localhost:8097

You can then open http://localhost:8097 in a browser to view the visualizations.

If the server has not been started with the command above, the following error is reported:

Traceback (most recent call last):
  File "D:\program\conda\envs\python36_gan\lib\site-packages\visdom\__init__.py", line 711, in _send
    data=json.dumps(msg),
  File "D:\program\conda\envs\python36_gan\lib\site-packages\visdom\__init__.py", line 677, in _handle_post
    r = self.session.post(url, data=data)
  File "D:\program\conda\envs\python36_gan\lib\site-packages\requests\sessions.py", line 581, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "D:\program\conda\envs\python36_gan\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\program\conda\envs\python36_gan\lib\site-packages\requests\sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "D:\program\conda\envs\python36_gan\lib\site-packages\requests\adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8097): Max retries exceeded with url: /env/test1 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x0000025321093898>: Failed to establish a new connection: [WinError 10061] Unable to connect because the target computer actively refused.',))
[WinError 10061] Unable to connect because the target computer actively refused.
Exception in user code:
------------------------------------------------------------
Traceback (most recent call last):
  File "D:\program\conda\envs\python36_gan\lib\site-packages\urllib3\connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "D:\program\conda\envs\python36_gan\lib\site-packages\urllib3\util\connection.py", line 84, in create_connection
    raise err
  File "D:\program\conda\envs\python36_gan\lib\site-packages\urllib3\util\connection.py", line 74, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [WinError 10061] Unable to connect because the target computer actively refused.

During handling of the above exception, another exception occurred:
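
Since the traceback boils down to a refused TCP connection on port 8097, a script can check that the server is reachable before creating the visdom client. The following is a minimal sketch using only the standard library; the function name is_server_reachable is hypothetical, and the host and port defaults are taken from the server output above.

```python
import socket


def is_server_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError and timeouts.
        return False


if not is_server_reachable():
    print("visdom server not reachable; start it with: python -m visdom.server")
```

This only confirms that something is listening on the port, not that it is a visdom server, so treat it as a quick sanity check rather than a full health check.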

[Solved] DDP/DistributedDataParallel Error: RuntimeError: Address already in use

When testing PyTorch multi-GPU training, the following error is reported:
store = TCPStore(master_addr, master_port, world_size, start_daemon, timeout)
RuntimeError: Address already in use

After investigation, it turned out that another DDP job was already running and occupying the port.

Solution:
Manually specify a free port (a valid port number is at most 65535):

python -m torch.distributed.launch --master_port 29501

To see which ports are in use, run this in a terminal:
netstat -nultp
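
Instead of guessing a port, you can ask the operating system for one that is currently unused. This is a sketch using only the standard library; the function name find_free_port is hypothetical.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for a currently unused TCP port on `host` and return it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Binding to port 0 lets the OS choose any free port.
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"--master_port {port}")
```

The port is released as soon as the function returns, so another process could in principle claim it before the launcher does. For occasional manual launches that race is usually acceptable.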

[Solved] AttributeError: ‘DataParallel‘ object has no attribute ‘save‘

Error message:

trainer.model.save(self.dir, epoch, is_best=is_best)
AttributeError: 'DataParallel' object has no attribute 'save'

Source code analysis:

 trainer.model.save(self.dir, epoch, is_best=is_best)

The line above was written before switching to single-machine multi-GPU training. My parallel code is as follows:

os.environ["CUDA_VISIBLE_DEVICES"] = "3,2,1"
model = torch.nn.DataParallel(model,device_ids=[0,1]).cuda()

Cause analysis: AttributeError: 'DataParallel' object has no attribute 'save'

With multi-GPU training, torch.nn.DataParallel wraps the original model and stores it in its module attribute. Custom methods such as save() are defined on the original model, not on the wrapper, so they must be called through model.module. After this change, the code is as follows:

 trainer.model.module.save(self.dir, epoch, is_best=is_best)
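
If the same code has to run both with and without DataParallel, a small helper can pick the right object. The following is a sketch in plain Python; unwrap_model is a hypothetical helper, and the Net and Wrapper classes are stand-ins for the real model and for torch.nn.DataParallel so the example runs without PyTorch.

```python
def unwrap_model(model):
    """Return the underlying model if `model` is a wrapper exposing `.module`."""
    return getattr(model, "module", model)


class Net:  # stand-in for the user's model class
    def save(self):
        return "saved"


class Wrapper:  # stand-in for torch.nn.DataParallel
    def __init__(self, module):
        self.module = module


print(unwrap_model(Wrapper(Net())).save())  # works when wrapped
print(unwrap_model(Net()).save())           # and when not wrapped
```

This assumes the model itself has no attribute named module; if it does, an explicit isinstance check against the wrapper class is the safer choice.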