[Solved] RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found

The error as reported:

pg = ProcessGroupNCCL(prefix_store, rank, world_size, pg_options)
RuntimeError: ProcessGroupNCCL is only supported with GPUs, no GPUs found!

At first this error made me wonder whether the GPU itself was broken, but my labmates were sure the GPU was fine. So I started troubleshooting.

Checking from the Python prompt finally exposed the real problem: something was wrong with the PyTorch installation.

>>> import torch
>>> print(torch.cuda.is_available())
/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py:80: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at  ../c10/cuda/CUDAFunctions.cpp:112.)
  return torch._C._cuda_getDeviceCount() > 0
False
>>> print(torch.cuda.get_device_name(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 326, in get_device_name
    return get_device_properties(device).name
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 356, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/home/xutianjiao/anaconda3/envs/py36/lib/python3.6/site-packages/torch/cuda/__init__.py", line 214, in _lazy_init
    torch._C._cuda_init()
RuntimeError: The NVIDIA driver on your system is too old (found version 9020). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver.

Looking up this error shows that the CUDA and torch versions do not match: the installed NVIDIA driver is too old for the CUDA version this PyTorch build was compiled with.

The installed PyTorch version was 1.10+, so I tried installing an older version of torch:

pip install torch==1.7.0

That solved it!
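The number in the warning (found version 9020) is the CUDA version supported by the driver, encoded as 1000 * major + 10 * minor, so 9020 means CUDA 9.2. The sketch below decodes that number and compares it with the CUDA version a PyTorch build was compiled against (available as torch.version.cuda). The helper names are hypothetical, the "10.2" value is only an illustrative assumption, and the comparison is a simplification that ignores CUDA's minor-version compatibility rules.

```python
def decode_cuda_version(encoded: int) -> tuple:
    """Split an encoded CUDA version such as 9020 into (major, minor)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


def driver_supports(driver_encoded: int, toolkit: str) -> bool:
    """Return True if the driver's CUDA version is at least the toolkit version.

    `toolkit` is a string such as the value of torch.version.cuda, e.g. "10.2".
    """
    major, minor = (int(part) for part in toolkit.split(".")[:2])
    return decode_cuda_version(driver_encoded) >= (major, minor)


print(decode_cuda_version(9020))      # (9, 2)
print(driver_supports(9020, "10.2"))  # False: the driver is older than the build
```

If the check fails, the two options are the ones the warning itself names: update the NVIDIA driver, or install a PyTorch build compiled for the CUDA version the driver supports.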

[Solved] RuntimeError: scatter(): Expected dtype int64 for index


RuntimeError: scatter(): Expected dtype int64 for index

1. Cause of the error:

scatter() requires the index tensor to have dtype int64. I had created the tensor with torch.Tensor(x), but it should have been torch.LongTensor(x), which is int64.

2. Solution:

Find where the original data is defined and change the dtype there. Typically that means dtype=np.int64 for integers and dtype=np.float32 for floats (most constructor functions accept a dtype argument). It is better to keep the int and float types at the same bit width.

import numpy as np
a = np.random.randint(100, size=(10**6), dtype="int64")
print(a)
print(type(a[0]))
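On the PyTorch side, the same fix can be sketched as follows. This is a minimal illustration with made-up tensor shapes and values, not the original code: an index created with torch.Tensor is float32 and is rejected by scatter_, while converting it with .long() gives the required int64.

```python
import torch

src = torch.ones(2, 5)
index = torch.Tensor([[0, 1, 2, 0, 0]])  # float32: not a valid index dtype

try:
    torch.zeros(3, 5).scatter_(0, index, src)
except RuntimeError as err:
    print("scatter failed:", err)

long_index = index.long()  # convert the index to int64
result = torch.zeros(3, 5).scatter_(0, long_index, src)
print(long_index.dtype)
print(result)
```

Creating the index with torch.LongTensor(x) or torch.tensor(x, dtype=torch.int64) in the first place avoids the conversion.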

[Solved] cv2.error: OpenCV(4.5.1) XXX\shapedescr.cpp:315: error: (-215:Assertion failed) npoints >= 0 &&……

Error resolution

Error reproduction

Traceback (most recent call last):
  File "D:/pythonProjects/Object_movement/object_movement.py", line 88, in <module>
    c = max(cnts, key=cv2.contourArea)
cv2.error: OpenCV(4.5.1) C:\Users\appveyor\AppData\Local\Temp\1\pip-req-build-i1s8y2i1\opencv\modules\imgproc\src\shapedescr.cpp:315: error: (-215:Assertion failed) npoints >= 0 && (depth == CV_32F || depth == CV_32S) in function 'cv::contourArea'

Original code

cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                            cv2.CHAIN_APPROX_SIMPLE)
cnts=cnts[1]

Modified code

cnts, hierarchy = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)

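Cause of the error

In newer OpenCV versions (4.x), cv2.findContours returns two values, (contours, hierarchy), rather than the three values returned by OpenCV 3.x. The contours are therefore at index [0], not [1].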
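If the same script has to run under several OpenCV versions, the index can be chosen from the length of the returned tuple. The helper below is a hypothetical sketch (the name grab_contours is not part of OpenCV); it takes whatever cv2.findContours returned, so it is shown here with stand-in tuples instead of a real image.

```python
def grab_contours(find_contours_result):
    """Return the contour list from a cv2.findContours result of either shape."""
    if len(find_contours_result) == 2:
        # OpenCV 2.x and 4.x return (contours, hierarchy)
        return find_contours_result[0]
    if len(find_contours_result) == 3:
        # OpenCV 3.x returns (image, contours, hierarchy)
        return find_contours_result[1]
    raise ValueError("unexpected return value from cv2.findContours")


# Stand-in tuples that only mimic the two return shapes
print(grab_contours((["contour_a"], "hierarchy")))
print(grab_contours(("image", ["contour_a"], "hierarchy")))
```

In the script above this would be used as cnts = grab_contours(cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)).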

BlazingSQL Error: AttributeError: module ‘cio‘ has no attribute ‘RunQueryError‘

When using blazingsql, the following error was encountered:

Exception:
java.lang.IllegalStateException: Unable to instantiate java compiler
at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.compile(JaninoRelMetadataProvider.java:433)
at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.load3(JaninoRelMetadataProvider.java:374)
at org.apache.calcite.rel.metadata.JaninoRelMetadataProvider.lambda$static$0(JaninoRelMetadataProvider.java:109)
at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:151)
·····

File "/root/bsqlserver/gce_app/libs/bsql_util.py", line 22, in jsonstr_bsql_exec
gdf = bc.sql(sql)
File "/usr/local/envs/bsql/lib/python3.7/site-packages/pyblazing/apiv2/context.py", line 2880, in sql
except cio.RunQueryError as e:
AttributeError: module 'cio' has no attribute 'RunQueryError'

Possible cause: the blazingsql module is imported in multiple places.
Keeping only a single from blazingsql import BlazingContext restored normal operation.

[Solved] Win10 install Ubuntu error: WslRegisterDistribution failed with error: 0x8007019e

This article is reproduced. Author: Buyan. Original text: "win10 installs Ubuntu system and reports the error WslRegisterDistribution failed with error: 0x8007019e", jova, Blog Garden (cnblogs.com)

When installing Ubuntu from the Windows app store, the error WslRegisterDistribution failed with error: 0x8007019e is reported.

1. Error message:

Installing, this may take a few minutes... Installation Failed! Error: 0x8007019e Press any key to continue...

2. Cause: the Windows Subsystem for Linux feature is not enabled.

3. Solution:

1. Press Win + X and select Windows PowerShell (Admin).

2. Enter: Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux

3. Press Enter, type Y when prompted, and restart.

4. Reopen the installed subsystem, wait a few minutes, then enter a username and password.

[Solved] Networkx Error: AttributeError: 'Graph' object has no attribute 'node'

While learning NetworkX, I hit this error when reading node attributes:
AttributeError: 'Graph' object has no attribute 'node'

G= nx.Graph(name='undirected graph',data=2022) # Create undirected graph
G.add_nodes_from([1,2,3,4,5]) # Add nodes to the graph using the list
G.add_node(6,name='F',weight=12)
print(G.node[6]['name']) # Check the other attributes of the node according to its ID

The cause is that older versions of NetworkX provide the G.node attribute, while newer versions (2.4 and later) removed it in favor of G.nodes.
Correction 1: change node to nodes.
The corrected code is as follows:

import networkx as nx

G = nx.Graph(name='undirected graph', data=2022)  # Create an undirected graph
G.add_nodes_from([1, 2, 3, 4, 5])  # Add nodes to the graph from a list
G.add_node(6, name='F', weight=12)
print(G.nodes[6]['name'])  # Look up a node's attributes by its ID

Correction 2: install an older version of NetworkX with pip:
pip install networkx==2.3

[Solved] AttributeError: module ‘setuptools._distutils‘ has no attribute ‘version‘

Using TensorBoard from PyTorch fails with AttributeError: module 'setuptools._distutils' has no attribute 'version' when running code like this:

from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter("logs")

# writer.add_image()
# Log the line y = x
for i in range(100):
    writer.add_scalar("y=x", i, i)

writer.close()

Cause:

The installed setuptools version is too new.

Solution:

Install an older version of setuptools:
pip uninstall setuptools
pip install setuptools==59.5.0  # must be lower than the version you had installed

[Solved] ERROR: No matching distribution found for torch-cluster==x.x.x

Following someone else's configuration, I created a py36 conda virtual environment:

conda create -n py36 python=3.6

This defaults to Python 3.6.0. I then installed pytorch=1.8.0 and cudatoolkit=11.1.1 successfully, and used pip to install:
– torch-cluster==1.5.9
– torch-scatter==2.0.6
– torch-sparse==0.6.9
– torch-spline-conv==1.2.1

ERROR: No matching distribution found for torch-cluster==1.5.9

None of the fixes I found online worked, and the error persisted even with the version pins removed.
I then rechecked the environment I was copying and found that I was using the wrong Python version: it should have been Python 3.6.10.
So, inside this virtual environment, run:

conda install python=3.6.10=hcf32534_1

Then run

pip install torch-xxxx==x.x.x

and the installation succeeds.

[Solved] lto1: fatal error: bytecode stream..generated with LTO version 6.2 instead of the expected 8.1 compi

Compiling libxml2-2.9.1 on Ubuntu:
./configure && make && make install
Error messages:

lto1: fatal error: bytecode stream in file ‘/home/…/anaconda3/envs/rasa/lib/python3.6/config-3.6m-x86_64-linux-gnu/libpython3.6m.a’ generated with LTO version 6.2 instead of the expected 8.1
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
make[4]: *** [Makefile:519: libxml2mod.la] Error 1
make[4]: Leaving directory ‘/home/…/libxml2-2.9.1/python’
make[3]: *** [Makefile:607: all-recursive] Error 1
make[3]: Leaving directory ‘/home/…/libxml2-2.9.1/python’
make[2]: *** [Makefile:450: all] Error 2
make[2]: Leaving directory ‘/home/…/libxml2-2.9.1/python’
make[1]: *** [Makefile:1304: all-recursive] Error 1
make[1]: Leaving directory ‘/home/…/libxml2-2.9.1’
make: *** [Makefile:777: all] Error 2

 

Solution:
conda install -c anaconda gcc_linux-64

Linux Error: audit: backlog limit exceeded [How to Solve]

Symptoms:

The Linux host can be pinged but cannot be reached over SSH. The login console repeatedly prints the error audit: backlog limit exceeded:

audit: backlog limit exceeded
audit: backlog limit exceeded
audit: backlog limit exceeded
...

Cause analysis:

These messages come from the Linux kernel log. The audit service is recording audit events on a busy system and its backlog buffer has become a bottleneck, leaving the system close to crashing.

Background:

Audit is a service that records low-level user activity on a Linux system, such as open, exit, and other system calls, and writes the records to a log file. Audit rules can be added or deleted with the auditctl command, and recording can be configured for a user or for a process.

Main commands:
auditctl: audit rules and system management tool, used to get status and to add or delete monitoring rules
ausearch: tool for querying the audit log
aureport: tool for producing audit system reports

Solution:

You can try to increase the audit buffer to solve this problem.

The default memory page size on Linux is 4096 bytes. You can get the page size with the command getconf PAGE_SIZE, and the buffer value can be set to a multiple of the page size.

View help: auditctl -h

View the current configuration: auditctl -s

backlog_limit 320 # only 320 by default on my CentOS 7.1

Optimize the audit service by increasing the buffer size: auditctl -b 8192. If not set, the kernel default is 64 buffers.

To make the setting permanent:

Method 1) Edit the rules file:
vim /etc/audit/audit.rules
-D
-b 8192
-f 1
Parameter description:
-D  delete all rules
-b  set the audit buffer size; if the buffer is full, the kernel raises the failure flag
-f [0|1|2]  set the failure mode: 0 is silent (no log output), 1 is printk (log output), 2 is panic
-e [0|1]  enable/disable audit

Method 2) Alternatively, set it at boot through rc.local:
chmod u+x /etc/rc.d/rc.local
vim /etc/rc.d/rc.local
and add the line:
auditctl -b 8192
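After changing the limit, the new value can be confirmed from the output of auditctl -s. The sketch below is a hypothetical helper that pulls integer fields such as backlog_limit out of that text. It assumes the output is made of "key value" lines like the backlog_limit 320 line shown above, and it also accepts key=value pairs in case an older release prints those. Apart from backlog_limit 320, the sample values are made up for illustration.

```python
import re


def parse_audit_status(text: str) -> dict:
    """Collect integer fields (e.g. backlog_limit) from auditctl -s output."""
    fields = {}
    # Match "key value" or "key=value" where the value is an integer
    for key, value in re.findall(r"(\w+)[ =](\d+)\b", text):
        fields[key] = int(value)
    return fields


# Illustrative sample; on a real host the text would come from running
# auditctl -s as root, e.g. via subprocess.run(["auditctl", "-s"], ...)
sample = "enabled 1\nbacklog_limit 320\nlost 12\nbacklog 320\n"
status = parse_audit_status(sample)
print(status["backlog_limit"])
print(status["backlog"] >= status["backlog_limit"])  # True means the queue is full
```

A backlog value that sits at backlog_limit, together with a growing lost counter, is the condition that produces the backlog limit exceeded messages.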