Tag Archives: NVIDIA

[Solved] NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

After restarting the server, nvidia-smi could no longer communicate with the NVIDIA driver:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Solution:
First check the version number of the NVIDIA driver that was installed before:

ls /usr/src | grep nvidia

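The output shows a folder named nvidia-<version>, for example nvidia-460.73.01; the suffix is the driver version. Then install DKMS and rebuild the driver module for the current kernel: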

sudo apt install dkms
sudo dkms install -m nvidia -v 460.73.01
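If you are unsure which version to pass to dkms, it can be read off the folder name that ls /usr/src | grep nvidia prints. The Python sketch below assumes the driver source folder is named nvidia-<version> (as in nvidia-460.73.01); find_nvidia_versions and dkms_install_command are hypothetical helper names, and the script only prints the command instead of running it.

```python
import os
import re
from typing import List

# Assumption: DKMS driver sources live in /usr/src/nvidia-<version>
_NVIDIA_SRC = re.compile(r"^nvidia-(\d+(?:\.\d+)+)$")


def find_nvidia_versions(entries: List[str]) -> List[str]:
    """Return the driver versions found among /usr/src entry names."""
    versions = []
    for name in entries:
        match = _NVIDIA_SRC.match(name)
        if match:
            versions.append(match.group(1))
    # Sort numerically so that "460.73.01" comes after "418.56"
    return sorted(versions, key=lambda v: [int(part) for part in v.split(".")])


def dkms_install_command(version: str) -> str:
    """Build (but do not run) the dkms command used in this post."""
    return f"sudo dkms install -m nvidia -v {version}"


if __name__ == "__main__":
    src = "/usr/src"
    entries = os.listdir(src) if os.path.isdir(src) else []
    for version in find_nvidia_versions(entries):
        print(dkms_install_command(version))
```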

Finally, running nvidia-smi shows the familiar interface again.

[Solved] Windows: nvidia-smi Can't Be Used, "Failed to initialize NVML: Unknown Error"

The problem was that nvidia-smi could not be used: there was no NVSMI folder under C:\Program Files\NVIDIA Corporation. You can first refer to the solution for this.

Then add the folder path to the system PATH.

After that, if "Failed to initialize NVML: Unknown Error" appears,

find the corresponding files mentioned above in C:\Windows\System32 and copy them into the NVSMI directory, replacing the existing ones.
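To confirm that the PATH change took effect, you can check whether the NVSMI folder is one of the PATH entries. This Python sketch compares entries case-insensitively with ntpath.normcase, matching how Windows treats paths; the folder is the one named above, and path_contains is a hypothetical helper.

```python
import ntpath
import os


def path_contains(path_value: str, directory: str) -> bool:
    """Return True if `directory` is one of the ;-separated entries of a Windows PATH."""
    # normcase lowercases and turns "/" into "\"; rstrip drops trailing backslashes
    target = ntpath.normcase(directory).rstrip("\\")
    for entry in path_value.split(";"):
        entry = entry.strip().strip('"')
        if entry and ntpath.normcase(entry).rstrip("\\") == target:
            return True
    return False


if __name__ == "__main__":
    nvsmi = r"C:\Program Files\NVIDIA Corporation\NVSMI"
    print(path_contains(os.environ.get("PATH", ""), nvsmi))
```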

[Solved] NVIDIA driver error: NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver…

Solution to the NVIDIA driver error

Run on the command line:

nvidia-smi

It reports the error:

NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running

Solution 1:

Don't rush to reinstall the NVIDIA driver. First check whether Secure Boot is disabled in the BIOS. If it isn't, enter the BIOS and disable Secure Boot!
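On Ubuntu you can also check the Secure Boot state from the running system with mokutil --sb-state, which prints a line such as "SecureBoot enabled". The Python sketch below runs that command and interprets its output; it assumes mokutil is installed, and parse_sb_state and secure_boot_enabled are hypothetical helpers that return None when the state cannot be determined (for example on a non-EFI system).

```python
import shutil
import subprocess
from typing import Optional


def parse_sb_state(output: str) -> Optional[bool]:
    """Interpret `mokutil --sb-state` output: True/False, or None if unrecognized."""
    text = output.lower()
    if "secureboot enabled" in text:
        return True
    if "secureboot disabled" in text:
        return False
    return None


def secure_boot_enabled() -> Optional[bool]:
    if shutil.which("mokutil") is None:
        return None  # mokutil is not installed
    result = subprocess.run(["mokutil", "--sb-state"], capture_output=True, text=True)
    return parse_sb_state(result.stdout + result.stderr)


if __name__ == "__main__":
    state = secure_boot_enabled()
    if state:
        print("Secure Boot is enabled: disable it in the BIOS")
    elif state is False:
        print("Secure Boot is disabled")
    else:
        print("Could not determine the Secure Boot state")
```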

Solution 2:

After confirming that Secure Boot is disabled, follow the method commonly suggested online:

    In the boot menu, enter "Advanced options for Ubuntu" and select the previous kernel version. If that doesn't work, reinstall the NVIDIA driver.

CUDA_ERROR_SYSTEM_DRIVER_MISMATCH [How to Solve]

nvidia/cuda:11.4.2-cudnn8-devel-ubuntu20.04: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH

Diagnosis and solution

Problem

When running a program that calls the cuDNN library, an error occurs at run time: CUDA_ERROR_SYSTEM_DRIVER_MISMATCH. This one left me speechless, and I didn't know why. I am using the Docker image nvidia/cuda:11.4.2-cudnn8-devel-ubuntu20.04.

Diagnosis

This problem is related to the libcuda version. Run nvidia-smi to check whether the libcuda version inside the container (i.e. the driver version) is inconsistent with the host's version:

The problem I encountered was exactly this version inconsistency.

Solution:

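libcuda.so and libcuda.so.1 should be in the /usr/lib/x86_64-linux-gnu folder. Enter this folder and change the symbolic link libcuda.so.1 so that it points to the libcuda.so.<version> file matching the host driver (here 465.19.01):

cd /usr/lib/x86_64-linux-gnu
ln -sf libcuda.so.465.19.01 libcuda.so.1

The -f option replaces the existing libcuda.so.1 link; without it, ln fails with "File exists".

This solved the problem.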
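The fix amounts to pointing libcuda.so.1 at the libcuda.so.<version> file whose version equals the driver version that nvidia-smi reports on the host. This Python sketch picks that file from a directory listing and builds the ln -sf command without running it; it assumes the libcuda.so.<version> naming seen above, and pick_libcuda and relink_command are hypothetical helpers.

```python
import os
import re
from typing import List, Optional

# Versioned library names such as libcuda.so.465.19.01 (libcuda.so.1 does not match)
_VERSIONED = re.compile(r"^libcuda\.so\.(\d+\.\d+(?:\.\d+)?)$")


def pick_libcuda(entries: List[str], host_version: str) -> Optional[str]:
    """Return the libcuda.so.<version> file matching the host driver, if present."""
    for name in entries:
        match = _VERSIONED.match(name)
        if match and match.group(1) == host_version:
            return name
    return None


def relink_command(target: str) -> str:
    # -f replaces the existing libcuda.so.1 link instead of failing with "File exists"
    return f"ln -sf {target} libcuda.so.1"


if __name__ == "__main__":
    libdir = "/usr/lib/x86_64-linux-gnu"
    entries = os.listdir(libdir) if os.path.isdir(libdir) else []
    target = pick_libcuda(entries, "465.19.01")  # driver version reported on the host
    print(relink_command(target) if target else "no matching libcuda found")
```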

RuntimeError: Default process group has not been initialized, please make sure to call init_process_

A problem encountered when using the mmsegmentation framework:

 File "C:\software\Anaconda3\envs\python36\lib\site-packages\torch\distributed\distributed_c10d.py", line 347, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

After debugging, I located the cause: the normalization setting of a convolution module:

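from mmcv.cnn import ConvModule

# Excerpt from the decode head's __init__; embedding_dim is defined earlier in the head
self.linear_fuse = ConvModule(
    in_channels=embedding_dim * 4,
    out_channels=embedding_dim,
    kernel_size=1,
    # 'SyncBN' needs an initialized distributed process group (multi-GPU training);
    # use 'BN' for single-GPU training
    norm_cfg=dict(type='SyncBN', requires_grad=True),
)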

In norm_cfg here, use 'SyncBN' for multi-GPU training; for single-GPU training, change the type to 'BN'.
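One way to avoid editing the config by hand is to pick the norm type from the launch environment. This is an assumption, not mmsegmentation's own mechanism: it relies on PyTorch's distributed launchers (such as torchrun) setting the WORLD_SIZE environment variable, and treats a missing WORLD_SIZE as single-GPU training. choose_norm_cfg is a hypothetical helper.

```python
import os
from typing import Dict, Mapping


def choose_norm_cfg(env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Use SyncBN only for multi-process runs, since it needs a process group."""
    # Assumption: WORLD_SIZE is set by the distributed launcher; absent means one process
    world_size = int(env.get("WORLD_SIZE", "1"))
    norm_type = "SyncBN" if world_size > 1 else "BN"
    return dict(type=norm_type, requires_grad=True)


if __name__ == "__main__":
    print(choose_norm_cfg())
```

The result would be passed as norm_cfg=choose_norm_cfg() to ConvModule. In a multi-GPU run, SyncBN still requires init_process_group to have been called before the forward pass.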

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

I had installed the driver successfully before, but after a while nvidia-smi reported the same error again. Some blogs explain that the driver stops working properly because Ubuntu updated the kernel. The following method works without reinstalling the NVIDIA driver.

    1. Check the installation:

     nvcc -V

    Note that nvcc -V reports the CUDA toolkit version rather than the driver itself; the installed driver version can be read from /usr/src as described in step 2. If the driver is present, go on to the next step.

    2. Rebuild the driver module with DKMS:

    sudo apt-get install dkms
    sudo dkms install -m nvidia -v 418.56

    418.56 is the NVIDIA driver version. If you don't know yours, look for the nvidia folder in the /usr/src directory; its suffix is the version number.

    After this succeeds, nvidia-smi shows that the graphics card driver is working normally.

    Reference: unable to connect NVIDIA driver: nvidia-smi has failed because it couldn't communicate with the NVIDIA driver
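Because the root cause here is a kernel update, a quick check is whether DKMS has built the nvidia module for the kernel that is currently running. dkms status prints one line per module build, in the form "nvidia, 418.56, 4.15.0-45-generic, x86_64: installed" on older releases and "nvidia/418.56, 4.15.0-45-generic, x86_64: installed" on newer ones. This Python sketch accepts both forms; kernels_with_nvidia is a hypothetical helper.

```python
import platform
import shutil
import subprocess
from typing import Iterable, Set


def kernels_with_nvidia(status_lines: Iterable[str]) -> Set[str]:
    """Return kernel releases for which `dkms status` reports nvidia as installed."""
    kernels = set()
    for line in status_lines:
        head, _, state = line.partition(":")
        if not state.strip().startswith("installed"):
            continue
        # Normalize "nvidia/418.56, ..." to "nvidia, 418.56, ..."
        fields = [f.strip() for f in head.replace("/", ", ", 1).split(",")]
        if len(fields) >= 3 and fields[0] == "nvidia":
            kernels.add(fields[2])
    return kernels


if __name__ == "__main__":
    if shutil.which("dkms") is None:
        print("dkms is not installed")
    else:
        out = subprocess.run(["dkms", "status"], capture_output=True, text=True).stdout
        running = platform.release()  # same value as `uname -r`
        if running in kernels_with_nvidia(out.splitlines()):
            print(f"nvidia module is built for {running}")
        else:
            print(f"nvidia module is missing for {running}; run dkms install")
```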

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver

NVIDIA-SMI error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running
This is a common problem on Ubuntu systems, mainly because the kernel has been upgraded and the new kernel no longer matches the old graphics driver.
Solution 1:
Just run two commands:
sudo apt-get install dkms

Set every "1" inside the double quotation marks to "0", then save the file.
To turn off automatic updates through the graphical interface instead, go to System Settings -> Software & Updates.
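The edit described above (turning every "1" in double quotes into "0") can be scripted. This sketch assumes the file is an APT periodic configuration with lines such as APT::Periodic::Unattended-Upgrade "1"; since the file name is not given here, the example works on text only, and disable_periodic is a hypothetical helper. Back up the real file before writing the result over it.

```python
import re

# Matches lines such as:  APT::Periodic::Unattended-Upgrade "1";
_PERIODIC_ON = re.compile(r'^(\s*APT::Periodic::[\w-]+\s+)"1"(\s*;)', re.MULTILINE)


def disable_periodic(config_text: str) -> str:
    """Replace "1" with "0" on every APT::Periodic line, leaving other lines untouched."""
    return _PERIODIC_ON.sub(r'\1"0"\2', config_text)


if __name__ == "__main__":
    sample = 'APT::Periodic::Update-Package-Lists "1";\nAPT::Periodic::Unattended-Upgrade "1";\n'
    print(disable_periodic(sample), end="")
```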

Failed to initialize NVML: Driver/library version mismatch due to automatic update of the NVIDIA driver

Failed to initialize NVML: Driver/library version mismatch
This generally happens because the NVIDIA driver was updated automatically. Check the apt log to confirm the automatic update:

$ cat /var/log/apt/history.log
Start-Date: 2021-01-12  06:14:29
Commandline: /usr/bin/unattended-upgrade
Upgrade: libnvidia-compute-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-encode-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-kernel-common-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), xserver-xorg-video-nvidia-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-gl-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-fbc1-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-decode-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-cfg1-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-utils-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-dkms-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-compute-utils-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-ifr1-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-driver-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), libnvidia-extra-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1), nvidia-kernel-source-450:amd64 (450.80.02-0ubuntu0.18.04.2, 450.102.04-0ubuntu0.18.04.1)
End-Date: 2021-01-12  06:16:37
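To spot such an update quickly in a long log, you can filter history.log for NVIDIA packages. The Python sketch below assumes the entry layout shown above (Start-Date:, Commandline:, Upgrade: pkg:arch (old, new), ...) and reports each NVIDIA package with its old and new version; nvidia_upgrades is a hypothetical helper.

```python
import os
import re
from typing import List, Tuple

# One "pkg:arch (old, new)" item from an Upgrade: line
_ITEM = re.compile(r"([\w.+-]+):(\w+) \(([^,()]+), ([^()]+)\)")


def nvidia_upgrades(log_text: str) -> List[Tuple[str, str, str, str]]:
    """Return (start_date, package, old_version, new_version) for NVIDIA upgrades."""
    results = []
    start = ""
    for line in log_text.splitlines():
        if line.startswith("Start-Date:"):
            start = line.split(":", 1)[1].strip()
        elif line.startswith("Upgrade:"):
            for pkg, _arch, old, new in _ITEM.findall(line):
                if "nvidia" in pkg:
                    results.append((start, pkg, old.strip(), new.strip()))
    return results


if __name__ == "__main__":
    path = "/var/log/apt/history.log"
    if os.path.exists(path):
        with open(path, encoding="utf-8", errors="replace") as log:
            for entry in nvidia_upgrades(log.read()):
                print(*entry)
```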

Following advice on Stack Overflow, I decided to restart the server, and the problem was resolved after the restart.
I have to say this automatic driver update is really annoying. If anyone reading this blog knows how to disable the driver update on Ubuntu, please leave a comment (for the time being, I only know how to do it on Windows).
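A restart helps because this error typically means the kernel module still loaded in memory is the old version, while the user-space libraries on disk have already been upgraded. Before rebooting, you can confirm this by comparing the version in /proc/driver/nvidia/version (the loaded module) with the new package version from the log above. This is a sketch with hypothetical helpers; it assumes the usual "NVRM version: NVIDIA UNIX x86_64 Kernel Module <version> ..." line.

```python
import re
from typing import Optional

_NVRM = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_module_version(proc_text: str) -> Optional[str]:
    """Extract the loaded kernel module version from /proc/driver/nvidia/version."""
    match = _NVRM.search(proc_text)
    return match.group(1) if match else None


def needs_reboot(proc_text: str, installed_version: str) -> bool:
    """True if the loaded module differs from the installed user-space driver."""
    loaded = loaded_module_version(proc_text)
    return loaded is not None and loaded != installed_version


if __name__ == "__main__":
    try:
        with open("/proc/driver/nvidia/version", encoding="utf-8") as f:
            text = f.read()
    except OSError:
        text = ""
    # Upstream version of the new package from the apt log (without the -0ubuntu suffix)
    print("reboot needed:", needs_reboot(text, "450.102.04"))
```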

GeForce Experience "Something went wrong" error code 0x0003: solution

Download the latest graphics driver from the official website; during a custom installation you can choose to perform a clean installation, which solves most problems. If it doesn't, try repairing the LSP (Winsock) and check in Services whether the NVIDIA-related services have started; if not, start them manually.
If you cannot log in, or the login screen is black after startup, turn off the firewall and make sure the NVIDIA services are running.

Solving the NVIDIA GeForce Experience startup error "Something went wrong"

"Something went wrong" appears in two cases, each described below.

Something went wrong. Try restarting GeForce Experience.
When NVIDIA GeForce Experience cannot be opened, the following screen appears no matter how it is launched (even when run as administrator).

I searched around online; the suggestions were to reinstall the software, update the BIOS and so on, all useless.
Then I found that the NVIDIA-related services were disabled, as described at http://tieba.baidu.com/p/5733082084

I set them to start automatically, but starting them manually reported an error, as at http://tieba.baidu.com/p/5758178861
In the "Computer acceleration" feature of Tencent PC Manager, I restored the disabled items in the lower-left corner and re-enabled the disabled services, but they were disabled again after a while. I also tried adding the NVIDIA-related directories to the antivirus trusted zone, but that failed too.
Following https://zhidao.baidu.com/question/263205342659652165.html, I allowed the application through Windows Firewall, but it still would not open.
Final solution
Turn off the firewall, then start the relevant services; this resolved the problem.

Something went wrong. Try rebooting your PC and then launch GeForce Experience. ERROR CODE: 0x0003
This is the error message encountered in the second case.

The solution is the same as in the case above, except that the service involved is NVIDIA Telemetry Container, which should be set to start automatically.
The steps are as follows:
1. Press Win + R (or find Run in the Start menu) to open the Run dialog.
2. Enter services.msc and click OK.
3. In the Services window that opens, find NVIDIA Telemetry Container.
4. Right-click NVIDIA Telemetry Container and open its Properties.
5. On the General tab, set the startup type to Automatic; if the service status is Stopped, click Start.
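The same check can be done from a command prompt: sc query <service name> prints a STATE line such as "STATE : 4 RUNNING", and sc start <service name> starts a stopped service. The Python sketch below parses that output; the service name NvTelemetryContainer is an assumption about how NVIDIA Telemetry Container is registered, so check the name shown in the service's Properties dialog. service_state is a hypothetical helper.

```python
import re
import subprocess
import sys
from typing import Optional

_STATE = re.compile(r"STATE\s*:\s*\d+\s+(\w+)")


def service_state(sc_output: str) -> Optional[str]:
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    match = _STATE.search(sc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    name = "NvTelemetryContainer"  # assumption: verify in the service's Properties
    if sys.platform != "win32":
        print("sc.exe is only available on Windows")
    else:
        out = subprocess.run(["sc", "query", name], capture_output=True, text=True).stdout
        state = service_state(out)
        if state == "STOPPED":
            print(f"{name} is stopped; start it with: sc start {name}")
        else:
            print(f"{name} state: {state}")
```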

NVIDIA NVML Driver/library version mismatch

If the CUDA and NVIDIA driver versions do not match, the nvidia-smi command reports "NVML Driver/library version mismatch", and you need to check whether the NVIDIA driver version matches.

Run the following command:

ubuntu-drivers devices

It shows:

driver: nvidia-driver-418-server - distro non-free
driver: nvidia-driver-440-server - distro non-free
driver: nvidia-driver-435 - distro non-free
driver: nvidia-driver-440 - distro non-free
distro free builtin
driver: xserver-xorg-video-nouveau - distro free builtin

Install the recommended driver automatically with:

sudo ubuntu-drivers autoinstall

Then reboot:

sudo reboot
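ubuntu-drivers autoinstall installs the driver marked as recommended. If you would rather review the candidates first, the driver lines printed by ubuntu-drivers devices can be parsed. This Python sketch assumes the "driver : <package> - <flags>" layout and is only an illustration; list_drivers is a hypothetical helper.

```python
import re
import shutil
import subprocess
from typing import List, Tuple

_DRIVER = re.compile(r"^driver\s*:\s*(\S+)\s*-\s*(.*)$")


def list_drivers(output: str) -> List[Tuple[str, bool]]:
    """Return (package, is_recommended) for each driver line."""
    drivers = []
    for line in output.splitlines():
        match = _DRIVER.match(line.strip())
        if match:
            package, flags = match.groups()
            drivers.append((package, "recommended" in flags.split()))
    return drivers


if __name__ == "__main__":
    if shutil.which("ubuntu-drivers"):
        out = subprocess.run(["ubuntu-drivers", "devices"], capture_output=True, text=True).stdout
        for package, recommended in list_drivers(out):
            print(package, "(recommended)" if recommended else "")
```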

Running nvidia-smi now shows the graphics card information, with CUDA Version 10.2 and Driver Version 440: a successful match.

Tue Aug  4 21:05:21 2020
+-----------------------------------------------------------------------------+
| Driver Version: 440.95.01    CUDA Version: 10.2                             |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:19:00.0 Off |                  N/A |
| 23%   41C    P8     2W / 260W |     12MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
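To check the match from a script, the driver and CUDA versions can be read from the nvidia-smi banner line shown above. The regex in this Python sketch assumes the "Driver Version: X  CUDA Version: Y" layout; parse_banner is a hypothetical helper. Note that the CUDA Version in the banner is the highest CUDA version the driver supports, not necessarily the installed toolkit version.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

_BANNER = re.compile(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)")


def parse_banner(smi_output: str) -> Optional[Tuple[str, str]]:
    """Return (driver_version, cuda_version) from nvidia-smi output, if present."""
    match = _BANNER.search(smi_output)
    return match.groups() if match else None


if __name__ == "__main__":
    if shutil.which("nvidia-smi"):
        out = subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout
        print(parse_banner(out))
```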

Solution to CUDA installation failure: Visual Studio Integration failed

The CUDA 10.0 installation failed.

This is basically due to a failed Visual Studio Integration installation, as shown in the figure.

  • Solution 1
    1. In Device Manager, find the graphics-card-related entries.
    2. Stop the associated services.
    3. Delete all NVIDIA-related folders (the NVIDIA folders under C:\ data, C:\Program Files and C:\Program Files (x86)).
    4. Restart.

  • Solution 2: if the problem still isn't solved
    It is most likely because the installed Visual Studio version is incompatible with CUDA 10.0, and it can be solved by installing CUDA 9.0 instead. In my case, CUDA 9.0 installed successfully with Visual Studio 2015.
    Another option is to choose a custom installation and deselect Visual Studio Integration. However, after installing this way, you will not be able to create CUDA projects in Visual Studio. Solution reference link: https://blog.csdn.net/zzpong/article/details/80282814
    Reference URL: https://devtalk.nvidia.com/default/topic/1033111/cuda-setup-and-installation/cuda-9-1-cannot-install-due-to-failed-visual-studio-integration/