
[Solved] NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

After restarting the server, nvidia-smi could no longer communicate with the NVIDIA driver:

NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Solution:
First, check the version of the NVIDIA driver that was installed previously:

ls /usr/src | grep nvidia

Output: the source directory of the previously installed driver, whose version number is needed for the next step.
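For example, the listing here looked something like this (the exact name depends on the installed version; 460.73.01 matches the version reinstalled below):

nvidia-460.73.01

Then install DKMS and rebuild the driver module for the current kernel: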

sudo apt install dkms
sudo dkms install -m nvidia -v 460.73.01
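Once the module is rebuilt, a quick check (standard commands, assuming the install succeeded) confirms that the kernel module is available and the driver responds again:

lsmod | grep nvidia
nvidia-smi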

Finally, the familiar nvidia-smi interface is back.

[Solved] Error response from daemon: manifest for nvidia/cuda:latest not found: manifest unknown: manifest unknown

Reason:

The “latest” tag for CUDA, CUDAGL, and OPENGL images has been deprecated on NGC and Docker Hub. As a result, pulling the image without an explicit tag,

docker pull nvidia/cuda

or specifying it in a Dockerfile,

FROM nvidia/cuda:latest

will produce the error:

Error response from daemon: manifest for nvidia/cuda:latest not found: manifest unknown: manifest unknown


Solution:

Find the CUDA version matching your system in the list of supported tags, and replace latest in nvidia/cuda:latest with that tag.

For example:

nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04
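A minimal sketch of the fix, using the example tag above (pick whichever supported tag matches your CUDA, cuDNN, and OS requirements):

docker pull nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04

or, in the Dockerfile:

FROM nvidia/cuda:10.0-cudnn7-devel-ubuntu16.04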

failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error

Problem:

It worked fine before. After restarting the computer and running the program again, this error is reported:

failed call to cuInit: CUDA_ERROR_UNKNOWN: unknown error
retrieving CUDA diagnostic information for host: ...

The program then falls back to running on the slow CPU.

Environment:

Ubuntu 20.04
TensorFlow 2.5
cudatoolkit 11.2
cudnn 8.1

Solution:

Most likely, the graphics card driver got broken somehow.
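A quick check that usually narrows this down: run nvidia-smi; if it fails with the same "couldn't communicate with the NVIDIA driver" message from the first post, the driver itself needs to be repaired.

nvidia-smi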

An automatic Ubuntu system update had run just before this, which probably replaced or removed some NVIDIA driver files, so the driver no longer worked after the restart.

Opening the Software Updater, applying all pending updates, and restarting the machine fixed the problem.
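After the reboot, you can verify that TensorFlow sees the GPU again; a minimal check, assuming the TensorFlow 2.5 environment above:

python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

If the GPU is back, this prints a list containing a PhysicalDevice entry instead of an empty list.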