Tag Archives: tensorflow

Windows 10: solving ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed

I had been using PaddlePaddle, but recently I needed to collect some data, and the data-collection program required TensorFlow, so I installed TensorFlow 1.15.0. An error occurred when I tried to use it:

The error was caused by the CPU not supporting the AVX instruction set.

To check whether your CPU supports AVX, you can use CPU-Z (or a hardware tool such as Ludashi).
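As an alternative to a GUI tool, here is a small Python check (my own addition; it assumes the third-party py-cpuinfo package, installed with pip install py-cpuinfo):

import cpuinfo

# Report whether the CPU advertises the AVX flag.
flags = cpuinfo.get_cpu_info().get('flags', [])
print('AVX supported:', 'avx' in flags)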

There are two solutions:

  • Downgrade TensorFlow: uninstall it with pip uninstall tensorflow, then install version 1.5 with pip install tensorflow==1.5.0. If pip cannot find 1.5.0, you will need to get the 1.5.0 wheel from the official website.
  • If you would rather not downgrade (like me), go to tensorflow-windows-wheel, select the sse2-instruction build of the wheel, download it, and then run pip install <filename.whl>. At this point the installation is complete.

test:

import tensorflow as tf

# Minimal check that the TensorFlow runtime loads and runs: should print 3.
sess = tf.Session()
a = tf.constant(1)
b = tf.constant(2)
print(sess.run(a+b))

I used the second method here. In that GitHub repository you can find the SSE-instruction builds of TensorFlow; just download and install one. I downloaded version 1.15, which includes the GPU build, so an error may appear at runtime; if you are only using the CPU, you can ignore it.

Solving environment: failed with initial frozen solve. Retrying with flexible solve. error

Error:

PackagesNotFoundError: The following packages are not available from current channels:

The workaround at https://blog.csdn.net/The_Time_Runner/article/details/99848728 does not completely solve the problem, but it works for the moment. Installing the version directly with pip is problematic, though, so I avoid that as much as possible.

The following packages are not available from current channels.

According to https://blog.csdn.net/Erice_s/article/details/80156191, the error Solving environment: failed with initial frozen solve. Retrying with flexible solve. is caused by a missing channel; once the channel is added, conda can be used correctly.

TensorFlow error in Windows: failed call to cuInit: CUDA_ERROR_UNKNOWN

    • environment
    • this problem in Linux
    • where the problem lies
            • 0. Have you installed the correct, matching cuda and cudnn versions?
            • 1. Have you installed cudnn, and where? Where are the DLL files such as cudnn64_7.dll from the archive (the number changes with the version)?
            • 2. Did you add the folder containing cudnn64_7.dll (the number changes with the version) to the PATH environment variable?

environment

    System: Windows 10 x64
    Graphics card: GTX 970
    Python environment: 3.5.4
    tensorflow: 1.8.0
    cuda: cuda 9.0
    cudnn: cuDNN v7.1.1 (Feb 28, 2018), for CUDA 9.0

this problem in Linux

In addition to the cause mentioned in this article, the following reference may also help:
http://kawahara.ca/tensorflow-failed-call-to-cuinit-cuda_error_unknown/

where the problem lies

0. Have you installed the correct, matching cuda and cudnn versions?

As of 2018-05-04, the official tensorflow installation tutorial specifies cuda 9.0 + cudnn 7.0 + the latest tensorflow (1.8.0).
The three versions must match strictly (although no problems were found with tf versions 1.7 and 1.6).
In the official site's own words:

If one of your packages differs from the version above, change it to the specified version. In particular, the cuDNN version must match exactly: TensorFlow will not load if it cannot find cudnn64_7.dll. To use a different version of cuDNN, you must build from source.

If you use cuda 9.1 you may encounter problems on both Windows and Linux, since the latest version of tf is compiled against cuda 9.0. For building tf against cuda 9.1 under Linux, see my article at https://blog.csdn.net/u014561933/article/details/79995536.

If, like me, you do not use cudnn 7.0, you will get the error mentioned in the title; I was using cudnn 7.1 with CUDA 9.0, even though the site says the versions must match exactly.

if this is your problem, just reinstall and restart.

1. Have you installed cudnn, and where? Where are the DLL files such as cudnn64_7.dll from the archive (the number changes with the version)?

You need to make sure that all the files in the cudnn package (bin, include, lib/x64) are copied into the corresponding folders of the cuda installation.

2. Did you add the folder containing cudnn64_7.dll (the number changes with the version) to the PATH environment variable?

It is generally cuda/bin; this directory is added automatically by the Windows installer, while on Linux it must be added manually.
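A quick way to verify this (my own addition; a sketch that assumes Windows and cuDNN 7, so the DLL is named cudnn64_7.dll):

import ctypes.util
import os

# Look for cudnn64_7.dll along PATH (the number changes with the cuDNN version).
dll_name = 'cudnn64_7'
found = ctypes.util.find_library(dll_name)
print(dll_name + '.dll found at:', found if found else 'NOT FOUND on PATH')

# List PATH entries that mention CUDA, to confirm cuda/bin was added.
for entry in os.environ.get('PATH', '').split(os.pathsep):
    if 'cuda' in entry.lower():
        print('PATH entry:', entry)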

[Keras] ImportError: Failed to import pydot. You must install pydot and graphviz for `pydotprint` to

problem description

What happened was: while happily building a model with Keras, I found that there is an official function for drawing the model, plot_model(), so I happily called keras.utils.plot_model(model, 'model.png', show_shapes=True), and got the following error:

InvocationException: GraphViz's executables not found
...
ImportError: Failed to import pydot. You must install pydot and graphviz for `pydotprint` to work.

I thought: was it because I hadn't installed pydot and graphviz? So I happily opened PyCharm and installed pydot and graphviz into my virtual environment, but it still reported the error.

Then I sulked for a while. After that, I went to Baidu and Google to look for a solution, and sure enough many people had met the same problem. According to one answer [1], this is an environment problem: the solution is to pip install graphviz, or to manually download the Graphviz installer from the official site and then add its bin folder to the system Path environment variable, for example C:\Program Files (x86)\Graphviz2.38\bin.

I thought this solution made sense, and I happily tried it out, and still got an error.

Then I sulked again and went back to looking for a solution. Some people [2] say the installation order matters: first install pydot, then graphviz. Someone else [3] said to use keras.utils.vis_utils.plot_model instead of keras.utils.plot_model. In short, none of it felt reliable.

Finally, you guessed it: I uninstalled pydot from the virtual environment in PyCharm and reinstalled it with pip (pip install pydot). Success!

Then I spent a while wondering what had caused the error; I even tried deleting the Graphviz entry from the system Path, but the error did not come back. Oh well, forget it, it works!
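For anyone hitting the same thing, here is a small sanity check (my own addition, not from the original post) that makes pydot invoke the GraphViz executables directly, so you can tell whether the problem lies with pydot or with the GraphViz binaries on PATH:

import pydot

# Raises an exception (e.g. "GraphViz's executables not found") if the dot binary is missing.
graph = pydot.Dot(graph_type='digraph')
graph.add_node(pydot.Node('check'))
print(graph.create(format='dot'))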

environment

  • Windows 10
  • Python 3.6
  • TensorFlow 2.0 Beta (built-in Keras)

effect

Reference

  1. Rob. (August 25, 2018). Graphviz and Pydotplus not working. Retrieved from https://datascience.stackexchange.com/questions/37428/graphviz-and-pydotplus-not-working
  2. web_ninja. (August 26, 2013). Why is pydot unable to find GraphViz's Executables in Windows 8? Retrieved from https://stackoverflow.com/questions/18438997/why-is-pydot-unable-to-find-graphvizs-executables-in-windows-8
  3. XifengGuo. (November 6, 2017). pydot issue. Retrieved from https://github.com/XifengGuo/CapsNet-Keras/issues/7

TensorFlow error record: DeprecationWarning: elementwise

While learning simple neural networks with tensorflow, the following warning was reported:

DeprecationWarning: elementwise == comparison failed; this will raise an error

The reason: while processing the data, I ran the following code, then pasted another copy and ran it again, which made the labels three-dimensional and caused the data mismatch:

code:

import numpy as np

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  # Map 0 to [1.0, 0.0, 0.0 ...], 1 to [0.0, 1.0, 0.0 ...]
  labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels
train_dataset, train_labels = reformat(train_dataset, train_labels)
valid_dataset, valid_labels = reformat(valid_dataset, valid_labels)
test_dataset, test_labels = reformat(test_dataset, test_labels)
print('Training set', train_dataset.shape, train_labels.shape)
print('Validation set', valid_dataset.shape, valid_labels.shape)
print('Test set', test_dataset.shape, test_labels.shape)

correct result:

Training set (200000, 784) (200000, 10)
Validation set (10000, 784) (10000, 10)
Test set (10000, 784) (10000, 10)

Incorrect result (after running the block twice):

Training set (200000, 784) (200000, 1, 10)
Validation set (10000, 784) (10000, 1, 10)
Test set (10000, 784) (10000, 1, 10)

Obviously, the dimension of the labels is wrong.

Solution: restart and re-run from the beginning; this reformatting block must be run only once.
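Alternatively, a small guard (my own addition, not in the original) makes the block safe to re-run, by only one-hot encoding labels that are still one-dimensional:

import numpy as np

image_size = 28
num_labels = 10

def reformat(dataset, labels):
  dataset = dataset.reshape((-1, image_size * image_size)).astype(np.float32)
  if labels.ndim == 1:  # labels that are already one-hot encoded are left untouched
    labels = (np.arange(num_labels) == labels[:,None]).astype(np.float32)
  return dataset, labels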

Resolving failed call to cuInit: CUDA_ERROR_NO_DEVICE

After restarting the server, TensorFlow could not connect to the NVIDIA driver. It still runs, but only on the CPU, even though the GPU version of TensorFlow shows as installed.

First, enter nvidia-smi at the terminal.

This returns: nvidia-smi has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.

Entering nvcc -V in the terminal shows that the CUDA compiler driver is still there:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2016 NVIDIA Corporation
Built on Tue_Jan_10_13:22:03_CST_2017
Cuda compilation tools, release 8.0, V8.0.61

The solution takes only two steps, without restarting:

step 1: sudo apt-get install dkms

step 2: sudo dkms install -m nvidia -v 410.73

Enter nvidia-smi again and it returns to normal.

Here 410.73 in step 2 is the NVIDIA driver version number. If you do not know the version number, go into /usr/src; there is an nvidia folder there whose suffix is the version number:

cd /usr/src

Another note: how to check whether TensorFlow is using the GPU or the CPU:

from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())

https://blog.csdn.net/hangzuxi8764/article/details/86572093
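As another quick check (my own addition, TF 1.x API), tf.test.is_gpu_available() reports whether TensorFlow can actually use a GPU:

import tensorflow as tf

# True if TensorFlow can see and use a GPU device, False if it falls back to the CPU.
print(tf.test.is_gpu_available())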

How to solve "numpy.core.umath failed to import"

I encountered this problem when installing the GPU version of TensorFlow.

the solution is

  • at the command line:
pip install -U numpy -i https://pypi.tuna.tsinghua.edu.cn/simple/

and then it works ~

C:\Users\Sean>python
Python 3.6.6 (v3.6.6:4cf1f54eb7, Jun 27 2018, 03:37:03) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>>

InternalError: Blas GEMM launch failed: a.shape=(100, 784), b.shape=(784, 10), m=100, n=10… problem solving

problem description

When training on the MNIST data set with the tensorflow-gpu version, the following error appears:

  InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
     [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]  

cause

(1) Other Python programs are using GPU resources, so the current program cannot allocate enough resources to run.
(2) If you are using GPU TensorFlow and want to train a model while graphics-card usage is already high (for example while playing a game), you should allocate a fixed amount of video memory when initializing the Session; otherwise the program may error out and exit right at the start of training.

solution

(1) Check for and close an existing Session:

if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()

(2) Allocate a fixed fraction of video memory (an alternative is sketched after solution (3)):

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

(3) If the first two methods do not solve the problem, restart the machine.
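As an alternative to fixing a memory fraction in solution (2) (my own addition, not from the original post), TensorFlow 1.x can also be told to grow GPU memory usage on demand:

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory as needed instead of up front
sess = tf.Session(config=config)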

Tensorflow import error: DLL load failed: the specified module could not be found

1. Background

A problem appeared after I updated my tensorflow version recently, and the error was very vague: DLL load failed: the specified module could not be found. Let me start with my environment:

win10 + pycharm

anaconda3 (python3.6)

tensorflow1.9

2. Problem description

I had used my tensorflow installation, version 1.9, for almost a year and never had any problems. Later I saw that TensorFlow had been updated to 1.12, so I thought I would update it. After the update, however, importing tensorflow reported an error, and even lowering the tensorflow version to 1.2 still gives an error:

When importing tensorflow, the following error message appears:

D:\python\anaconda\python.exe D:*****.py
Traceback (most recent call last):
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "D:\python\anaconda\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "D:\python\anaconda\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: 找不到指定的模块。

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:/python/item/64-MARTAGAN/train_marta_gan.py", line 7, in <module>
    import tensorflow as tf
  File "D:\python\anaconda\lib\site-packages\tensorflow\__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow  # pylint: disable=unused-import
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\__init__.py", line 49, in <module>
    from tensorflow.python import pywrap_tensorflow
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 74, in <module>
    raise ImportError(msg)
ImportError: Traceback (most recent call last):
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow.py", line 58, in <module>
    from tensorflow.python.pywrap_tensorflow_internal import *
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 28, in <module>
    _pywrap_tensorflow_internal = swig_import_helper()
  File "D:\python\anaconda\lib\site-packages\tensorflow\python\pywrap_tensorflow_internal.py", line 24, in swig_import_helper
    _mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
  File "D:\python\anaconda\lib\imp.py", line 243, in load_module
    return load_dynamic(name, filename, file)
  File "D:\python\anaconda\lib\imp.py", line 343, in load_dynamic
    return _load(spec)
ImportError: DLL load failed: 找不到指定的模块。


Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/errors

for some common reasons and solutions.  Include the entire stack trace
above this error message when asking for help.

Process finished with exit code 1

My error message is: ImportError: DLL load failed: The specified module could not be found.

3. Solution

This is a fairly new problem and there are few solutions online. After searching, I finally found a similar issue posted on GitHub: https://github.com/tensorflow/tensorflow/issues/25597, described in detail below.

The original poster's situation is similar to mine; here is that poster's environment first:

You can see that the original poster also updated the tensorflow version to 1.12 before hitting this problem. The issue then describes the environment configuration in detail and mentions that the error still occurs with the CUDA 9.2 and CUDA 10.0 versions.

To solve this problem, someone proposed installing CUDA 9.0:

The original poster then reconfigured the environment to CUDA 9.0 and cuDNN 7.0.5.

To check your own CUDA version, open CMD and run nvcc --version:

Here are some of the methods I tried; none of them solved the problem:

(1) Installing other versions of CUDA. It did not solve the problem…

(2) Updating VS2015, following the reference blog on tensorflow installation issues and the GitHub thread. It did not solve the problem either…

(3) Reading further down the GitHub thread, I found many people thanking the commenter fo40225, whose suggestion (the one referenced in (2) above) is to reinstall from a .whl file. I did not try this method myself; I will describe my own method below. Here is one of the replies, picked at random:

Finally, here is my own way of solving the problem:

(1) Open CMD and enter pip uninstall tensorflow, i.e. uninstall tensorflow.

(2) After uninstalling, reinstall it: enter pip install tensorflow.

(3) A simple test after installation showed no errors. In short, if it breaks, reinstall… (that is how I fixed it, anyway)

Of course, if reinstalling does not solve the problem, you can consider fo40225's method above and overwrite the installation with the .whl file; the link is given here as well: https://github.com/fo40225/tensorflow-windows-wheel/tree/master/1.6.0/py36/CPU/sse2

The usage of with tf.Session() as sess in TensorFlow

The Session provides the environment for Operation execution and Tensor evaluation, as shown below:

import tensorflow as tf

# Build a graph.
a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = a * b

# Launch the graph in a session.
sess = tf.Session()

# Evaluate the tensor 'c'.
print(sess.run(c))
sess.close()

# result: [3., 8.]

A Session may own resources, such as Variables or Queues. When the session is no longer needed, these resources need to be released. There are two ways to do this:

  1. Call the session.close() method;

  2. Use with tf.Session() to create a context for execution; the session is released automatically when the context exits.

import tensorflow as tf

# Build a graph.
a = tf.constant([1.0, 2.0])
b = tf.constant([3.0, 4.0])
c = a * b

with tf.Session() as sess:
    print(sess.run(c))


https://www.cnblogs.com/lienhua34/p/5998853.html


https://blog.csdn.net/qq_36666115/article/details/80017050


TensorFlow: using tf.Print to output intermediate values inside a function

Because TensorFlow builds a static graph, the code is hard to debug. Besides the official debugging tools, the most direct approach is to print intermediate results. However, Python's print can only show a tensor variable's shape and other static attributes, not its value; to output the concrete value of a tensor you need tf.Print. There are many explanations of this function online; here is a brief description:

tf.Print(
    input_,
    data,
    message=None,
    first_n=None,
    summarize=None,
    name=None
)

Parameters:

  • input_: the tensor passed through this operation.
  • data: the list of tensors to print when the op is evaluated.
  • message: a string prefix for the printed message.
  • first_n: log only the first first_n times; a negative value means always log (the default).
  • summarize: print only this many entries of each tensor. If None, at most three elements of each input tensor are printed.
  • name: a name for the operation (optional).
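A small usage sketch (my own addition) showing the message and summarize arguments:

import tensorflow as tf

x = tf.constant([[1.0, 2.0, 3.0, 4.0]])
# The print fires whenever data flows through x_print; 'message' prefixes the
# printed line and summarize=4 shows up to four elements of each tensor.
x_print = tf.Print(x, [x], message='x is: ', summarize=4)
y = x_print * 2.0

with tf.Session() as sess:
    print(sess.run(y))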

However, most resources online describe setting up an op in the main function and then opening a Session to run sess.run(op). What if you want to output an intermediate value inside a function whose value is not returned to the main function? In that case you cannot open a new Session inside the function, but you can still create an op with tf.Print.

import tensorflow as tf
import os

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

def test():
    a=tf.constant(0)
    for i in range(10):  
        a_print = tf.Print(a,['a_value: ',a])
        a=a_print+1
    return a
    
if __name__=='__main__':
    with tf.Session() as sess:
        sess.run(test())

operation result:

a_print can be understood as a new node in the graph. In the code above, whenever another variable uses a_print (for example a = a_print + 1), data flows out of the a_print node and the value is printed. But how many times will the value be printed? In fact it is not a matter of how many times a_print is used later, but of how many times data must flow out of this node, which here can be read as how many times the a_print op is "defined".

def test():
    a=tf.constant(0)
    a_print = tf.Print(a,['a_value: ',a])
    for i in range(10):  
        a=a_print+1
    return a
    
if __name__=='__main__':
    with tf.Session() as sess:
        sess.run(test())

If the test() function is changed in this way, the running result is:

The output is printed only once, because the a_print op is defined only once; although a is reassigned repeatedly in the loop, data flows through a_print only once, so it prints only once. The printed a_print value is 0, and the finally returned value of a is 1.

Then change the code to the following example:

def test():
    a=tf.constant(0)
    a_print = tf.Print(a,['a_value: ',a])
    for i in range(10):  
        a_print=a_print+1
    return a
    
if __name__=='__main__':
    with tf.Session() as sess:
        sess.run(test())

The running result:

Nothing is printed, because the a_print op is not connected to any other variable: it is not used anywhere, it is an isolated node in the graph, no data flows through it, and so it is not executed.
And if I change it to:

def test():
    a=tf.constant(0)
    a_print = tf.Print(a,['a_value: ',a])
    for i in range(10):  
        a_print=a_print+1
    return a_print
    
if __name__=='__main__':
    with tf.Session() as sess:
        sess.run(test())

The running result:

The returned a_print value is 10, which is also correct: because a_print is returned at the end, data flows through it and it is executed, while the print itself fires only once, because the tf.Print op is defined only once.
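As a closing note (my own addition, not from the original post), a control dependency can force the print op to run even when its output is not consumed directly:

import tensorflow as tf

def test():
    a = tf.constant(0)
    a_print = tf.Print(a, ['a_value: ', a])
    # The control dependency forces a_print to execute before b is computed,
    # even though b does not use a_print's output.
    with tf.control_dependencies([a_print]):
        b = a + 1
    return b

if __name__ == '__main__':
    with tf.Session() as sess:
        sess.run(test())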