Tag Archives: Deep learning

IndentationError: unindent does not match any outer indentation level in Python

This post is reproduced from: https://blog.csdn.net/u010412719/article/details/47089473
 
Today I copied a piece of code from the web. The code is very simple, and every line appears to be indented correctly, but when I ran it the following error appeared:

[Solution]
1. The most common reason for this error is, indeed, missing indentation. But judging by the reported line number, the code looked fine: it was indented and there were no syntax errors.
2. After going through the code and confirming that nothing was really wrong, it occurred to me to display all the characters in the current Python script (including spaces and TAB characters) to see what the indentation was actually made of and whether there were any other special characters.

Notepad++, the text editor I am using, has a setting to display all characters:
View -> Show Symbol -> Show Spaces and Tabs
so you can see whether or not our Python code is really indented.

Finally, it turned out that the error was caused by the fact that the offending line appeared to be indented but actually was not. That was the root of the problem.
I found that none of the Python code I had copied was really indented, which requires extra attention when we copy other people's code: don't assume the code is fine just because it looks indented, because it may not actually be indented.
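To catch this faster, here is a minimal sketch (the filename is hypothetical) that scans a script for lines whose indentation mixes tabs and spaces, which is the usual cause of this error when pasting code:

# Scan a script for lines whose leading whitespace mixes tabs and spaces.
with open("copied_script.py") as f:            # hypothetical filename
    for lineno, line in enumerate(f, start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent and " " in indent:
            print("line %d: indentation mixes tabs and spaces" % lineno)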

[ ERROR ] Error loading xmlfile: squeezenet1.1\FP16\squeezenet1.1.xml, File was not found at line: 1

1. Problem encountered:
[ ERROR ] Error loading xmlfile: C:\Users\USER\Documents\Intel\OpenVINO\openvino_models\ir\public\squeezenet1.1\FP16\squeezenet1.1.xml, File was not found at line: 1 pos: 0
2. Solution:
Find the squeezenet1.1.caffemodel file

cd /home/kang/openvino_models/models/public/squeezenet1.1/
ls

Switch to the model_optimizer directory and use mo_caffe.py to optimize your models

cd /opt/intel/openvino/deployment_tools/model_optimizer
sudo ./mo_caffe.py  --input_model /home/kang/openvino_models/models/public/squeezenet1.1/squeezenet1.1.caffemodel --output_dir  ~/Downloads/

The generated XML and BIN files are then ready to use.
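As a quick check, here is a minimal sketch of loading the converted IR, assuming a 2020.x release of the legacy openvino.inference_engine Python API and that the command above wrote squeezenet1.1.xml and squeezenet1.1.bin into ~/Downloads/:

from openvino.inference_engine import IECore

# Read the IR produced by mo_caffe.py and load it onto the CPU.
ie = IECore()
net = ie.read_network(model="/home/kang/Downloads/squeezenet1.1.xml",
                      weights="/home/kang/Downloads/squeezenet1.1.bin")
exec_net = ie.load_network(network=net, device_name="CPU")
print("Network inputs:", list(net.input_info.keys()))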

CUDA error: out of memory

Today, when I was running my program, it kept reporting this error, saying that CUDA was out of memory. After a long time of debugging, I finally found the cause.
 
At first I suspected that the graphics cards on the server were already in use, but when I ran nvidia-smi I found that none of the three GPUs were being used, so that clearly wasn't the problem. So why did it happen?
 
Others said it was a conflict between the TensorFlow and PyTorch versions, but I don't even have TensorFlow installed.
 
In the end I referred to this post: http://www.cnblogs.com/jisongxie/p/10276742.html
 

Like that blogger, I was also using GPU No. 0, so I did not understand why my PyTorch process behaved this way. Physically I can only see GPUs up to No. 2; there is no No. 3 GPU. So where did things go wrong?
 
So I changed the code that controls which of the server's GPUs PyTorch can see:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
 
Then the program happily ran on physical GPU No. 0 ~~~
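A minimal check (assuming PyTorch is installed) to confirm which GPUs are visible after setting CUDA_VISIBLE_DEVICES as above:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'    # must be set before CUDA is initialized

import torch
print(torch.cuda.is_available())            # True if at least one visible GPU
print(torch.cuda.device_count())            # number of GPUs PyTorch can see
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))    # name of logical device 0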
 
 
 

The LeNet model trained with PyTorch fails to predict my own handwritten digit images

LeNet was trained with the MNIST training set; the training code is not shown here.
We directly load the saved model:

import torch

lenet = torch.load('resourses/trained_model/LeNet_trained.pkl')

The test code is attached below:

print("Testing")
# Define conversion operations
# Read in the test image and transfer it to the model.
test_images = Image.open('resourses/LeNet_test/0.png')
img_to_tensor = transforms.Compose([
    transforms.Resize(32),
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])])
input_images = img_to_tensor(test_images).unsqueeze(0)
# Move models and data to cuda for computation if cuda is available
USE_CUDA = torch.cuda.is_available()
if USE_CUDA:
    input_images = input_images.cuda()
    lenet = lenet.cuda()
output_data = lenet(input_images)
# Print test information
test_labels = torch.max(output_data, 1)[1].data.cpu().numpy().squeeze(0)
print(output_data)
print(test_labels)

At present, the model is almost never correct on my own pictures, and I could not find the reason; the digit 8 is output very frequently.
After looking up relevant information, the reasons are as follows:

    1. If you parse the MNIST data set, you will find that its images are white digits on a black background.

    2. Our custom test pictures, however, are generally black digits on a white background.

    3. So I inverted the pixels of the custom test pictures and re-tested. The pixel inversion code is as follows:
from PIL import Image, ImageOps

# Invert the black-on-white test image to white-on-black, matching MNIST
image = Image.open('resourses/LeNet_test/0.png')
image_invert = ImageOps.invert(image)
image_invert.show()

After pixel inversion, the test accuracy reaches 50-60 percent, which is still not ideal. A further possible reason is given below.

    The MNIST data set contains handwriting by foreigners, whose writing style and habits differ slightly from those of Chinese people, which is also a major factor affecting test accuracy. However, I have not yet tested the accuracy on images written in a modified style.
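For reference, here is a minimal end-to-end sketch combining the inversion with the preprocessing used earlier, assuming the same file paths and saved model as above:

import torch
from PIL import Image, ImageOps
from torchvision import transforms

# Invert the custom image so it is white-on-black like MNIST, then preprocess.
image = ImageOps.invert(Image.open('resourses/LeNet_test/0.png').convert('L'))
to_tensor = transforms.Compose([
    transforms.Resize(32),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5])])
x = to_tensor(image).unsqueeze(0)

lenet = torch.load('resourses/trained_model/LeNet_trained.pkl', map_location='cpu')
lenet.eval()
with torch.no_grad():
    pred = torch.max(lenet(x), 1)[1].item()
print(pred)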

pytorch: RuntimeError: CUDA error: device-side assert triggered

Training the network reports:
RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THC/generic/THCTensorScatterGather.cu:380
terminate called after throwing an instance of 'c10::Error'
what(): CUDA error: device-side assert triggered (insert_events at /pytorch/c10/cuda/CUDACachingAllocator.cpp:569)
Reason: a label is out of range.
Method: run

CUDA_LAUNCH_BLOCKING=1 python train.py

so that the error produces specific information:

/pytorch/aten/src/THC/THCTensorScatterGather.cu:188: void THCudaTensor_scatterFillKernel(TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, Real, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = -1]: block: [72,0,0], thread: [32,0,0] Assertion `indexValue >= 0 && indexValue < tensor.sizes[dim]` failed.

You can see that the failed assertion is 'indexValue >= 0 && indexValue < tensor.sizes[dim]', which means a label is either below zero or above the total number of classes, i.e. out of range. After debugging, I found that one label had been set larger than the preset total number of classes; after I corrected that label, the problem was solved.
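Here is a minimal sketch (with a hypothetical label batch and class count) of the kind of check that catches this before the loss computation:

import torch

num_classes = 10                          # assumed number of classes
labels = torch.tensor([0, 3, 9, 12])      # hypothetical label batch; 12 is out of range

# Verify every label falls inside [0, num_classes) before computing the loss.
assert labels.min().item() >= 0 and labels.max().item() < num_classes, \
    "label out of range: min=%d, max=%d" % (labels.min().item(), labels.max().item())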

(Solved) pytorch error: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED (install cuda)

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED
Reason: the PyTorch and CUDA versions do not match.
(It is also possible that there is not enough memory; you can try increasing the virtual memory size.)
Uninstall PyTorch with conda uninstall pytorch; when you reinstall it, the matching CUDA version will be installed automatically.
To check the current versions, open CMD, start Python, and enter:

import torch
print(torch.__version__)
print(torch.version.cuda) 


Similar errors occur if the installed CUDA version does not match the torch version.
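You can also check the cuDNN build that PyTorch ships with; this is a small addition assuming a standard PyTorch build:

import torch

print(torch.backends.cudnn.is_available())   # whether cuDNN can be used at all
print(torch.backends.cudnn.version())        # cuDNN version bundled with this build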

Here is how to install CUDA:
1. Open the NVIDIA control panel to view the CUDA version supported by the current graphics driver.


2. Download CUDA from
https://developer.nvidia.com/cuda-toolkit-archive
Alternatively, download the required offline installation package (.tar.bz2) from https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/linux-64/ and, after the download completes, install it with conda install xxxx.tar.bz2.

First switch conda to a domestic (Tsinghua) mirror:

conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/conda-forge 
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/msys2/

conda config --set show_channel_urls yes

conda install pytorch torchvision cudatoolkit=10.0

Install other packages as needed; refer to the PyTorch official website.

Download according to your actual situation.

3. After the download succeeds, double-click the exe file to install it.
To verify that the installation succeeded, enter nvcc -V under CMD.

If the installation was successful, you can see the CUDA entries in the system environment variables, or find nvcc.exe under the installation path.

InternalError: Failed to create session. Error and solution


Preface
This error occurred at the beginning of training a Keras model (using TensorFlow as the backend).
Python version: 3.5.2
Keras version: 2.1.3
TensorFlow version: 1.9.0
Error message:

InternalErrorTraceback (most recent call last)
<ipython-input-4-d4cc2ca313a3> in <module>
     10 model.compile(loss='mse', optimizer='Adam')
     11 # fit network
---> 12 history = model.fit(X_train, y_train, epochs=3000, batch_size=16, validation_data=(x_test, y_test), verbose=2, shuffle=False)
     13 #history = model.fit(X, y, epochs=3000, batch_size=16, verbose=2, shuffle=False)
     14 # plot history

/usr/local/lib/python3.5/dist-packages/keras/models.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
    963             initial_epoch=initial_epoch,
    964             steps_per_epoch=steps_per_epoch,
--> 965             validation_steps=validation_steps)
    966 
    967     def evaluate(self, x=None, y=None,

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, **kwargs)
   1667             initial_epoch=initial_epoch,
   1668             steps_per_epoch=steps_per_epoch,
-> 1669             validation_steps=validation_steps)
   1670 
   1671     def evaluate(self, x=None, y=None,

/usr/local/lib/python3.5/dist-packages/keras/engine/training.py in _fit_loop(self, f, ins, out_labels, batch_size, epochs, verbose, callbacks, val_f, val_ins, shuffle, callback_metrics, initial_epoch, steps_per_epoch, validation_steps)
   1204                     ins_batch[i] = ins_batch[i].toarray()
   1205 
-> 1206                 outs = f(ins_batch)
   1207                 if not isinstance(outs, list):
   1208                     outs = [outs]

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2471             feed_dict[tensor] = value
   2472         fetches = self.outputs + [self.updates_op] + self.fetches
-> 2473         session = get_session()
   2474         updated = session.run(fetches=fetches, feed_dict=feed_dict,
   2475                               **self.session_kwargs)

/usr/local/lib/python3.5/dist-packages/keras/backend/tensorflow_backend.py in get_session()
    174         config = tf.ConfigProto(intra_op_parallelism_threads=num_thread,
    175                                 allow_soft_placement=True)
--> 176         _SESSION = tf.Session(config=config)
    177     session = _SESSION
    178     if not _MANUAL_VAR_INIT:

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in __init__(self, target, graph, config)
   1561 
   1562     """
-> 1563     super(Session, self).__init__(target, graph, config=config)
   1564     # NOTE(mrry): Create these on first __enter__ to avoid a reference cycle.
   1565     self._default_graph_context_manager = None

/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py in __init__(self, target, graph, config)
    631     if self._created_with_new_api:
    632       # pylint: disable=protected-access
--> 633       self._session = tf_session.TF_NewSession(self._graph._c_graph, opts)
    634       # pylint: enable=protected-access
    635     else:

InternalError: Failed to create session.

The solution

I found that GPU memory was already occupied by other programs.
Close those programs, then rerun the code (no need to restart the kernel), and the problem is solved!
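A related precaution, as a minimal sketch assuming Keras 2.1.x with the TensorFlow 1.x backend used above: let TensorFlow grow GPU memory on demand instead of grabbing it all, which helps when other programs already hold part of the GPU.

import tensorflow as tf
from keras import backend as K

# Allocate GPU memory on demand rather than reserving the whole card.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))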

Resolution of the cl.exe 'failed with exit status 2' error

When the error
error: Command 'C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\Tools\MSVC\14.24.28314\bin\HostX86\x64\cl.exe' failed with exit status 2
occurs, in addition to possible VS installation problems, it can also be a problem with the program being built. For example, python setup.py build_ext --inplace is required to install an algorithm framework, but it fails with this error.

Failed cleaning build dir for SciPy

SciPy itself is a difficult package to install under the ARM architecture.

When running pip3 install keras, scipy's setup.py is executed, and then "Failed cleaning build dir for scipy" pops up.

After looking up a lot of information: install the following dependency packages first, and then keras can be installed successfully.

sudo apt-get install python3-scipy
sudo apt-get install libblas-dev liblapack-dev
sudo apt-get install gfortran
sudo pip3 install keras
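As a quick sanity check after the installation, here is a minimal sketch assuming the packages above were installed into the system Python 3:

import scipy
import keras

# Print the versions to confirm both packages import cleanly.
print("scipy", scipy.__version__)
print("keras", keras.__version__)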


Windows 10: solving ImportError: DLL load failed: A dynamic link library (DLL) initialization routine failed


I had been using PaddlePaddle all along, but recently I needed to collect data, and the data-collection program required TensorFlow, so I installed TensorFlow 1.15.0. However, an error occurred when using it,

which was caused by the computer's CPU not supporting the AVX instruction set.

To see whether your computer supports AVX, you can check with CPU-Z (or a tool like Ludashi).
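As an alternative to GUI tools, here is a minimal sketch using the third-party py-cpuinfo package (my own assumption, not mentioned in the original post) to check for AVX from Python:

# Requires: pip install py-cpuinfo
import cpuinfo

flags = cpuinfo.get_cpu_info().get('flags', [])
print('AVX supported:', 'avx' in flags)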

There are two solutions:

  • Downgrade TensorFlow: uninstall it with pip uninstall tensorflow, then install version 1.5 with pip install tensorflow==1.5.0. If pip cannot find 1.5.0, you need to go to the official website to find a 1.5.0 wheel.
  • If you don't mind an unofficial build (like me), go to the tensorflow-windows-wheel repository, select an SSE2-instruction-set wheel, download it, and then pip install <filename.whl>; at this point the installation is complete.

test:

import tensorflow as tf
sess = tf.Session()
a = tf.constant(1)
b = tf.constant(2)
print(sess.run(a+b))

I used the second method here; in that GitHub repository you can find the SSE-instruction version of TensorFlow and download and install it directly. I downloaded version 1.15 here, which includes the GPU build, so a GPU-related error may appear at runtime; if you are only using the CPU, you can ignore it.

InternalError: Blas GEMM launch failed: a.shape=(100, 784), b.shape=(784, 10), m=100, n=10... problem solving

Problem description

When training on the MNIST data set with the tensorflow-gpu version, the following error occurs:

  InternalError: Blas SGEMM launch failed : a.shape=(100, 784), b.shape=(784, 10), m=100, n=10, k=784
     [[Node: MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](_recv_Placeholder_0/_4, Variable/read)]]  

Cause

(1) Other Python programs are using GPU resources, so not enough resources can be allocated to execute the current program.
(2) If you are using GPU TensorFlow and want to train the model while the graphics card is heavily used (for example, while playing a game), you should allocate a fixed amount of video memory when initializing the Session, otherwise an error may be reported and the program may exit right at the start of training.

Solution

(1) Close the current Session if one exists:

if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()

(2) Allocate a fixed amount of video memory (an alternative memory-growth sketch is given after this list):

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.333)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))

(3) If the first two methods do not solve the problem, restart the machine.
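For option (2), an alternative sketch, assuming TensorFlow 1.x as in the error above: instead of a fixed fraction, let the session grow GPU memory on demand.

import tensorflow as tf

# Grab GPU memory on demand instead of reserving a fixed share up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
sess = tf.Session(config=config)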