Tag Archives: Deep learning

[Solved] Pytorch Error: PytorchStreamReader failed reading zip archive failed finding central directory

PyTorch reports the error:

PytorchStreamReader failed reading zip archive: failed finding central directory

Where the error occurs:

The error is raised at the following line when the pre-trained weights have not been fully downloaded:

resnet101 = torchvision.models.resnet101(pretrained=True)

Solution:

Delete the partially downloaded .pth file under C:\Users\<Username>\.cache\torch\hub\checkpoints\ and run the program again so the weights are downloaded from scratch.
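A minimal sketch of doing this programmatically, assuming the default cache location shown above and that the incomplete file belongs to resnet101:

from pathlib import Path

# Default torch hub cache: <home>/.cache/torch/hub/checkpoints
ckpt_dir = Path.home() / ".cache" / "torch" / "hub" / "checkpoints"
for f in ckpt_dir.glob("resnet101*.pth"):   # the partially downloaded weight file
    f.unlink()
    print("Deleted", f)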

[Solved] Pytorch Error: PytorchStreamReader failed reading zip archive failed finding central directory

PyTorch reports the error: PytorchStreamReader failed reading zip archive: failed finding central directory

Where the error occurs:

The error is raised at the following line when the pre-trained weights have not been fully downloaded:

resnet101 = torchvision.models.resnet101(pretrained=True)

Solution:

Download the weight file manually from the download URL printed in the console and place it at the cache path shown there, replacing the weights that were not fully downloaded.
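A minimal sketch of the manual download, assuming the default cache directory; the URL below is a hypothetical placeholder and must be replaced with the one printed in the console:

import os
import torch

# Hypothetical placeholder: use the weight URL printed when the download started
url = "https://download.pytorch.org/models/<resnet101-weights>.pth"
ckpt_dir = os.path.join(torch.hub.get_dir(), "checkpoints")
os.makedirs(ckpt_dir, exist_ok=True)
dst = os.path.join(ckpt_dir, os.path.basename(url))
torch.hub.download_url_to_file(url, dst)   # overwrites the incomplete file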

[Solved] pytorch loss.backward() Error: RuntimeError: Function AddBackward0 returned an invalid gradient at index 1…

How to Solve pytorch loss.backward() Error

The specific error is as follows:

In fact, the error message already makes it clear that the error is caused by a device inconsistency during backpropagation.
In the code, we first move the input, target, model, and loss to the GPU via .to(device). But when computing the loss, we also initialize a loss term ourselves:
loss_1 = torch.tensor(0.0)
By default this places loss_1 on the CPU.
Executing loss_seg = loss + loss_1
then makes loss_seg.backward() raise the error above.

Fix: change
loss_1 = torch.tensor(0.0)
to
loss_1 = torch.tensor(0.0).to(device)
or
loss_1 = torch.tensor(0.0).cuda()
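A minimal self-contained sketch of the corrected pattern; the model, inputs, and the extra loss term below are simplified stand-ins for the original training code:

import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 1).to(device)
x = torch.randn(4, 10, device=device)
target = torch.randn(4, 1, device=device)

loss = nn.functional.mse_loss(model(x), target)
loss_1 = torch.tensor(0.0, device=device)   # create the extra term directly on the same device
loss_seg = loss + loss_1
loss_seg.backward()                          # no device-mismatch error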

tensorflow2.3 InvalidArgumentError: jpeg::Uncompress failed [How to Solve]

When training on your own dataset, you may run into the error:

tensorflow2.3 InvalidArgumentError: jpeg::Uncompress failed
[[{{node decode_image/DecodeImage}}]] [Op:IteratorGetNext]

 

Solution:
Check whether any image is damaged before training:

import os
import tensorflow as tf


num_skipped = 0
for folder_name in ("Fruit apples", "Fruit bananas", "Fruit oranges"):
    # Use forward slashes (or a raw string) so backslashes are not treated as escapes
    folder_path = os.path.join("./data/image_data", folder_name)
    for fname in os.listdir(folder_path):

        fpath = os.path.join(folder_path, fname)

        try:
            fobj = open(fpath, mode="rb")
            # A valid JPEG produced by most tools contains "JFIF" in its header
            is_jfif = tf.compat.as_bytes("JFIF") in fobj.peek(10)
        finally:
            fobj.close()

        if not is_jfif:
            num_skipped += 1
            # Delete corrupted image
            os.remove(fpath)

print("Deleted %d images" % num_skipped)

Deleting the damaged images and training again solves the problem.
If the error is reported again, use:

# Determine whether an image is corrupt by reading it locally
import os
from PIL import Image


def is_valid_image(path):
    '''
    Check whether the file is corrupt
    '''
    bValid = True
    try:
        with open(path, 'rb') as fileObj:  # open in binary mode
            buf = fileObj.read()
            if not buf.startswith(b'\xff\xd8'):  # a JPEG must start with \xff\xd8
                bValid = False
            elif buf[6:10] in (b'JFIF', b'Exif'):  # ASCII marker "JFIF" or "Exif"
                if not buf.rstrip(b'\0\r\n').endswith(b'\xff\xd9'):  # and end with \xff\xd9
                    bValid = False
            else:
                try:
                    fileObj.seek(0)
                    Image.open(fileObj).verify()  # fall back to PIL's integrity check
                except Exception as e:
                    bValid = False
                    print(e)
    except Exception:
        return False
    return bValid


num_skipped = 0
for folder_name in ("fruit-apple", "fruit-banana", "fruit-orange"):
    # os.path.join() joins two or more pathname components
    folder_path = os.path.join("./data/image_data", folder_name)
    # os.listdir(path) lists the entries under this directory
    for fname in os.listdir(folder_path):
        fpath = os.path.join(folder_path, fname)
        if not is_valid_image(fpath):
            num_skipped += 1
            print(fpath)  # print the path and name of the corrupt file

print("Found %d corrupt images" % num_skipped)

Fix or remove the files reported as corrupt and train again to solve the problem.

[Solved] Error downloading standard development IDs for MRPC. You will need to manually split your data.

Reason for the error

  • The original download link of the MRPC dataset is no longer valid; the MRPC entry was removed from TASK2PATH and replaced by the two new links MRPC_TRAIN and MRPC_TEST.
  • Splitting the dataset requires a mapping file (dev_ids.tsv), which can no longer be obtained from the original download link.

Solution:

1. Comment out the code in format_mrpc that downloads dev_ids.tsv from TASK2PATH["MRPC"] (the commented block in the script below).

2. Download the dev_ids.tsv file and place it in the MRPC folder.

3. Rerun the code.
''' Script for downloading all GLUE data.
Note: for legal reasons, we are unable to host MRPC.
You can either use the version hosted by the SentEval team, which is already tokenized,
or you can download the original data from (https://download.microsoft.com/download/D/4/6/D46FF87A-F6B9-4252-AA8B-3604ED519838/MSRParaphraseCorpus.msi) and extract the data from it manually.
For Windows users, you can run the .msi file. For Mac and Linux users, consider an external library such as 'cabextract' (see below for an example).
You should then rename and place specific files in a folder (see below for an example).
mkdir MRPC
cabextract MSRParaphraseCorpus.msi -d MRPC
cat MRPC/_2DEC3DBE877E4DB192D17C0256E90F1D | tr -d $'\r' > MRPC/msr_paraphrase_train.txt
cat MRPC/_D7B391F9EAFF4B1B8BCE8F21B20B1B61 | tr -d $'\r' > MRPC/msr_paraphrase_test.txt
rm MRPC/_*
rm MSRParaphraseCorpus.msi
'''

import os
import sys
import shutil
import argparse
import tempfile
import urllib
import io
if sys.version_info >= (3, 0):
    import urllib.request
import zipfile

URLLIB=urllib
if sys.version_info >= (3, 0):
    URLLIB=urllib.request

TASKS = ["CoLA", "SST", "MRPC", "QQP", "STS", "MNLI", "QNLI", "RTE", "WNLI", "diagnostic"]
TASK2PATH = {"CoLA":'https://dl.fbaipublicfiles.com/glue/data/CoLA.zip',
             "SST":'https://dl.fbaipublicfiles.com/glue/data/SST-2.zip',
             "QQP":'https://dl.fbaipublicfiles.com/glue/data/STS-B.zip',
             "STS":'https://dl.fbaipublicfiles.com/glue/data/QQP-clean.zip',
             "MNLI":'https://dl.fbaipublicfiles.com/glue/data/MNLI.zip',
             "QNLI":'https://dl.fbaipublicfiles.com/glue/data/QNLIv2.zip',
             "RTE":'https://dl.fbaipublicfiles.com/glue/data/RTE.zip',
             "WNLI":'https://dl.fbaipublicfiles.com/glue/data/WNLI.zip',
             "diagnostic":'https://dl.fbaipublicfiles.com/glue/data/AX.tsv'}

MRPC_TRAIN = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_train.txt'
MRPC_TEST = 'https://dl.fbaipublicfiles.com/senteval/senteval_data/msr_paraphrase_test.txt'

def download_and_extract(task, data_dir):
    print("Downloading and extracting %s..." % task)
    if task == "MNLI":
        print("\tNote (12/10/20): This script no longer downloads SNLI. You will need to manually download and format the data to use SNLI.")
    data_file = "%s.zip" % task
    URLLIB.urlretrieve(TASK2PATH[task], data_file)
    with zipfile.ZipFile(data_file) as zip_ref:
        zip_ref.extractall(data_dir)
    os.remove(data_file)
    print("\tCompleted!")

def format_mrpc(data_dir, path_to_data):
    print("Processing MRPC...")
    mrpc_dir = os.path.join(data_dir, "MRPC")
    if not os.path.isdir(mrpc_dir):
        os.mkdir(mrpc_dir)
    if path_to_data:
        mrpc_train_file = os.path.join(path_to_data, "msr_paraphrase_train.txt")
        mrpc_test_file = os.path.join(path_to_data, "msr_paraphrase_test.txt")
    else:
        try:
            mrpc_train_file = os.path.join(mrpc_dir, "msr_paraphrase_train.txt")
            mrpc_test_file = os.path.join(mrpc_dir, "msr_paraphrase_test.txt")
            URLLIB.urlretrieve(MRPC_TRAIN, mrpc_train_file)
            URLLIB.urlretrieve(MRPC_TEST, mrpc_test_file)
        except urllib.error.HTTPError:
            print("Error downloading MRPC")
            return
    assert os.path.isfile(mrpc_train_file), "Train data not found at %s" % mrpc_train_file
    assert os.path.isfile(mrpc_test_file), "Test data not found at %s" % mrpc_test_file

    with io.open(mrpc_test_file, encoding='utf-8') as data_fh, \
            io.open(os.path.join(mrpc_dir, "test.tsv"), 'w', encoding='utf-8') as test_fh:
        header = data_fh.readline()
        test_fh.write("index\t#1 ID\t#2 ID\t#1 String\t#2 String\n")
        for idx, row in enumerate(data_fh):
            label, id1, id2, s1, s2 = row.strip().split('\t')
            test_fh.write("%d\t%s\t%s\t%s\t%s\n" % (idx, id1, id2, s1, s2))

    # try:
    #     URLLIB.urlretrieve(TASK2PATH["MRPC"], os.path.join(mrpc_dir, "dev_ids.tsv"))
    # except KeyError or urllib.error.HTTPError:
    #     print("\tError downloading standard development IDs for MRPC. You will need to manually split your data.")
    #     return

    dev_ids = []
    with io.open(os.path.join(mrpc_dir, "dev_ids.tsv"), encoding='utf-8') as ids_fh:
        for row in ids_fh:
            dev_ids.append(row.strip().split('\t'))

    with io.open(mrpc_train_file, encoding='utf-8') as data_fh, \
         io.open(os.path.join(mrpc_dir, "train.tsv"), 'w', encoding='utf-8') as train_fh, \
         io.open(os.path.join(mrpc_dir, "dev.tsv"), 'w', encoding='utf-8') as dev_fh:
        header = data_fh.readline()
        train_fh.write(header)
        dev_fh.write(header)
        for row in data_fh:
            label, id1, id2, s1, s2 = row.strip().split('\t')
            if [id1, id2] in dev_ids:
                dev_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))
            else:
                train_fh.write("%s\t%s\t%s\t%s\t%s\n" % (label, id1, id2, s1, s2))

    print("\tCompleted!")

def download_diagnostic(data_dir):
    print("Downloading and extracting diagnostic...")
    if not os.path.isdir(os.path.join(data_dir, "diagnostic")):
        os.mkdir(os.path.join(data_dir, "diagnostic"))
    data_file = os.path.join(data_dir, "diagnostic", "diagnostic.tsv")
    URLLIB.urlretrieve(TASK2PATH["diagnostic"], data_file)
    print("\tCompleted!")
    return

def get_tasks(task_names):
    task_names = task_names.split(',')
    if "all" in task_names:
        tasks = TASKS
    else:
        tasks = []
        for task_name in task_names:
            assert task_name in TASKS, "Task %s not found!" % task_name
            tasks.append(task_name)
    return tasks

def main(arguments):
    parser = argparse.ArgumentParser()
    parser.add_argument('-d', '--data_dir', help='directory to save data to', type=str, default='glue_data')
    parser.add_argument('-t', '--tasks', help='tasks to download data for as a comma separated string',
                        type=str, default='all')
    parser.add_argument('--path_to_mrpc', help='path to directory containing extracted MRPC data, msr_paraphrase_train.txt and msr_paraphrase_text.txt',
                        type=str, default='')
    args = parser.parse_args(arguments)

    if not os.path.isdir(args.data_dir):
        os.mkdir(args.data_dir)
    tasks = get_tasks(args.tasks)

    for task in tasks:
        if task == 'MRPC':
            format_mrpc(args.data_dir, args.path_to_mrpc)
        elif task == 'diagnostic':
            download_diagnostic(args.data_dir)
        else:
            download_and_extract(task, args.data_dir)


if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))

Successfully downloaded


[Solved] RuntimeError: cuda runtime error (801) : operation not supported at


Error:
RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245 #85

Reason:
A guess: Windows does not support this kind of multi-process data loading (sharing tensor storage across worker processes).

Solution:

Take the following code as an example:

layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False, num_workers=12)

Remove the num_workers argument (or set it to 0):

layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False)

 

[Solved] torchvision.models.resnet18() Error: PytorchStreamReader failed reading zip archive: failed finding…

When I was downloading the resnet18 weights using torchvision.models.resnet18(), I manually terminated the run once; when I ran it again, I got the error PytorchStreamReader failed reading zip archive: failed finding central directory.

This is because the weight file was only partially downloaded when the program was terminated; on the next run the program assumed the download was complete and tried to unpack it, which caused the error. Here is the partially downloaded file:

The logic that checks whether the file already exists is in torch/hub.py (around line 585 in my version); follow the traceback in the error message to find it.

Set a breakpoint there, debug, and inspect the value of cached_file; follow that path to the partially downloaded file, delete it, and you are done.

When I run it, the path of this file looks like this:
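If you do not want to step through with a debugger, a minimal sketch to locate the cached file, assuming the default hub cache location:

import os
import torch

ckpt_dir = os.path.join(torch.hub.get_dir(), "checkpoints")
print("Hub checkpoint directory:", ckpt_dir)
for fname in os.listdir(ckpt_dir):
    print(fname)   # the partially downloaded resnet18 file will be listed here

Deleting the listed resnet18 .pth file and rerunning torchvision.models.resnet18() resolves the error.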

[Solved] ValueError: Error when checking input: expected conv2d_input to have 4 dimensions

Error Messages:

ValueError: Error when checking input: expected conv2d_input to have 4 dimensions, but got array with shape (150, 150, 3)
Code:

image = mpimg.imread("./ima/b.jpg")
image = image/255
classe = model.predict(image, batch_size=1)

Reason:
The input shape is incorrect: the model expects a 4-dimensional batch (batch, height, width, channels), but a single image of shape (150, 150, 3) was passed.
Solution:
Reshape the image so that it has a batch dimension (and normalize it).

Specific solutions:

image = mpimg.imread("./ima/b.jpg")
image = image.reshape(1, 150, 150, 3)/255
classe = model.predict(image, batch_size=1)
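A self-contained sketch of the same fix; the image path is the placeholder from the post, model is the trained Keras model from the original code, and np.expand_dims is an equivalent way to add the batch dimension:

import numpy as np
import matplotlib.image as mpimg

image = mpimg.imread("./ima/b.jpg")              # shape (150, 150, 3)
image = np.expand_dims(image, axis=0)/255.0      # shape (1, 150, 150, 3), normalized
classe = model.predict(image, batch_size=1)      # model: the trained Keras model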

[Solved] RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object…

While debugging the code today I ran into this error. The complete message is as follows:

RuntimeError: Cannot clone object <keras.wrappers.scikit_learn.KerasClassifier object at 0x000003A783733F33>, as the constructor either does not set or modifies parameter layers

Inspecting the code itself showed nothing wrong; it looked normal.

The import was then examined:

from keras.wrappers.scikit_learn import KerasRegressor

Change to:

from tensorflow.keras.wrappers.scikit_learn import KerasRegressor

Problem solved!
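For reference, a minimal sketch of how the wrapper is typically used with scikit-learn once the tensorflow.keras import is in place; it assumes an older TensorFlow 2.x release where this wrapper is still available, and the model definition and data below are illustrative, not from the original post:

import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import cross_val_score

def build_model():
    model = models.Sequential([
        layers.Dense(16, activation="relu", input_shape=(8,)),
        layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

X = np.random.rand(100, 8)
y = np.random.rand(100)

# scikit-learn clones this estimator internally; the tf.keras wrapper supports that cloning
reg = KerasRegressor(build_fn=build_model, epochs=5, batch_size=16, verbose=0)
scores = cross_val_score(reg, X, y, cv=3, scoring="neg_mean_squared_error")
print(scores)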

OSError: [WinError 1455] The page file is too small to complete the operation. Error loading…

Complete error: OSError: [WinError 1455] The page file is too small to complete the operation. Error loading "C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\shm.dll" or one of its dependencies.

Scenario: Running the reid-strong-baseline model

Reason: The model is large and the page file (virtual memory) allocated by the system is too small for training.

Environment: windows10, cuda version: 11.1, pytorch version: 1.11.0+cu113

(1) Query your CUDA version:

nvidia-smi

(2) Query your own version of pytorch

import torch
print(torch.__version__)

Solution: Right-click This PC -> Properties -> Advanced system settings -> Advanced -> Performance "Settings" -> Advanced -> Virtual memory "Change" -> uncheck "Automatically manage paging file size for all drives" -> set a custom initial size and maximum size (according to the actual available disk space, as large as possible) -> click "Set" -> OK -> reboot

If the error is still reported after rebooting, the possible reasons are: (1) the custom size is still too small (for example, I set 10 GB at first and still got the error; it ran successfully after increasing it to 100 GB (100000 MB)); (2) the batch_size is too large, in which case reduce it appropriately (for example, from 64 to 16).

[Solved] Visdom Error: raise ConnectionError

When using visdom to visualize the loss of a network, first enter in the terminal:

python -m visdom.server

Then run the code; it fails with an error, and the following information appears:

 raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: 
HTTPConnectionPool(host='localhost', port=8097): 
Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6734182d50>: 
Failed to establish a new connection: 
[Errno 111] Connection refused'))

After searching on Baidu, I found the solution:

After running the command python -m visdom.server, the following line appears:

Setting up a new session...

Keep that terminal open on this page, open a new terminal, and run the network code in the new terminal.

Then open the link in the browser:

http://localhost:8097/

You can see the curve being drawn.
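For reference, a minimal sketch of plotting a loss curve with visdom; the loss values here are dummy numbers and the window title is arbitrary:

import numpy as np
import visdom

viz = visdom.Visdom()          # connects to the server started with python -m visdom.server
assert viz.check_connection(), "visdom server is not running"

for step in range(100):
    loss = np.exp(-step / 30) + 0.05 * np.random.rand()   # dummy loss value
    viz.line(X=np.array([step]), Y=np.array([loss]),
             win="train_loss",
             update="append" if step > 0 else None,
             opts=dict(title="training loss"))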

[Solved] RuntimeError: Error(s) in loading state_dict for BertForTokenClassification


Problem:

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification: size mismatch for bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 768]) from checkpoint, the shape in current model is torch.Size([119547, 768]).

Solution:

The parameter shapes in the checkpoint are not consistent with the model. My original code was

model = AutoModelForTokenClassification.from_pretrained("bert-base-multilingual-cased", num_labels=len(label_names))

Reinstalling pytorch solved it:

conda install pytorch==1.7.1