Tag Archives: pytorch

[Solved] pytorch CrossEntropyLoss Error: RuntimeError: 1D target tensor expected, multi-target not supported

Solution

CrossEntropyLoss(prediction, label) requires the following input dimensions:

    1. with a batch: the prediction is 2-dimensional with size [batch_size, n], and the label is 1-dimensional with size [batch_size]

    2. without a batch: the prediction is 2-dimensional with size [M, n], and the label is 1-dimensional with size [M]

Problem analysis

An example illustrates the required shapes:

import torch
import torch.nn as nn
import numpy as np

a = torch.tensor(np.random.random((30, 5)))
b = torch.tensor(np.random.randint(0, 4, (30))).long()
loss = nn.CrossEntropyLoss()

print("a的维度:", a.size()) # torch.Size([30, 5])
print("b的维度:", b.size()) # torch.Size([30])
print(loss(a, b)) # tensor(1.6319, dtype=torch.float64)
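
For contrast, here is a minimal sketch of what triggers the error: passing a 2D target (for example, one-hot labels) instead of a 1D tensor of class indices. Note that recent PyTorch releases also accept class-probability targets, so this error is typical of older versions.

import torch
import torch.nn as nn
import torch.nn.functional as F

a = torch.randn(30, 5)                          # predictions: [batch_size, n]
b = torch.randint(0, 5, (30,))                  # correct: 1D class indices, size [batch_size]
b_onehot = F.one_hot(b, num_classes=5).float()  # 2D target, size [batch_size, n]
loss = nn.CrossEntropyLoss()
# loss(a, b_onehot) # RuntimeError: 1D target tensor expected, multi-target not supported (older PyTorch)
print(loss(a, b))   # works: 1D target of class indices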

How to Solve Pytorch DataLoader Loading Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 1023

The complete error reports are:

Traceback (most recent call last):
  File "/Applications/PyCharm.app/Contents/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_comm.py", line 301, in _on_run
    r = r.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 1023: unexpected end of data

 

Solution:

This does not fix the UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 1023: unexpected end of data itself, but it does fix the model being unable to iterate. The method is as follows:

Replace the tensor-format data source with a numpy-format one, then convert it to a tensor, and finally wrap it in a DataLoader.

The UnicodeDecodeError may still be reported when converting from numpy to tensor, but the loaded data is no longer encapsulated inside the DataLoader step that was failing, so the data loop does not stop and the training of the model is not affected.
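
A minimal sketch of this workflow, assuming the data fits in memory as numpy arrays (all names here are illustrative):

import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# load the data source into numpy format first (dummy arrays stand in here)
features = np.random.random((100, 10)).astype(np.float32)
labels = np.random.randint(0, 2, (100,))

# convert numpy -> tensor, then wrap in a DataLoader
dataset = TensorDataset(torch.from_numpy(features), torch.from_numpy(labels).long())
loader = DataLoader(dataset, batch_size=16, shuffle=True)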

[Solved] ModuleNotFoundError: No module named 'torchtext.legacy.data.datasets_utils'

Cause: the pytorch version is too low.
Solution:
Step 1: Find the utils.py file inside the installed package at E:\anaconda\package\envs\pytorch_gpu\Lib\site-packages\d2lzh_pytorch and open it.
Step 2: Change the import torchtext inside it to import torchtext.legacy as torchtext, as shown below.
Step 3: Close jupyter and reopen it; the problem is solved.
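
For reference, the change in utils.py is this single import line (before and after):

# before (d2lzh_pytorch/utils.py)
import torchtext

# after
import torchtext.legacy as torchtext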

[Solved] ValueError: Connection error, and we cannot find the requested files in the cached path…

error:

self.tokenizer = CamembertTokenizer.from_pretrained("camembert-base")
resolved_vocab_files[file_id] = cached_path(
output_path = get_from_cache(
raise ValueError(
ValueError: Connection error, and we cannot find the requested files in the cached path. Please try again or make sure your Internet connection is on.

Solution:
When running from the command line, type:

TRANSFORMERS_OFFLINE=1 python test.py

In an sh file, this is:
TRANSFORMERS_OFFLINE=1 python test.py

When running the code, refer to the official website.
Reason for the error:

Firewalled environments
Some cloud and intranet setups have their GPU instances firewalled to the outside world, so if your script is trying to download model weights or datasets it will first hang and then timeout with an error message like:
ValueError: Connection error, and we cannot find the requested files in the cached path.
Please try again or make sure your Internet connection is on.
One possible solution in this situation is to use the “offline-mode”.

Solution: Offline mode

It’s possible to run 🤗 Transformers in a firewalled or a no-network environment.
Setting environment variable TRANSFORMERS_OFFLINE=1 will tell 🤗 Transformers to use local files only and will not try to look things up.
Most likely you may want to couple this with HF_DATASETS_OFFLINE=1 that performs the same for 🤗 Datasets if you’re using the latter.
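
Equivalently, a minimal sketch that sets the variables from Python; they must be set before transformers is imported, and the model name is the one from the error above:

import os
os.environ["TRANSFORMERS_OFFLINE"] = "1" # Transformers: use local cache only
os.environ["HF_DATASETS_OFFLINE"] = "1"  # same for Datasets, if used

from transformers import CamembertTokenizer
tokenizer = CamembertTokenizer.from_pretrained("camembert-base") # resolved from the local cache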

RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784

-- Process 6 terminated with the following error:
Traceback (most recent call last):
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
    fn(i, *args)
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/detectron2/engine/launch.py", line 108, in _distributed_worker
    raise e
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/detectron2/engine/launch.py", line 103, in _distributed_worker
    timeout=timeout,
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
    barrier()
  File "/media/home/intern/anaconda3/envs/torch17/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
    work = _default_pg.barrier()
RuntimeError: NCCL error in: /pytorch/torch/lib/c10d/ProcessGroupNCCL.cpp:784, invalid usage, NCCL version 2.7.8

The code requires 8 GPUs to run, while the machine has only two cards, so configure the code to run with two cards instead, as shown below.
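
For example, if the training script uses detectron2's default argument parser (train_net.py here is a placeholder for your entry script):

python train_net.py --num-gpus 2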

ModuleNotFoundError: No module named 'notebook'


This problem occurred when running a notebook today. Here is how to solve it.

Solution

    open the terminal: Win + R, type "cmd", then press Enter
    activate the environment you run your code in: conda activate <your environment name>
    once inside the environment, run python -m pip install jupyter and press Enter
    a success message at the bottom indicates that the installation succeeded
    then run: jupyter notebook
    if the notebook page opens, the problem has been solved; the same commands are collected below
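
The same steps as a shell session (my_env is a placeholder for your environment name):

conda activate my_env
python -m pip install jupyter
jupyter notebook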

Solve PyTorch Multiprocess ValueError: Error initializing torch.distributed using env://rendezvou… Error

error message: ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set
Solution 1:
Add the following to your code:

import os

os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '5678'

Solution 2:

If you are running the command line, you can use:

export MASTER_ADDR=localhost
export MASTER_PORT=5678
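
For context, a minimal single-process sketch of where these variables are consumed; the gloo backend and world size of 1 are assumptions chosen so the snippet runs on CPU:

import os
import torch.distributed as dist

os.environ['MASTER_ADDR'] = 'localhost'
os.environ['MASTER_PORT'] = '5678'

# the env:// rendezvous reads MASTER_ADDR and MASTER_PORT from the environment
dist.init_process_group(backend='gloo', init_method='env://', rank=0, world_size=1)
print(dist.get_rank()) # 0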

Undefined symbol: cblas_sgemm_alloc error appears after installing pytorch1.0.0

After installing pytorch1.0, I encountered the following problem:

>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "xxxxx/site-packages/torch/__init__.py", line 84, in <module>
    from torch._C import *
ImportError: xxxxx/site-packages/torch/lib/libmkldnn.so.0: undefined symbol: cblas_sgemm_alloc

Someone on the Internet solved this problem by opening ~/.bashrc and finding a declaration like this:

export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH

Then comment it out and the problem is solved.

But there was no such line when I opened my .bashrc file.
After many searches, I solved the problem as follows:

conda install -c anaconda mkl

After running this command, import torch succeeded!

Hope this helps!

[How to Solve] RuntimeError: CUDA out of memory.

Problem description

RuntimeError: CUDA out of memory. Tried to allocate 1.26 GiB (GPU 0; 6.00 GiB total capacity; 557.81 MiB already allocated; 2.74 GiB free; 1.36 GiB reserved in total by PyTorch)

Solution:

The GPU memory is not enough.
You can reduce the batch size appropriately; if that does not help:
1. Restart the computer
Restarting the computer closes the processes occupying the GPU, which is one solution.
2. Kill the process
Enter

nvidia-smi

to view the processes occupying the GPU, and then enter

kill -9 PID
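
If reducing the batch size is the route taken, the change is a single DataLoader argument; a minimal sketch with dummy data (the names and sizes are illustrative):

import torch
from torch.utils.data import DataLoader, TensorDataset

# dummy data standing in for the real dataset
train_dataset = TensorDataset(torch.randn(1024, 10), torch.randint(0, 2, (1024,)))

# halving the batch size roughly halves the peak activation memory per step
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True) # was batch_size=32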

Tensor for argument #2 'mat1' is on CPU, but expected it to be on GPU (while checking arguments for addmm)

Both the model and the input data need to be moved to the device:

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = NonLinearRegression().to(device) # move the model to the device
for batch_idx, (data, target) in enumerate(train_loader):
	data, target = data.to(device), target.to(device) # move the training data to the device
	...
for data, target in val_loader:
	data, target = data.to(device), target.to(device) # move the validation data too
	...

[Solved] Pytorch Tensor to numpy error: RuntimeError: Can't call numpy() on Tensor that requires grad.

Solution:

Use tensor.detach().numpy() when converting to numpy:

import torch

a = torch.ones(5, requires_grad=True) # this tensor is part of the autograd graph
b = a.detach().numpy()                # calling a.numpy() directly would raise the RuntimeError
print(b)                              # [1. 1. 1. 1. 1.]

Problem analysis

When a tensor takes part in gradient computation (requires_grad=True), it cannot be converted directly to numpy format, so it is safest to always call .detach().numpy() when converting.

[Solved] PyTorch Caught RuntimeError in DataLoader worker process 0 and invalid argument 0: Sizes of tensors must match

The error is as follows:

Traceback (most recent call last):
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 819, in __next__
    return self._process_data(data)
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 65, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 16 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

The __getitem__ function does fetch the data correctly, so the problem lies in torch.utils.data.DataLoader.

Analysis

In fact, there are two mistakes

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 8 and 16 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

The first one complains about inconsistent data dimensions. Jumping to File "/home/jiang/miniconda3/envs/Net/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate (return torch.stack(batch, 0, out=out)), the source file reads:

  if isinstance(elem, torch.Tensor):
      out = None
      if torch.utils.data.get_worker_info() is not None:
          # If we're in a background process, concatenate directly into a
          # shared memory tensor to avoid an extra copy
          numel = sum([x.numel() for x in batch])
          storage = elem.storage()._new_shared(numel)
          out = elem.new(storage)
      return torch.stack(batch, 0, out=out)

As you can see, the DataLoader has to merge samples at the end: when a batch size is set, this is the batch-merging step, and if the sample dimensions are not consistent, an error is reported.

The other error appears because multiprocessing is enabled (num_workers != 0), and PyTorch reports which worker hit the problem. Since the dimensions being merged into the batch differ, the first worker crashes (worker process 0), hence the prompt RuntimeError: Caught RuntimeError in DataLoader worker process 0.

Solution:

Since the dimensions are not consistent, the fix is to make them consistent: allocate a sufficiently large array or tensor in advance, pad each sample into it, and mark the unfilled part. When reading the data back, use the mark to determine the valid portion. A sketch of this idea follows.
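
A minimal sketch of this padding idea, assuming each sample is a 1D tensor of varying length and MAX_LEN is a large enough bound chosen in advance (all names here are illustrative):

import torch
from torch.utils.data import DataLoader

MAX_LEN = 16 # chosen in advance, large enough for every sample

def pad_collate(batch):
    padded, masks = [], []
    for x in batch:
        pad = torch.zeros(MAX_LEN, dtype=x.dtype)
        pad[:x.numel()] = x
        mask = torch.zeros(MAX_LEN, dtype=torch.bool)
        mask[:x.numel()] = True # marks the valid (filled) region
        padded.append(pad)
        masks.append(mask)
    return torch.stack(padded), torch.stack(masks)

# variable-length samples that would otherwise crash default_collate
dataset = [torch.ones(8), torch.ones(16), torch.ones(11)]
loader = DataLoader(dataset, batch_size=3, collate_fn=pad_collate)
for data, mask in loader:
    print(data.shape, mask.shape) # torch.Size([3, 16]) torch.Size([3, 16])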