Tag Archives: pytorch

[Solved] RuntimeError: DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Question

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:76] data. DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

Today I ran YOLOv7 on my own computer. Since I don't have a GPU, I ran the test model on the CPU. Predicting a single image worked without any problem. However, when I ran prediction on a video (many images in a row), it failed with an insufficient-memory error:

DefaultCPUAllocator: not enough memory: you tried to allocate 1105920 bytes.

The error did not appear on the second image. It appeared at the 17th image, which suggests that memory from earlier images was not being released and kept accumulating.

Analysis

In PyTorch, every tensor has a requires_grad attribute. When it is True, operations on the tensor are tracked so that its gradient can be computed automatically during backpropagation. requires_grad defaults to False. If a leaf tensor (a tensor created directly by the user) has requires_grad set to True, then every tensor computed from it also has requires_grad = True, even if the other tensors involved in the computation have requires_grad = False.

Note:

requires_grad is an attribute of PyTorch's general-purpose Tensor data structure. It indicates whether gradient information has to be kept for that tensor during computation. Take linear regression as an example: the weights w and the bias b are the parameters being trained, and to find the most suitable values we define a loss function and train by backpropagating its gradient.

When requires_grad is False, no gradient information is recorded for backpropagation, which saves memory (RAM or GPU memory).

The solution follows directly: tell the model not to record gradients during testing, because they are never used.

Solution:

Use with torch.no_grad() so that the model does not keep gradient information during testing:

import torch

with torch.no_grad():  # nothing inside this block is tracked for gradients
    output, _ = model(image)  # wrap the forward pass on each image

This way, no derivatives are computed and no gradients are stored when the model processes each image.

Perfect solution!

[ncclUnhandledCudaError] unhandled cuda error, NCCL version xx.x.x

Problem description

The following error occurred during distributed training:

RuntimeError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:47, unhandled cuda error, NCCL version 21.0.3
ncclUnhandledCudaError: Call to CUDA function failed.

Problem-solving

The error message shows that the failure happens while distributed training is being initialized, not during training itself, so the problem lies in the initialization.

Run the following command to list the GPUs on the server:

nvidia-smi -L

The listing shows that one of the cards (GPU 1) is an RTX 3070, while all the others are RTX 2080 Ti:

GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 1: NVIDIA GeForce RTX 3070 (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 4: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 5: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 6: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)
GPU 7: NVIDIA GeForce RTX 2080 Ti (UUID: xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx)

So I simply restricted training to GPUs 2-7, which are all the same model.

That solved the problem!
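As a rough illustration of how the choice of cards could be automated, the sketch below parses text in the format printed by nvidia-smi -L and collects the indices of the cards that share one model name. The function name gpu_indices_for_model and the shortened sample listing are made up for this example; only the line format comes from the output above. Note that it would also keep GPU 0, whereas I used cards 2-7.

```python
import re
from typing import List


def gpu_indices_for_model(listing: str, model: str) -> List[int]:
    """Return the indices of GPUs whose name contains `model`."""
    pattern = re.compile(r"^GPU (\d+): (.+?) \(UUID:", re.MULTILINE)
    return [int(idx) for idx, name in pattern.findall(listing) if model in name]


# Shortened sample in the same format as the `nvidia-smi -L` output.
listing = """GPU 0: NVIDIA GeForce RTX 2080 Ti (UUID: xxxx)
GPU 1: NVIDIA GeForce RTX 3070 (UUID: xxxx)
GPU 2: NVIDIA GeForce RTX 2080 Ti (UUID: xxxx)
GPU 3: NVIDIA GeForce RTX 2080 Ti (UUID: xxxx)"""

indices = gpu_indices_for_model(listing, "RTX 2080 Ti")
visible = ",".join(str(i) for i in indices)
print(visible)
```

The resulting string can be exported as the CUDA_VISIBLE_DEVICES environment variable before launching training, so that only the matching cards are visible to the process.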

RuntimeError: stack expects each tensor to be equal size, but got [x] at entry 0 and [x] at entry 1

Problem description: When building the dataloaders, the training set runs fine, but the test set raises this error: RuntimeError: stack expects each tensor to be equal size, but got [200] at entry 0 and [116] at entry 1.

How to Solve: The dataloader stacks the samples of a minibatch into one tensor, so every sample in the minibatch must have the same size. Here one sample had size 200 and the next had size 116. I went into the custom dataset to check, and through print debugging I found that the problem was in the dataset labels.

Solution: Go into the dataset, print what it returns for each sample, and fix the samples whose size differs from the rest.
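The check I did with print statements can be sketched roughly as follows. This is plain Python with made-up sample data, and the helper name find_length_outliers is hypothetical; in a real project the list would be built from the labels returned by the custom dataset.

```python
from collections import Counter


def find_length_outliers(samples):
    """Return (index, length) pairs for samples whose length differs
    from the most common length. Assumes `samples` is not empty."""
    lengths = [len(sample) for sample in samples]
    expected, _ = Counter(lengths).most_common(1)[0]
    return [(i, n) for i, n in enumerate(lengths) if n != expected]


# Made-up labels: three samples of length 200 and one of length 116.
labels = [[0] * 200, [0] * 116, [0] * 200, [0] * 200]
outliers = find_length_outliers(labels)
print(outliers)
```

Any index it reports points at a sample that cannot be stacked with the others and needs to be fixed, padded, or truncated.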

[Solved] RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Background:

Reproducing r2c on Ubuntu 18.04 with a GeForce RTX 3090 graphics card.

Problem:

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Cause analysis:

The GeForce RTX 3090 is only supported by CUDA 11 and later, so the installed PyTorch and CUDA versions were too old for this card.

Solution:

Update pytorch and CUDA versions:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
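Before reinstalling, it can help to confirm which CUDA version the installed PyTorch build was compiled against. The helper below is only a sketch of the comparison: cuda_version_at_least is a hypothetical name and the version strings are example inputs. With PyTorch installed, the string to check would come from torch.version.cuda, which is None for CPU-only builds.

```python
def cuda_version_at_least(version, minimum=(11, 0)):
    """Return True if a CUDA version string such as "11.3" is >= minimum."""
    if version is None:  # CPU-only builds report no CUDA version
        return False
    parts = tuple(int(p) for p in version.split(".")[:2])
    parts = (parts + (0,))[:2]  # pad a bare "11" to (11, 0)
    return parts >= minimum


# Example inputs only; replace with the value reported by your installation.
for candidate in ("10.2", "11.3", None):
    print(candidate, cuda_version_at_least(candidate))
```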

[Solved] Yolov5 Deep Learning Error: RuntimeError: DataLoader worker (pid(s) 2516, 1768) exited unexpectedly

Project scenario:

An error occurs when training with yolov5. I train on a GPU.

Problem description

The error is reported as soon as training starts: RuntimeError: DataLoader worker (pid(s) 2516, 1768) exited unexpectedly.

Cause analysis:

Since I train on a GPU and Anaconda has enough virtual memory allocated, the problem should be the number of CPU worker threads used for data loading. Before reaching that conclusion I tried adjusting the batch size, but it didn't help.

Solution:

There is a parameter named --workers in the train.py file. Set it to 0.

My settings are below for reference:

def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT/'yolov5x.pt', help='initial weights path') # initial weights
    parser.add_argument('--cfg', type=str, default='yolov5_Scan_FDDI/PLC_model.yaml', help='model.yaml path') # model definition file
    parser.add_argument('--data', type=str, default=ROOT/'yolov5_Scan_FDDI/PLC_parameter.yaml', help='dataset.yaml path') # dataset parameter file
    parser.add_argument('--hyp', type=str, default=ROOT/'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path') # hyperparameter settings
    parser.add_argument('--epochs', type=int, default=100) # number of training epochs
    parser.add_argument('--batch-size', type=int, default=4, help='total batch size for all GPUs, -1 for autobatch') # batch size
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=320, help='train, val image size (pixels)') # image size
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training') # resume interrupted training
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--noplots', action='store_true', help='save no plot files')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='0', help='cuda device, i.e. 0 or 0,1,2,3 or cpu') #GPU
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=0, help='max dataloader workers (per RANK in DDP mode)') # number of CPU worker threads
    parser.add_argument('--project', default=ROOT/'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--seed', type=int, default=0, help='Global training seed')
    parser.add_argument('--local_rank', type=int, default=-1, help='Automatic DDP Multi-GPU argument, do not modify')

    # Weights & Biases arguments
    parser.add_argument('--entity', default=None, help='W&B: Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')

    opt = parser.parse_known_args()[0] if known else parser.parse_args()
    return opt
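Instead of editing the default inside train.py, the same value can presumably be passed on the command line, since --workers is an ordinary argparse option. The reduced parser below is a stand-in for the full parse_opt above (the defaults 8 and 16 are illustrative, not taken from my file), and the list passed to parse_args simulates running python train.py --workers 0.

```python
import argparse


def build_parser():
    """Reduced stand-in for parse_opt with only two of its options."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers')
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size')
    return parser


# Equivalent to: python train.py --workers 0
opt = build_parser().parse_args(['--workers', '0'])
print(opt.workers, opt.batch_size)
```

Options that are not given on the command line keep their defaults, so only the worker count changes.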

[Solved] Pytorch Error: PytorchStreamReader failed reading zip archive failed finding central directory

PyTorch reports an error:

PytorchStreamReader failed reading zip archive: failed finding central directory

Where the error occurs

The error is raised when loading a pretrained model whose weights file was not downloaded completely:

resnet101 = torchvision.models.resnet101(pretrained=True)

Solution:

Delete the incompletely downloaded .pth file from C:\Users\Username\.cache\torch\hub\checkpoints and run the code again so that the weights are downloaded afresh.
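To find out which cached file is broken, one option is to look for .pth files that begin like a zip archive but have no valid end-of-archive record, which is what "failed finding central directory" points to. The sketch below uses only the standard library; find_truncated_checkpoints is a hypothetical helper, and the demo runs on throwaway files rather than on real weights. Files in the older non-zip checkpoint format are deliberately ignored.

```python
import tempfile
import zipfile
from pathlib import Path

ZIP_MAGIC = b"PK\x03\x04"  # first bytes of a zip-format checkpoint


def find_truncated_checkpoints(folder):
    """Return .pth files that start like a zip archive but are not a valid one."""
    truncated = []
    for path in sorted(Path(folder).glob("*.pth")):
        with open(path, "rb") as fh:
            starts_like_zip = fh.read(4) == ZIP_MAGIC
        if starts_like_zip and not zipfile.is_zipfile(path):
            truncated.append(path)
    return truncated


# Demo on throwaway files: one complete archive and one cut in half.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pth"
    with zipfile.ZipFile(good, "w") as archive:
        archive.writestr("data.pkl", b"x" * 1000)
    data = good.read_bytes()
    (Path(tmp) / "bad.pth").write_bytes(data[: len(data) // 2])
    found = [p.name for p in find_truncated_checkpoints(tmp)]

print(found)
```

On a real machine the folder to scan would be the torch hub cache folder mentioned above; any file the function reports can be deleted and downloaded again.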

[Solved] RuntimeError: cuda runtime error (801) : operation not supported at

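Error:
RuntimeError: cuda runtime error (801) : operation not supported at C:\w\1\s\windows\pytorch\torch/csrc/generic/StorageSharing.cpp:245 #85

Reason:
My guess is that Windows does not support this kind of multi-process data loading. The failing file, StorageSharing.cpp, suggests that the problem is in sharing tensor storage between worker processes.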

Solution:

Remove the num_workers argument. For example, change

    layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False, num_workers=12)

to

    layer_loader = NeighborSampler(data.adj_t, node_idx=None, sizes=[-1], batch_size=4096, shuffle=False)
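If the same script has to run on both Windows and Linux, the worker count could be chosen at runtime instead of being deleted by hand. This is only a sketch: safe_num_workers is a hypothetical helper, and it assumes that 0 workers (the DataLoader default, which loads data in the main process) is acceptable on Windows.

```python
import sys


def safe_num_workers(requested, platform=sys.platform):
    """Return 0 on Windows, otherwise the requested number of workers."""
    return 0 if platform.startswith("win") else requested


# The platform argument is passed explicitly here only to show both cases.
print(safe_num_workers(12, platform="win32"))
print(safe_num_workers(12, platform="linux"))
```

The returned value would then be passed as num_workers when constructing the loader.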

[Solved] ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory

1. Question

Using a PyTorch DataLoader inside Docker may cause the following error:

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory

2. Solution

Check disk usage inside the container with df -h.

There /dev/shm is only 64M, but the DataLoader is configured with several workers (num_workers), which exchange data through shared memory, so the shared memory runs out.

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.

Solution:
(1) Set num_workers=0 (note that setting it to 1 does not work)
(2) Give the Docker container more shared memory when starting it:

--ipc=host  or --shm-size 8G

With --ipc=host the shared memory limit follows the maximum available on the host, so this is the recommended option.

After restarting the container with one of these options, /dev/shm can be checked again with df -h.
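The same check that df -h performs can also be done from Python, for example at the start of a training script. This is a minimal sketch using the standard library; shared_memory_report is a hypothetical helper, and the fallback to the current directory exists only so that the example also runs on systems without /dev/shm.

```python
import os
import shutil


def shared_memory_report(path="/dev/shm"):
    """Return total and free space of the filesystem at `path`, in MiB."""
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return {"total_mib": usage.total // mib, "free_mib": usage.free // mib}


# /dev/shm exists on Linux; fall back to the current directory elsewhere
# so the sketch still runs.
target = "/dev/shm" if os.path.isdir("/dev/shm") else "."
report = shared_memory_report(target)
print(target, report)
```

Inside a container started with the default settings, the total reported for /dev/shm would be expected to match the 64M seen above.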

OSError: [WinError 1455] The page file is too small to complete the operation. Error loading…

Full error: OSError: [WinError 1455] The page file is too small to complete the operation. Error loading “C:\ProgramData\Anaconda3\lib\site-packages\torch\lib\shm.dll” or one of its dependencies.

Scenario: Running the reid-strong-baseline model

Reason: The model is too large, and the paging file (virtual memory) allocated by the system is too small for training

Environment: windows10, cuda version: 11.1, pytorch version: 1.11.0+cu113

(1) Query your CUDA version:

nvidia-smi

(2) Query your own version of pytorch

import torch
print(torch.__version__)

Solution: Right-click This PC -> Properties -> Advanced System Settings -> Advanced -> Settings -> Advanced -> Programs -> Change -> uncheck “Automatically manage…” -> define the initial size and maximum size (set them according to the space actually available, as large as possible) -> click “Set” -> OK -> reboot

If the error still appears after rebooting, the possible reasons are: (1) the custom size is still too small (for example, I first set 10G and still got the error, then changed it to 100G (100000M) and it ran successfully); (2) the batch_size is too large and should be reduced (for example, from 64 to 16).

[Solved] Visdom Error: raise ConnectionError

I use visdom to visualize the loss during network training. First, start the server in a terminal:

python -m visdom.server

Then I ran the training code, which failed with the following error:

 raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: 
HTTPConnectionPool(host='localhost', port=8097): 
Max retries exceeded with url: /events (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6734182d50>: 
Failed to establish a new connection: 
[Errno 111] Connection refused'))

After searching online, I found the solution:

Once the command python -m visdom.server is running, the following message appears:

Setting up a new session...

Leave this terminal open so that the server keeps running, then open a new terminal and run the network code there.

Then open the link in the browser:

http://localhost:8097/

You can see the curve being drawn.
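To avoid starting training while the server is not running, the port can be probed first. The sketch below uses only the standard library; server_is_reachable is a hypothetical helper, and the throwaway listener merely stands in for the visdom server so that the example runs on its own. For the real server, the defaults localhost and 8097 from the error message above would be used.

```python
import socket


def server_is_reachable(host="localhost", port=8097, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or unreachable
        return False


# Demo with a throwaway local listener standing in for the visdom server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen(1)
demo_port = listener.getsockname()[1]

while_running = server_is_reachable("127.0.0.1", demo_port)
listener.close()
after_stopping = server_is_reachable("127.0.0.1", demo_port)
print(while_running, after_stopping)
```

A False result corresponds to the "Connection refused" case in the traceback: nothing is listening on the port, so the server terminal has been closed or was never started.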

[Solved] RuntimeError: Error(s) in loading state_dict for BertForTokenClassification

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification

Problem:
RuntimeError: Error(s) in loading state_dict for BertForTokenClassification: size mismatch for bert.embeddings.word_embeddings.weight: copying a param with shape torch.Size([21128, 768]) from checkpoint, the shape in current model is torch.Size([119547, 768]).

Solution:
The parameters in the checkpoint are not consistent with the model: the word embedding in the checkpoint has 21128 rows, while the model being created has 119547.
My original code was

model = AutoModelForTokenClassification.from_pretrained("bert-base-multilingual-cased", num_labels=len(label_names))

In my case, reinstalling PyTorch fixed it:

conda install pytorch==1.7.1
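When this kind of error lists many parameters, it can be easier to compare the shapes programmatically. The sketch below works on plain dictionaries that map parameter names to shapes; find_shape_mismatches is a hypothetical helper and the classifier.bias entry is invented for the example. With PyTorch, such dictionaries could be built from the checkpoint and from model.state_dict().

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {name: (checkpoint_shape, model_shape)} for parameters
    that exist in both dictionaries but differ in shape."""
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes from the error message, plus one invented matching entry.
checkpoint_shapes = {
    "bert.embeddings.word_embeddings.weight": (21128, 768),
    "classifier.bias": (7,),
}
model_shapes = {
    "bert.embeddings.word_embeddings.weight": (119547, 768),
    "classifier.bias": (7,),
}
mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
print(mismatches)
```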

[Solved] TVM operate error: TVMError: AssertionError

TVM operate error: TVMError: AssertionError

Traceback (most recent call last):
  File "tune_relay_x86.py", line 248, in <module>
    tune_and_evaluate(tuning_option)
  File "tune_relay_x86.py", line 241, in tune_and_evaluate
    lib = relay.build_module.build(mod, target=target, params=params)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/build_module.py", line 468, in build
    graph_json, runtime_mod, params = bld_mod.build(
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/build_module.py", line 196, in build
    self._build(mod, target, target_host, executor, runtime, workspace_memory_pools, mod_name)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  30: TVMFuncCall
  29: tvm::relay::backend::RelayBuildModule::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
  28: tvm::relay::backend::RelayBuildModule::BuildRelay(tvm::IRModule, tvm::runtime::String const&)
  27: tvm::relay::backend::RelayBuildModule::OptimizeImpl(tvm::IRModule)
  26: tvm::transform::Pass::operator()(tvm::IRModule) const
  25: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  24: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  23: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  22: tvm::relay::transform::FunctionPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  21: _ZN3tvm7runtime13PackedFuncObj9ExtractorINS0_16PackedFuncSubObjIZNS0_15TypedPackedFuncIFNS_5relay8FunctionES6_NS_8IRModuleENS_9transform11PassContextEEE17AssignTypedLambdaIZNS5_9transform13AlterOpLayoutEvEUlS6_S7_S9_E_EEvT_EUlRKNS0_7TVMArgsEPNS0_11TVMRetValueEE_EEE4CallEPKS1_SG_SK_
  20: tvm::relay::alter_op_layout::AlterOpLayout(tvm::RelayExpr const&)
  19: tvm::relay::ForwardRewrite(tvm::RelayExpr const&, tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::relay::Call const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)> const&, std::function<tvm::runtime::ObjectRef (tvm::relay::Call const&)>, std::function<tvm::RelayExpr (tvm::RelayExpr const&)>)
  18: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
  17: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
  16: _ZN3tvm5relay16MixedModeMutator17DispatchVisitExprERKNS_9Re
  15: tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  14: tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  13: _ZZN3tvm5relay11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlR
  12: tvm::relay::ExprMutator::VisitExpr_(tvm::relay::FunctionNode const*)
  11: tvm::relay::MixedModeMutator::VisitExpr(tvm::RelayExpr const&)
  10: tvm::relay::MixedModeMutator::VisitLeaf(tvm::RelayExpr const&)
  9: _ZN3tvm5relay16MixedModeMutator17DispatchVisitExprERKNS_9Re
  8: tvm::relay::ExprMutator::VisitExpr(tvm::RelayExpr const&)
  7: tvm::relay::ExprFunctor<tvm::RelayExpr (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)
  6: _ZZN3tvm5relay11ExprFunctorIFNS_9RelayExprERKS2_EE10InitVTableEvENUlR
  5: tvm::relay::MixedModeMutator::VisitExpr_(tvm::relay::CallNode const*)
  4: tvm::relay::ForwardRewriter::Rewrite_(tvm::relay::CallNode const*, tvm::RelayExpr const&)
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::RelayExpr (tvm::relay::Call const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>::AssignTypedLambda<tvm::RelayExpr (*)(tvm::relay::Call const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)>(tvm::RelayExpr (*)(tvm::relay::Call const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&))::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::RelayExpr tvm::relay::LayoutRewriter<tvm::relay::alter_op_layout::AlterTransformMemorizer>(tvm::relay::Call const&, tvm::runtime::Array<tvm::RelayExpr, void> const&, tvm::runtime::ObjectRef const&)
  1: tvm::relay::alter_op_layout::AlterTransformMemorizerNode::CallWithNewLayouts(tvm::relay::Call const&, tvm::Attrs, std::vector<tvm::RelayExpr, std::allocator<tvm::RelayExpr> > const&)
  0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/op/nn/_nn.py", line 112, in alter_op_layout_dense
    return topi.nn.dense_alter_layout(attrs, inputs, tinfos, out_type)
  File "/home/lizhenxu/anaconda3/lib/python3.8/site-packages/decorator.py", line 231, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/target/generic_func.py", line 286, in dispatch_func
    return dispatch_dict[k](*args, **kwargs)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/topi/x86/dense_alter_op.py", line 51, in _alter_dense_layout
    _, outs = relay.backend.te_compiler.select_implementation(
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/backend/te_compiler.py", line 201, in select_implementation
    outs = impl.compute(attrs, inputs, out_type)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/op/op.py", line 126, in compute
    return _OpImplementationCompute(self, attrs, inputs, out_type)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
  3: TVMFuncCall
  2: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::relay::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#4}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  1: tvm::relay::OpImplementation::Compute(tvm::Attrs const&, tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::Type const&)
  0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/relay/op/strategy/generic.py", line 833, in _compute_dense
    return [topi_compute(*args)]
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/autotvm/task/topi_integration.py", line 164, in wrapper
    cfg = DispatchContext.current.query(tgt, workload)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/autotvm/task/dispatcher.py", line 76, in query
    ret = self._query_inside(target, workload)
  File "/home/lizhenxu/ZJJ/tvm_0.9/tvm/python/tvm/autotvm/task/dispatcher.py", line 421, in _query_inside
    assert wkl == workload
TVMError: AssertionError

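The error above appeared when running tune_relay_x86.py from the official website. The traceback ends in assert wkl == workload, which suggests that the tuning records loaded from an earlier run no longer match the current workloads.

Solution: delete the .log file that was generated by running this code before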
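Deleting the old log could also be done at the top of the script, before tuning starts. This is a minimal sketch with the standard library; remove_stale_log is a hypothetical helper and example_model.log is a made-up file name, so the real log path used by the tuning script has to be substituted.

```python
import tempfile
from pathlib import Path


def remove_stale_log(log_file):
    """Delete log_file if it exists; return True when something was removed."""
    path = Path(log_file)
    if path.is_file():
        path.unlink()
        return True
    return False


# Demo in a temporary folder so no real file is touched.
with tempfile.TemporaryDirectory() as tmp:
    log_file = Path(tmp) / "example_model.log"  # hypothetical log name
    log_file.write_text("old tuning records\n")
    first = remove_stale_log(log_file)   # the file existed and is removed
    second = remove_stale_log(log_file)  # nothing left to remove
print(first, second)
```

Keep in mind that removing the log also discards the earlier tuning results, so tuning starts from scratch.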

ProgrammerAH

Programmer Guide, Tips and Tutorial
