[root@bsyocr server-train]# tail trainall210722_6.log.txt
File "/home/server-train/pytorch_pretrained/modeling.py", line 300, in forward
mixed_query_layer = self.query(hidden_states)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/modules/linear.py", line 87, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib64/python3.6/site-packages/torch/nn/functional.py", line 1371, in linear
output = input.matmul(weight.t())
RuntimeError: cublas runtime error : resource allocation failed at /pytorch/aten/src/THC/THCGeneral.cpp:216
This is the "RuntimeError: CUDA out of memory" family of errors that appears while running the model. After checking a lot of related material, the cause is that GPU memory is insufficient. Briefly, the solution is to reduce the batch_size, as sketched below.
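A minimal sketch of the batch-size change, assuming the data is fed through a standard torch.utils.data.DataLoader; the dummy dataset, vocabulary size, and concrete numbers are illustrative and not taken from the original training script:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors standing in for the real tokenized training set
# (shapes and vocabulary size are illustrative only).
input_ids = torch.randint(0, 21128, (1000, 512))
labels = torch.randint(0, 2, (1000,))
train_dataset = TensorDataset(input_ids, labels)

# Activation memory grows roughly linearly with batch_size, so halving it
# (or going smaller) is usually enough to get past the OOM.
# batch_size = 64   # old value that exhausted GPU memory
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)
```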
Alternatively, reduce the pad_size of BERT (the fixed length every input sequence is padded or truncated to) from 2048 to 1024; a sketch of this change follows.
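A minimal sketch of the pad_size change, assuming a config object in the style of common BERT fine-tuning scripts; the attribute names and the pad_or_truncate helper are assumptions for illustration, not the project's actual code:

```python
# Hypothetical config in the style of common BERT fine-tuning scripts;
# the attribute names (pad_size, batch_size) are assumptions.
class Config:
    def __init__(self):
        self.pad_size = 1024   # was 2048: self-attention memory grows
                               # quadratically with sequence length,
                               # so this cut is substantial
        self.batch_size = 16

def pad_or_truncate(token_ids, pad_size, pad_id=0):
    """Pad or cut a token-id list to exactly pad_size entries."""
    if len(token_ids) >= pad_size:
        return token_ids[:pad_size]
    return token_ids + [pad_id] * (pad_size - len(token_ids))

# Example: a short sequence gets padded up to the configured length.
cfg = Config()
ids = pad_or_truncate([101, 2769, 4263, 872, 102], cfg.pad_size)
assert len(ids) == cfg.pad_size
```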