ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memor
1. Problem
Using a PyTorch DataLoader with multiple workers inside a Docker container may fail with the error shown in the title.
2. Solution
Check disk usage inside the container with df -h:
You can see that /dev/shm is only 64M. The DataLoader is configured with several workers (num_workers), and those workers exchange data through shared memory, so the shared memory segment runs out.
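The same check can be done from Python, which is handy when the training script should log the shared memory size at startup. The following is a minimal sketch using only the standard library; the function name shared_memory_report is hypothetical, and the default path /dev/shm is an assumption that holds on typical Linux containers.

```python
import os
import shutil

MIB = 1024 * 1024


def shared_memory_report(path="/dev/shm"):
    """Return (total, used, free) for the filesystem at `path`, in MiB.

    `path` defaults to /dev/shm, which is an assumption about the
    container's layout; pass another mount point if yours differs.
    """
    usage = shutil.disk_usage(path)
    return usage.total / MIB, usage.used / MIB, usage.free / MIB


if __name__ == "__main__":
    # Only report when the shared memory mount actually exists.
    if os.path.isdir("/dev/shm"):
        total, used, free = shared_memory_report()
        print(f"/dev/shm: total={total:.0f} MiB, "
              f"used={used:.0f} MiB, free={free:.0f} MiB")
    else:
        print("/dev/shm not found on this system")
```

In a container started with Docker's default settings, the reported total should be close to the 64M seen in df -h.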
Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
Solution:
(1) Set num_workers=0 in the DataLoader (note that setting it to 1 does not work)
(2) Give the Docker container more shared memory when starting it:
--ipc=host or --shm-size 8G
With --ipc=host the container uses the host's shared memory, so the limit follows what the host provides. This is the recommended method.
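When the container options cannot be changed, option (1) can be applied automatically. The sketch below falls back to num_workers=0 when little shared memory is free. The helper name pick_num_workers and the 1 GiB threshold are assumptions made for illustration, not values from PyTorch or Docker; tune the threshold to your batch size.

```python
import shutil


def pick_num_workers(requested, shm_path="/dev/shm", min_free_bytes=1 << 30):
    """Return `requested` workers if enough shared memory is free, else 0.

    The 1 GiB default threshold is an arbitrary, illustrative value.
    If `shm_path` cannot be inspected, fall back to 0 as the
    conservative choice (data is then loaded in the main process).
    """
    try:
        free = shutil.disk_usage(shm_path).free
    except OSError:
        return 0
    return requested if free >= min_free_bytes else 0
```

The result can then be passed to the loader, for example DataLoader(dataset, batch_size=32, num_workers=pick_num_workers(4)), where dataset and the batch size are placeholders.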
After restarting the container with one of these options, run df -h again to confirm that /dev/shm has grown.