[Solved] ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memor

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memor

1. question

Using pytorch dataloader in docker may cause the following errors:

2. solution

View disk usage through df -h in docker:

You can see that /dev/shm is only 64M, but the data_loader has more num_works set, and it is collaborating through shared memory, resulting in insufficient memory.

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with –ipc=host or –shm-size command line options to nvidia-docker run.

Solution:
(1) num_workers=0 (note that setting it to 1 does not work)
(2) docker is easy to share more memory:

--ipc=host  or --shm-size 8G
where -ipc=host will be adjusted according to the current host memory maximum, it is recommended to use this method

After restart:

 

Read More: