[Solved] Python Error: An attempt has been made to start a new process before the current process has finished …

This error usually occurs on Windows when multiple processes are used for data loading. For example, execute the following code in PyCharm:

import torch
import torch.utils.data as Data
import numpy as np
from sklearn.datasets import load_iris

iris_x, irisy = load_iris(return_X_y=True)
print("iris_x.dtype:", iris_x.dtype)
print("irisy:", irisy.dtype)

## Transform the training data X and the labels y into tensors
train_xt = torch.from_numpy(iris_x.astype(np.float32))
train_yt = torch.from_numpy(irisy.astype(np.int64))
print("train_xt.dtype:", train_xt.dtype)
print("train_yt.dtype:", train_yt.dtype)

## After converting the training set into a tensor, use TensorDataset to collate X and Y together
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader to batch the training dataset
train_loader = Data.DataLoader(
    dataset=train_data, ## the dataset to use
    batch_size=10,  # batch sample size
    shuffle=True,   # shuffle the data before each epoch
    num_workers=2,  # [Note: 2 worker processes are used here]
)

## Check if the dimensionality of the samples of a batch of the training dataset is correct
for step, (b_x, b_y) in enumerate(train_loader):
    if step > 0:
        break
## Output the dimensions and data types of the batch data and labels
print("b_x.shape:", b_x.shape)
print("b_y.shape:", b_y.shape)
print("b_x.dtype:", b_x.dtype)
print("b_y.dtype:", b_y.dtype)


## -------- The correct result is as follows --------

# iris_x.dtype: float64
# irisy: int32
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.int64
# b_x.shape: torch.Size([10, 4])
# b_y.shape: torch.Size([10])
# b_x.dtype: torch.float32
# b_y.dtype: torch.int64

Running the script directly reports the following error. (Oddly, no error is reported when the same code runs in a Jupyter notebook in the same environment.)

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.


Solution 1:

Remove the setting that requests multiple worker processes. In this example, comment out or delete the following line:

num_workers=2,  # [Note: 2 worker processes are used here]
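
Equivalently, num_workers can be set to 0 (its default), which keeps data loading in the main process so no worker processes are spawned. A minimal sketch of the adjusted loader:

## Single-process data loading: no worker processes are spawned,
## so the script runs without the __main__ guard
train_loader = Data.DataLoader(
    dataset=train_data,  # the dataset to use
    batch_size=10,       # batch sample size
    shuffle=True,        # shuffle the data before each epoch
    num_workers=0,       # 0 (the default) loads data in the main process
)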

Solution 2:

Move the code that starts the worker processes (here, the loop over train_loader) under an if __name__ == '__main__': guard, as shown below.

if __name__ == '__main__':
    ##  Check if the dimensionality of the samples of a batch of the training dataset is correct
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    ## Output the dimensions and data types of the batch data and labels
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)

However, in PyCharm the part before [for step, (b_x, b_y) in enumerate(train_loader):] is executed again: on Windows the workers are started with the spawn method, which re-imports the main module, so every top-level statement outside the if __name__ == '__main__': guard runs again in the worker processes.

## -------- The result of running in PyCharm is as follows --------
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
b_x.shape: torch.Size([10, 4])
b_y.shape: torch.Size([10])
b_x.dtype: torch.float32
b_y.dtype: torch.int64
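
If the duplicated output is a problem, the data preparation can also be moved under the guard so that it only runs in the main process. Below is a minimal sketch of the same script restructured this way; the helper function main() is not in the original post, but the variable names are the same as above:

import torch
import torch.utils.data as Data
import numpy as np
from sklearn.datasets import load_iris

def main():
    ## Everything runs inside main(), so the spawn re-import of this
    ## module in the worker processes does not execute it again
    iris_x, irisy = load_iris(return_X_y=True)
    train_xt = torch.from_numpy(iris_x.astype(np.float32))
    train_yt = torch.from_numpy(irisy.astype(np.int64))
    train_data = Data.TensorDataset(train_xt, train_yt)
    train_loader = Data.DataLoader(
        dataset=train_data,
        batch_size=10,
        shuffle=True,
        num_workers=2,  # multiprocess loading is safe behind the guard
    )
    ## Check one batch of the training dataset
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)

if __name__ == '__main__':
    main()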
