Parallel processing in Python

Recently, Python is used to process the public image database. Due to the large amount of data, it takes too long to process the images one by one in serial. Therefore, it is decided to adopt the parallel mode to make full use of the CPU on the host to speed up the processing process, which can greatly reduce the total processing time.

Here we use the concurrent.futures Module, which can use multiprocessing to achieve real parallel computing.

The core principle is: concurrent.futures It will run multiple Python interpreters in parallel in the form of sub processes, so that Python programs can use multi-core CPU to improve the execution speed. Since the subprocess is separated from the main interpreter, their global interpreter locks are also independent of each other. Each subprocess can use a CPU core completely.

The specific implementation is also very simple, the code is as follows. How many CPU cores the host has will start how many Python processes for parallel processing.

import concurrent.futures


def function(files):
    # To do what you want
    # files: file list that you want to process


if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        executor.map(function, files)

After changing to parallel processing, my 12 CPUs run at full load, and the processing speed is significantly faster.

ProgrammerAH

Programmer Guide, Tips and Tutorial

Tag Archives: Parallel processing in Python

Python parallel processing makes full use of CPU to realize acceleration