Recently I used Python to process a public image database. Because of the large amount of data, processing the images one by one in serial took too long. I therefore switched to parallel processing to make full use of the CPU cores on the host and greatly reduce the total processing time.
Here we use the concurrent.futures module, whose ProcessPoolExecutor is built on multiprocessing and achieves real parallel computing.
The core principle is this: ProcessPoolExecutor runs multiple Python interpreters in parallel as subprocesses, so that a Python program can use a multi-core CPU to improve execution speed. Because each subprocess is separate from the main interpreter, each one has its own global interpreter lock (GIL), independent of the others. Each subprocess can therefore fully occupy one CPU core.
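To make the separate-process model concrete, here is a minimal sketch in which every task reports the ID of the process it ran in. The function name report_pid and the worker count of 2 are illustrative assumptions, not part of the original code.

```python
import concurrent.futures
import os


def report_pid(task_id):
    # Runs inside a worker process; returns the task id together with
    # the id of the process that executed it.
    return task_id, os.getpid()


if __name__ == '__main__':
    print('parent process:', os.getpid())
    # Two workers are enough to show the idea (an arbitrary choice).
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        for task_id, pid in executor.map(report_pid, range(6)):
            print('task', task_id, 'ran in process', pid)
```

The process IDs printed for the tasks should differ from the parent's, which shows that the work runs in separate interpreters rather than in threads of the main one.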
The implementation is also very simple; the code is as follows. By default, the pool starts as many Python processes as the host has CPU cores.
    import concurrent.futures

    def function(file):
        # Do what you want with a single file
        pass

    if __name__ == '__main__':
        # files: list of the files that you want to process
        files = []
        with concurrent.futures.ProcessPoolExecutor() as executor:
            executor.map(function, files)
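If the per-file function returns something, the results can be collected from map(). The following is only a sketch under assumptions: process_file, run_parallel, the placeholder "work" (measuring the file name length), the chunksize of 4, and the file paths are all hypothetical stand-ins for the real image processing.

```python
import concurrent.futures
import os


def process_file(path):
    # Placeholder for the real image work: return the length of the
    # file name so the example runs without any image files on disk.
    return path, len(os.path.basename(path))


def run_parallel(paths, max_workers=None):
    # max_workers=None keeps the executor's default worker count;
    # pass an integer to limit the number of processes.
    with concurrent.futures.ProcessPoolExecutor(max_workers=max_workers) as executor:
        # map() yields results in the same order as the inputs.
        # chunksize batches several items per task sent to a worker.
        return list(executor.map(process_file, paths, chunksize=4))


if __name__ == '__main__':
    files = ['images/img_%03d.jpg' % i for i in range(20)]  # hypothetical paths
    for path, value in run_parallel(files):
        print(path, value)
```

An exception raised inside a worker only surfaces when the corresponding result is retrieved from the iterator, so consuming the results (here with list()) is also how errors get noticed.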
After switching to parallel processing, all 12 of my CPUs run at full load, and processing is significantly faster.