Submitting a large number of tasks via Python 3 Executor.submit may consume a lot of memory

Reference: https://www.bettercodebytes.com/theadpoolexecutor-with-a-bounded-queue-in-python/


You can adjust the degree of parallelism by specifying max_workers on the Executor, but submitting tasks faster than they can be processed does not block. Instead, the pending tasks are simply stored in memory. Because of this behavior, submitting a huge number of tasks can consume a lot of memory.

```python
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(1024 * 1024):   # submit a huge number of tasks
        executor.submit(fn, i)
    # The for loop finishes almost immediately,
    # but memory consumption balloons
```

In fact, code that submits 1 million tasks in a loop consumed about 2 GB of memory, so I decided to think about how to deal with it.

Internal implementation and cause

Looking at the internal implementation, ThreadPoolExecutor holds a queue internally, and submit wraps the callable in a _WorkItem and puts it on that queue. This internal queue has no upper limit and never blocks, so you can keep submitting endlessly.

Incidentally, worker threads are created when items are queued, and [each worker thread fetches items from the queue and executes them in an infinite loop](https://github.com/python/cpython/blob/v3.8.6/Lib/concurrent/futures/thread.py#L66).

Confirmation code

Let's actually observe this behavior. For example, execute a function that takes 0.01 seconds 5,000 times, with max_workers=10.

Watch the timestamps and memory usage (maxrss in this case) as the for loop progresses.
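The original measurement code is not reproduced in this text; the following is a minimal sketch of such a measurement. The function name `fn` and the print interval are my own choices, and `resource.getrusage` is Unix-only (maxrss is reported in KB on Linux, bytes on macOS):

```python
import resource
import time
from concurrent.futures import ThreadPoolExecutor

def fn(i):
    time.sleep(0.01)  # each task takes 0.01 seconds

def maxrss_kb():
    # Peak resident set size of this process so far
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

start = time.time()
with ThreadPoolExecutor(max_workers=10) as executor:
    for i in range(5000):
        executor.submit(fn, i)
        if i % 1000 == 0:
            print(f"{time.time() - start:6.2f}s submit {i:5d} maxrss={maxrss_kb()}")
print(f"{time.time() - start:6.2f}s done        maxrss={maxrss_kb()}")
```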


From the timestamps, you can see that submit does not block (the for loop finishes submitting almost immediately, and most of the time is spent waiting for shutdown). However, you can also see that memory consumption grows as the run progresses.


Proposal 1. Make the internal queue bounded

This is the first method I came up with: replace the queue used inside ThreadPoolExecutor with a bounded queue, by subclassing and overwriting the instance variable.
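A minimal sketch of that idea. It relies on CPython's private `_work_queue` attribute, an implementation detail that may change between versions:

```python
import queue
from concurrent.futures import ThreadPoolExecutor

class BoundedThreadPoolExecutor(ThreadPoolExecutor):
    """ThreadPoolExecutor whose internal queue is bounded, so submit()
    blocks when the queue is full instead of buffering without limit."""

    def __init__(self, max_workers=None, queue_size=100, **kwargs):
        super().__init__(max_workers=max_workers, **kwargs)
        # Overwrite the unbounded internal queue with a bounded one.
        # Worker threads are created lazily on submit(), so they will
        # pick up this replacement queue.
        self._work_queue = queue.Queue(maxsize=queue_size)
```

With this, the submit loop from the beginning of the article blocks once `queue_size` pending tasks have accumulated.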


You can see from the timestamps that submit now blocks during the loop. However, the total time barely changes, and memory consumption grows very slowly.

The code is simple, but reaching into the internal implementation feels a bit hacky, and ProcessPoolExecutor does not have this kind of queue, so the same trick does not work there.

Proposal 2. Control the number of concurrent executions with a Semaphore

Since Proposal 1 was not quite satisfactory, I looked around for alternatives and found the article referenced at the top.


Following the referenced article, I created a class BoundedExecutor that wraps the pool executor. Since it is API compatible (except for map), it can be used as a drop-in replacement.

Internally, it controls the concurrency by acquiring a semaphore at submit time and releasing it when the worker finishes the task. "When the worker finishes" here means "when the function registered via the future's add_done_callback is invoked on completion". (The callback is invoked both when the task completes normally and when it raises an exception, so this should be correct.)
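The BoundedExecutor code itself comes from the referenced article; the sketch below reconstructs the idea described above (only submit and shutdown are shown, and the `bounded_ratio` parameter name follows the text, so details may differ from the article):

```python
from concurrent.futures import ThreadPoolExecutor
from threading import BoundedSemaphore

class BoundedExecutor:
    """Wraps ThreadPoolExecutor so that submit() blocks once
    max_workers * bounded_ratio tasks are in flight."""

    def __init__(self, max_workers, bounded_ratio=2):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._semaphore = BoundedSemaphore(max_workers * bounded_ratio)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks while too many tasks are pending
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except Exception:
            self._semaphore.release()
            raise
        # Release one slot when the task finishes, whether it returned
        # normally or raised an exception.
        future.add_done_callback(lambda _: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```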


This also gave the same result as Proposal 1.

Incidentally, it is better to make the queue size larger than max_workers (in the code, pass or change the argument so that bounded_ratio=1 becomes bounded_ratio=2). If "parallelism == queue size", there are moments when the queue becomes empty and workers sit idle, which slightly delays overall completion, so it is better to add a little headroom.

