I launched multiple processes, each taking several days, on a small server with 4 CPUs and 4 GB of memory. While running at 100% CPU on all 4 cores, the machine ran out of memory within a few hours and CPU usage dropped sharply. Apparently memory had been exhausted.
As processing time grows, each process's memory usage grows → memory is exhausted → swapping starts → memory access slows down dramatically → CPU usage falls below 1% while waiting on memory → the job never finishes (severe performance degradation).
In most cases, adding servers or upgrading to a higher-spec machine solves this. I didn't go that route because it was a personal project, but I do wonder whether simply switching to a server with about 32 GB of memory would have been the real answer.
Python's memory management is fully automatic and left to the VM, so once a memory leak sets in, the only practical fix is to kill the process. I therefore made the processing idempotent and split the single command that handled all 8 categories into separate per-category runs.
```python
from enum import Enum

class Category(Enum):
    A = 1
    B = 2
    C = 3

# Before: one command processed every category in a single run
for category in Category:
    benchmark(category)
```

```python
# After: each run processes only one category, chosen by priority
category = manage.get_category_by_priority()
benchmark(category)
```
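A minimal sketch of how each per-category run could be wrapped as its own entry point, so one invocation handles exactly one category and a crashed run can simply be re-executed; `manage.get_category_by_priority()` and `benchmark()` come from the snippets in this post, while the script name and `main()` wrapper are assumptions for illustration.

```python
# run_benchmark.py -- hypothetical entry point: one invocation = one category
import manage  # assumed module exposing get_category_by_priority()

def main():
    # Pick the next unprocessed category by priority. Because the processing
    # is idempotent, re-running after a crash just redoes the same category.
    category = manage.get_category_by_priority()
    if category is None:
        return  # nothing left to process
    benchmark(category)  # benchmark() is the function shown later in this post

if __name__ == '__main__':
    main()
```

Running this script 8 times (or restarting it whenever it dies) covers all 8 categories without any single process living long enough to exhaust memory.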
The code from improvement 1 now has to be run in 8 separate steps, so whenever a process stopped it had to be started again by hand. Supervisor is convenient in such cases.
```shell
easy_install supervisor
echo_supervisord_conf > /etc/supervisord.conf
supervisord            # start the supervisor daemon
supervisorctl status   # check the status of managed processes

alias sc='supervisorctl'
sc reread              # reload the configuration
sc status
sc stop all
sc restart all
```
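Supervisor only restarts processes it manages, so the batch job needs a program section in /etc/supervisord.conf. A minimal sketch, assuming a hypothetical script path and log locations (none of these names come from the original setup):

```
[program:benchmark]
; hypothetical entry point; adjust the path to the actual script
command=python /path/to/run_benchmark.py
autostart=true
autorestart=true
stdout_logfile=/var/log/benchmark_out.log
stderr_logfile=/var/log/benchmark_err.log
```

After editing the file, `sc reread` followed by `sc update` makes supervisor pick up the new program section.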
I'm not very familiar with Python's GC, so this may have side effects, but so far the memory leak has been resolved and everything runs stably. It may well be black magic, so I can't recommend it unreservedly.
When class-level caches are used heavily, memory leaks seem to occur frequently in the Python 2 series. Would Python 3 solve this steadily growing memory consumption?
```python
from enum import Enum
import gc

class Category(Enum):
    A = 1
    B = 2
    C = 3

def benchmark(category):
    bulk = []
    tmp_data = Tmp.get_all()
    for _tmp in tmp_data:
        bulk.append(calc(_tmp))
    DBTable.bulk_create(bulk)  # Bulk!

    # Release memory explicitly
    del tmp_data
    del bulk
    gc.collect()

for category in Category:
    benchmark(category)
```
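To confirm that the explicit `del` plus `gc.collect()` is actually helping, one option is to log the process's peak resident set size at the end of each category's run; if memory really is being released, the peak stops climbing from iteration to iteration. This uses the standard `resource` module and is an addition for illustration, not part of the original code.

```python
import gc
import resource

def log_peak_memory(label):
    # ru_maxrss is the process's peak resident set size so far
    # (kilobytes on Linux, bytes on macOS)
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print('%s: peak RSS = %d' % (label, peak))

collected = gc.collect()  # returns the number of unreachable objects it found
log_peak_memory('after gc.collect() (%d objects collected)' % collected)
```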
Reference: gc — Garbage collector interface http://docs.python.jp/2/library/gc.html