Background

I'm developing an application server for smartphone games with Python and Django, and I'm using uWSGI as its web server. uWSGI has so many parameters that it is difficult to know which ones to set.

I ran load tests while changing various parameters, and I now get stable performance. I would like to share how I measured things and which parameters I changed.

Environment

python: 2.7.10
uWSGI: 2.0.11.1
locust: Load testing tool. We created user scenarios and tested various API loads on the application server.
EC2: c4.2xlarge 8 cores

Performance measurement tools

"Don't guess, measure"

As this saying goes, whatever you change, you cannot tell whether it has improved anything unless you can measure it. The following tools were the most useful this time.

New Relic: The familiar monitoring service. We use the paid version, so detailed data is available.
uwsgitop: A uWSGI performance statistics tool that can be installed from PyPI
vmstat

About uwsgitop

uWSGI can output performance statistics to a socket with the following settings on the uWSGI side.

stats = /var/run/uWSGI/projectname.stats.sock
memory-report = true

If you run uwsgitop (installed with pip or similar) as follows, you can watch the statistics in real time, much like the unix command top.

$ uwsgitop /var/run/uWSGI/projectname.stats.sock

In the example run, statistics are shown per process, but you can switch to a per-thread view by pressing a on the keyboard.
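Apart from uwsgitop, the stats socket can also be read from a script, which is handy for logging numbers during a load test. The following is a minimal sketch, not part of my actual setup. It assumes the stats server sends one JSON document per connection and that each worker entry has `id`, `requests` and `rss` (in bytes) fields. The helper names `read_stats` and `summarize` are made up for illustration.

```python
import json
import socket


def read_stats(path):
    """Connect to a uWSGI stats UNIX socket and return the decoded JSON."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(path)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # assumed: the server closes the connection after one dump
                break
            chunks.append(data)
    finally:
        sock.close()
    return json.loads(b"".join(chunks).decode("utf-8"))


def summarize(stats):
    """Return (worker id, requests, RSS in MiB) tuples sorted by worker id."""
    rows = []
    for worker in stats.get("workers", []):
        # "rss" is assumed to be in bytes and is only filled in
        # when memory-report is enabled.
        rss_mib = worker.get("rss", 0) / (1024.0 * 1024.0)
        rows.append((worker["id"], worker.get("requests", 0), round(rss_mib, 1)))
    return sorted(rows)
```

Calling `summarize(read_stats("/var/run/uWSGI/projectname.stats.sock"))` on the server would then give one row per worker, which can be printed or written to a log at intervals.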

uWSGI settings that were effective for performance tuning

The uWSGI Options page lists every configuration value, but there are so many settings that it is hard to know at first which ones matter. In addition, the option descriptions are terse, and the author of uWSGI himself tells you to read the source, so it is difficult if you have never used it.

Here, I would like to list the parameters that proved effective when I actually set them.

processes, threads

uWSGI lets you specify both the number of processes and the number of threads that accept requests. Our uWSGI has the following settings.

processes = 16
threads = 1

I tested various combinations of processes and threads, but when I increased the number of threads, the number of context switches rose and the RPS handled dropped to less than half.

Incidentally, with threads set to 1, increasing processes made little difference once it was at or above the number of cores (8).
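The context switch count is what vmstat shows in its cs column. If you want to record it from a script while changing processes and threads, it can be sampled from /proc/stat on Linux. This is a rough sketch under that assumption; `parse_ctxt` and `context_switches_per_second` are names I made up for illustration.

```python
import time


def parse_ctxt(proc_stat_text):
    """Return the cumulative context-switch count from /proc/stat contents."""
    for line in proc_stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "ctxt":
            return int(fields[1])
    raise ValueError("no ctxt line found")


def context_switches_per_second(interval=1.0, path="/proc/stat"):
    """Sample the counter twice and return context switches per second."""
    with open(path) as f:
        before = parse_ctxt(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_ctxt(f.read())
    return (after - before) / float(interval)
```

Running `context_switches_per_second()` during each load test run gives a number that can be compared between, for example, a threads = 1 run and a threads = 4 run.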

thunder-lock

If you are using a Linux server, you should first set this option to true as follows.

thunder-lock = true

For details, see here, but in short, when it is false, the requests handled by the multiple uWSGI processes are distributed unevenly.
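One way to check whether requests are biased is to compare the per-process request counts, for example the ones uwsgitop displays. The sketch below is only an illustration: the function names are hypothetical and the counts in the usage note are invented, not measurements from our servers.

```python
def request_share(requests_per_worker):
    """Return each worker's share of all requests, as fractions summing to 1."""
    total = sum(requests_per_worker)
    if total == 0:
        return [0.0 for _ in requests_per_worker]
    return [count / float(total) for count in requests_per_worker]


def imbalance(requests_per_worker):
    """Busiest worker's share divided by a perfectly even share (1.0 = even)."""
    shares = request_share(requests_per_worker)
    if not shares or max(shares) == 0.0:
        return 1.0
    return max(shares) * len(shares)
```

With invented counts, `imbalance([100, 100, 100, 100])` is 1.0, while `imbalance([300, 100, 0, 0])` is 3.0, meaning the busiest process handles three times its even share.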

max-requests, max-requests-delta

max-requests sets how many requests a process handles before it is reloaded. It is set as follows.

max-requests = 6000

At the beginning of the load test, this value was large, so the processes were never reloaded, and memory usage looked as follows.

Judging from how the memory of the uWSGI processes grew, 6000 seemed appropriate, so I set it to that. I think the cause of the memory growth needs to be investigated, but since each process is reloaded once every 6000 requests, I judged that the growth has no practical effect. Reloading frees the memory and returns the process to the state it was in when it first started.

I forgot to capture the data per process, but if the value is set correctly, memory levels off after a while, as shown below.

However, note that while a process is restarting, it cannot accept new requests for a few seconds. If the same max-requests value applies to all processes, they restart almost simultaneously, and the entire server stops accepting requests for a few seconds.

To avoid this, set max-requests-delta as follows.

max-requests-delta = 300

With this set, each process restarts at a request count offset by this amount. For example, if one process restarts at 6000 requests, the next process restarts when it has received 6300. Assuming an overall RPS of about 16 and 16 processes, each process receives about one request per second, so consecutive restarts are roughly 300 seconds apart. This limits how many processes restart at the same time and avoids the server as a whole being unable to accept requests.
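The arithmetic above can be written out as a small sketch. It assumes, as in the example, that the first process restarts at max-requests itself and each following process one delta later; the exact offset uWSGI applies to each worker may differ by version. It also assumes requests are spread evenly across processes. Both function names are made up for illustration.

```python
def restart_thresholds(max_requests, delta, processes):
    """Request counts at which each process restarts, spaced `delta` apart."""
    return [max_requests + i * delta for i in range(processes)]


def seconds_between_restarts(delta, total_rps, processes):
    """Approximate gap between consecutive restarts for an even request spread."""
    per_process_rps = total_rps / float(processes)
    return delta / per_process_rps
```

With the values from this article, `restart_thresholds(6000, 300, 3)` gives `[6000, 6300, 6600]` for the first three processes, and `seconds_between_restarts(300, 16, 16)` gives `300.0`.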

touch-reload, lazy-apps

You can specify a file path in touch-reload, and uWSGI will reload every time that file is touched. However, during such a reload the entire server cannot accept requests, so you can set lazy-apps to make the processes reload one after another instead.

In our environment, lazy-apps is a poor fit because updates do not take effect immediately, so we accept the few seconds of waiting and do not set it.

Summary

This time, we looked at the uWSGI parameters that we actually tuned. We have also set the following parameters, and I would like to keep investigating them to improve performance further.

listen
harakiri
harakiri-verbose
limit-as
log-date