[PYTHON] Monitoring & Performance Improvement with New Relic in Detail: Part 1

Overview

I've been using New Relic in both production and development environments for about a year and a half, and I'd like to share what I've learned along the way. Some of the features described here are available only in the paid version.

Detailed explanation NewRelic monitoring & performance improvement-Part 1
Detailed explanation NewRelic monitoring & performance improvement-Part 2

What is New Relic

overview.png

New Relic is a tool for monitoring the status of servers and applications and for improving their performance. There are various established monitoring tools such as Zabbix, Ganglia, Nagios, and Munin, but I think New Relic is superior in ease of use. Once it is installed, you get the metrics you need to run an application on a server. With the established tools, whoever installs them has to define and configure the required metrics, and in some cases write scripts. New Relic, by contrast, comes with a rich set of default metrics laid out for easy viewing, and in many cases it is sufficient without any additional configuration. For metrics that are not covered by default, there are many plugins that are easy to install.

In this article, I'd like to introduce, case by case, both the New Relic features that everyone knows and the more detailed features of the paid version.

Prerequisite environment

We are developing an application server for smartphone games. Below is a diagram of the application environment whose metrics are collected in New Relic.

server.png

It is a very common configuration.

Now let's start the case studies from a server engineer's point of view.

Case 1: The application is slow. Can't it be made faster?

I think many people have been told this. The request is too rough to act on, so let's start by defining the problem. Which API is slow? When did it become slow? Is it slow only in certain cases? What about other users? There are many questions to ask, but the point is to pin down exactly what is wrong. As a result, the following problems were found.

Now that the problem is fairly specific, let's take a look at New Relic. Select the application under APPS, then click the item called Transactions. If you select "Slowest average response time", the APIs are listed in order of response time, slowest first. (Note: everything is slow overall here because this was before any performance tuning of the server.)

gacha_transaction.png

If you click gacha there, the following page appears on the right side.

gacha_select2.png

By looking at this page, you can see the following.

Searching for the bottleneck by combining the problem definition with the source code showed that the following points were the problem.

With the above measures, the API became faster and the response time was no longer a problem. In this way, New Relic helps you improve performance by showing, for each API, the number of SQL queries issued, the number of queries issued to Redis, and the execution time of the Python code.
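To make the idea of a per-API breakdown concrete, here is a minimal sketch using only the Python standard library. It is not how the New Relic agent works internally; the `Breakdown` class and the `gacha_handler` function are hypothetical names of my own, and the sleeps merely stand in for real MySQL and Redis calls.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class Breakdown:
    """Collects call counts and elapsed time per category for one API call."""

    def __init__(self):
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    @contextmanager
    def segment(self, category):
        # Time everything inside the `with` block and file it under category.
        start = time.perf_counter()
        try:
            yield
        finally:
            self.calls[category] += 1
            self.seconds[category] += time.perf_counter() - start

    def report(self):
        # One row per category: (name, call count, total seconds), slowest first.
        rows = ((name, self.calls[name], self.seconds[name]) for name in self.calls)
        return sorted(rows, key=lambda row: row[2], reverse=True)


def gacha_handler(breakdown, draws=10):
    """Hypothetical handler; the sleeps stand in for real MySQL/Redis calls."""
    for _ in range(draws):
        with breakdown.segment("mysql"):
            time.sleep(0.002)
    with breakdown.segment("redis"):
        time.sleep(0.001)


breakdown = Breakdown()
gacha_handler(breakdown)
for name, calls, seconds in breakdown.report():
    print(f"{name:6s} calls={calls:3d} total={seconds * 1000:.1f} ms")
```

A breakdown like this makes it easy to notice when one API issues the same kind of query many times, which is the sort of hint the Transactions page gives you.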

Note, however, that Avg time is simply the elapsed time. If the server's CPU is saturated, the Python code has to wait for the CPU, and the time attributed to the code becomes long.
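The difference between elapsed time and CPU time can be seen with a small standard-library sketch. The `measure`, `waiting`, and `computing` functions are names I made up for illustration, and `time.sleep` is only a stand-in for time a process spends not running; waiting for a busy CPU is not identical, but it likewise adds elapsed time without adding useful work.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def waiting():
    # Stands in for time spent not running (blocked, or waiting for a CPU slot).
    time.sleep(0.2)


def computing():
    # Pure Python work that actually consumes CPU.
    sum(i * i for i in range(200_000))


for func in (waiting, computing):
    wall, cpu = measure(func)
    print(f"{func.__name__:9s} wall={wall * 1000:6.1f} ms cpu={cpu * 1000:6.1f} ms")
```

If the elapsed time of your Python code looks long, it is worth checking the server's CPU usage before concluding that the code itself is inefficient.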

Case 2: MySQL seems to be slow, but what's the cause?

This may look like a case made up just to explain New Relic's MySQL screens, but it does happen. When the number of users increases, the App server's CPU sits at about 50% and looks fine, yet something is slow. In that situation the cause may be MySQL or Redis.

When I'm not sure, I start by looking at the basic Overview.

mysql_slow.png

MySQL appears to be slow. In that case, select Databases in the left pane and start by looking at the following screen. (Note: the screenshot was taken at a different time and is only for illustration.)

databases.png

Since the response time of each query is shown, it is obvious which query is slow. A common pattern is a query that was fast while there was little history data but became slow as the history grew. In some cases you forgot to add an index, or you do not need the history in the first place, so fix that. Clicking a query also shows which API issues it, which helps with the improvement.
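The forgotten-index pattern can be reproduced in miniature. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (where you would use `EXPLAIN` instead of `EXPLAIN QUERY PLAN`). The `gacha_history` table, its columns, and the index name are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the plan SQLite reports for a statement, as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the description


conn = sqlite3.connect(":memory:")
# Hypothetical history table that only ever grows.
conn.execute(
    "CREATE TABLE gacha_history (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT)"
)
conn.executemany(
    "INSERT INTO gacha_history (user_id, item) VALUES (?, ?)",
    ((i % 1000, f"item-{i}") for i in range(50_000)),
)

sql = "SELECT * FROM gacha_history WHERE user_id = ?"
plan_before = query_plan(conn, sql, (42,))

# The fix: add the index that was forgotten.
conn.execute("CREATE INDEX idx_history_user ON gacha_history (user_id)")
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)  # reads the whole table
print("after: ", plan_after)   # looks rows up through idx_history_user
```

Without the index, the cost of the query grows with the size of the history, which is why it only becomes visible once enough data has accumulated.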

This screen only shows data on a query-by-query basis. If you want to see the metrics of the MySQL server itself, I recommend the MySQL plugin. (Note: the screenshots were taken at a different time and are only for illustration.)

http://newrelic.com/plugins/new-relic-platform-team/52

After installing, you can see the following items.

plugin_1.png plugin_2.png plugin_3.png

Case 3: The CPU is maxed out, but what's happening?

Sometimes you want to know which process is hogging the CPU, but the standard metrics in the AWS console don't tell you that at all. Even in that case, New Relic collects this information by default once the agent is installed.

Select SERVERS in the top pane, then select Processes in the left pane to see which processes are using the most CPU.
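For comparison, the same question can be answered by hand on a single host. The following is a rough sketch that ranks processes from `ps` output. It assumes a POSIX-style `ps`, the `top_cpu` and `snapshot` helpers are hypothetical, and the sample text is made up so the parser can be tried without running `ps`. On Linux, `ps` reports CPU usage averaged over the process lifetime, so this is much cruder than the time series New Relic shows.

```python
import subprocess


def top_cpu(ps_output, n=3):
    """Parse `ps -A -o pid,pcpu,comm` output and return the n busiest processes."""
    rows = []
    for line in ps_output.strip().splitlines()[1:]:  # skip the header line
        pid, pcpu, command = line.split(None, 2)
        rows.append((float(pcpu), int(pid), command))
    rows.sort(reverse=True)  # highest CPU percentage first
    return [(pid, pcpu, command) for pcpu, pid, command in rows[:n]]


def snapshot():
    """Run ps on this host (assumes a POSIX-style ps is available)."""
    result = subprocess.run(
        ["ps", "-A", "-o", "pid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Made-up sample in the same format, for illustration only.
SAMPLE = """\
  PID %CPU COMMAND
    1  0.0 init
  812 71.5 python
  990 12.3 mysqld
 1044  3.1 redis-server
"""

for pid, pcpu, command in top_cpu(SAMPLE):
    print(f"{pcpu:5.1f}%  pid={pid}  {command}")
```

To try it against a real machine, pass `snapshot()` to `top_cpu` instead of `SAMPLE`.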

servers_1.png servers_2.png

Case 4: I want to know when I deployed

You can mark the time of a deployment by sending an event to New Relic's Web API.
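As a sketch of what such a call might look like from a deploy script, the code below builds (but does not send) a request. It assumes the deployments endpoint of New Relic's REST API v2, including the `Api-Key` header and the `deployment` payload shape; this article may have used a different, older endpoint, so check the current documentation before relying on it. The function name and all the values are placeholders.

```python
import json
import urllib.request


def build_deployment_request(api_key, app_id, revision, description="", user=""):
    """Build (but do not send) a deployment-marker request.

    Assumption: REST API v2 deployments endpoint, header and payload shape.
    """
    url = f"https://api.newrelic.com/v2/applications/{app_id}/deployments.json"
    payload = {
        "deployment": {"revision": revision, "description": description, "user": user}
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Api-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values; replace them with your own key, application ID and revision.
request = build_deployment_request(
    "YOUR_API_KEY", 1234567, "v1.2.3", "weekly release", "deploy-bot"
)
print(request.get_method(), request.full_url)
# To actually record the marker, the deploy script would run:
# urllib.request.urlopen(request)
```

Calling this at the end of every deploy lets you line up changes in response time with the release that caused them.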

deploy.png

Case 5: After release

This is not really a case, but it is about what happens when New Relic is installed only on the production server. New Relic is really effective only if you also put it on the development server during the development period. A common pattern is that something runs fast in development but its performance deteriorates in production. Because the development environment is not under load, you cannot tell whether an SQL query is slow. So put New Relic in the development environment and monitor the number of SQL queries issued by each API. Of course, you could run a load test for every release, but in many cases that is not realistic in terms of cost and time. It costs a little every month, but I think it is well worth paying.
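The reason the query count matters even without load can be shown with a small sketch. It uses Python's built-in `sqlite3` module and its `set_trace_callback` hook as a stand-in for what an APM tool records; the tables and the handler functions are hypothetical. Both handlers return quickly on a tiny development database, but one issues a query per user and the other issues a single query.

```python
import sqlite3


def count_queries(conn, handler):
    """Run handler(conn) and return how many SQL statements it issued."""
    statements = []
    conn.set_trace_callback(statements.append)  # called once per statement
    try:
        handler(conn)
    finally:
        conn.set_trace_callback(None)
    return len(statements)


def items_one_by_one(conn):
    # One query for the users, then one more per user (the N+1 pattern).
    for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
        conn.execute("SELECT name FROM items WHERE user_id = ?", (user_id,)).fetchall()


def items_joined(conn):
    # The same data fetched with a single JOIN.
    conn.execute(
        "SELECT users.id, items.name FROM users JOIN items ON items.user_id = users.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE items (user_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(20)])
conn.executemany("INSERT INTO items VALUES (?, ?)", [(i, f"item-{i}") for i in range(20)])
conn.commit()

print("one by one:", count_queries(conn, items_one_by_one))
print("joined:    ", count_queries(conn, items_joined))
```

The count for the first handler grows with the number of users, so it is a warning sign you can catch in development long before production traffic exposes it.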

Conclusion

So far, I have summarized the basic usage of New Relic case by case. Next time, I'll introduce Key Transactions and X-Ray, useful New Relic features that I learned about recently.
