Performance tuning of the JVM is a systematic and complex task. This article explains the concepts and shows how to use JVM parameters to tune your application.

This blog is a translation from the English version. You can check the original from here. We use some machine translation. We would appreciate it if you could point out any translation errors.

Performance tuning layer

In order to improve the performance of the system, it is necessary to optimize from various viewpoints and layers. The layers to be optimized are as follows.

As shown in the figure, JVM tuning is only one of many layers that need optimization. Improving the performance of a system requires tuning the entire system, not just the JVM. This article discusses only JVM tuning. Other tuning aspects will be discussed later.

Before JVM tuning, assume that your project's architecture and code are tuned or that they are the best architecture and code for your current project. These two assumptions are the basis of JVM tuning, and architecture tuning has the greatest impact on system performance. You can't expect a qualitative leap from an application that has an architectural flaw or needs code optimization by performing only JVM tuning.

In addition, you should have clear performance optimization goals and understand the current performance bottlenecks before you start tuning. To optimize the bottlenecks, you need to run stress and benchmark tests on your application and use monitoring and statistical tools to check whether the optimized application meets your goals.

JVM tuning steps

The ultimate goal of tuning is for your application to achieve higher throughput at the lowest hardware cost. JVM tuning is no exception. JVM tuning primarily optimizes the garbage collector to improve collection performance, so that applications running on the VM achieve higher throughput with lower latency and less memory usage. Keep in mind that lower memory usage or lower latency alone does not mean better performance. What matters is choosing the right trade-off.

Performance metrics

To find and evaluate performance bottlenecks, you need to know the definition of performance metrics. For JVM tuning, you need to know the following three definitions and use these metrics as a basis for evaluation.

--Throughput: One of the most important metrics. It is the best performance the garbage collector lets the application achieve when pause time and memory consumption are not considered.
--Latency: Measures how short the pauses caused by garbage collection are, so that the application does not stall or jitter while running.
--Memory usage: The amount of memory the garbage collector needs to operate smoothly.

Improving one of the three attributes usually comes at the cost of one or both of the others. The business requirements of the application determine which one or two attributes matter most.

Performance tuning principles

During the tuning process, the following three principles will help you implement simpler garbage collection tuning to meet the performance requirements of your desired application.

--Minor GC collection principle: Each minor GC should collect as many garbage objects as possible, to reduce the frequency of full GCs.
--GC memory maximization principle: When solving throughput and latency problems, the more memory the garbage collector can use, the more efficient the collection and the smoother the application runs.
--GC tuning "two out of three" principle: Rather than tuning all three attributes (throughput, latency, and memory usage), tune for only two of them.

Performance tuning procedure

The figure above shows the basic JVM tuning steps for your application. You can see that JVM tuning involves continuous configuration optimization and multiple iterations based on performance test results. Each of the previous steps may experience multiple iterations before each desired system metric is met. In some cases, the previous parameters may need to be tuned many times to meet a particular metric, and all previous steps may need to be retested.

In addition, tuning typically begins with meeting the memory usage requirements of your application, followed by latency and throughput. Tuning should follow this series of steps. The order of these tuning steps cannot be reversed. The following sections detail each tuning step with examples.

Run the JVM in server mode, which is officially recommended for JDK 1.6 and later.

Use the default parallel collector of JDK 1.6-1.8 as the garbage collector (Parallel GC for the young generation and Parallel Old GC for the old generation).

Determining memory usage

Before deciding on memory usage, there are two things you need to know.

1. Application operation phases
2. JVM memory allocation

Operation phase

The operation of the application is explained in the following three phases.

--Initialization: The JVM loads the application and initializes its main modules and data.
--Stabilization: The application has been running for a long time under stress testing. Each performance parameter is stable, the core functionality has been exercised, and the code has been warmed up by JIT compilation.
--Summary: In the final phase, some benchmark tests are run to generate the corresponding reports. This phase needs no special attention.

Memory usage and active data size should be determined during the stabilization phase, not during the initialization phase. Before explaining how to determine memory usage, let's first look at the memory allocation of the JVM.

JVM memory allocation and parameters

The JVM heap space mainly consists of the young generation, the old generation, and the permanent generation. Their sizes together make up the total heap size. How objects are promoted between generations is not discussed here. Now let's see how the following JVM parameters specify the heap size. If these parameters are not set, the virtual machine automatically chooses suitable values and may adjust them automatically, at some cost in system overhead.

Resizing the permanent generation requires a full GC. If you are concerned about this performance overhead, set the initial size and the maximum size of the permanent generation to the same value whenever possible.
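To see which sizes the JVM actually chose, you can ask a running JVM for its memory pools. The following is a minimal sketch using the standard `java.lang.management` API; the class name `HeapLayout` and the helper `toKb` are illustrative, and the pool names that are printed depend on the JVM version and the garbage collector in use.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class HeapLayout {
    /** Converts a byte count to a "<n>K" string; the JVM reports -1 when a value is undefined. */
    static String toKb(long bytes) {
        return bytes < 0 ? "undefined" : (bytes / 1024) + "K";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() is the heap currently reserved for the application;
        // maxMemory() is the most the JVM will try to use.
        System.out.println("heap total=" + toKb(rt.totalMemory())
                + " max=" + toKb(rt.maxMemory()));

        // One line per memory pool (generations and non-heap areas).
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            System.out.println(pool.getName() + " (" + pool.getType() + "):"
                    + " init=" + toKb(usage.getInit())
                    + " used=" + toKb(usage.getUsed())
                    + " max=" + toKb(usage.getMax()));
        }
    }
}
```

Running it once with no size parameters and once with explicit ones makes the difference between the automatic and the configured sizes visible.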

Calculate the size of active data

To calculate the size of active data, follow these steps:

As mentioned earlier, the active data size is the amount of Java heap space occupied by long-lived data once the application has entered its stable phase.

When calculating the active data size, be sure to meet the following requirements:

--Use the default JVM parameters when running the test, rather than setting startup parameters manually.
--Make sure the application is in a stable state when the full GC occurs.

The default JVM startup parameters are used so that you can observe the memory the application needs in the stable phase.

When is the application in the stable phase?

An application is in the stable phase only if, under sufficient stress, it reaches a workload that matches the business requirements of peak hours in the production environment and remains stable after reaching that peak. Therefore, stress testing is essential to determine whether an application has reached the stable phase. How to stress test your application is beyond the scope of this article and will be discussed in another article.

After determining that your application is in a stable stage, pay attention to your application's GC log, especially the Full GC log.

GC log directive: -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:<filename>

GC logs are the best way to collect the information needed for optimization. You can find problems by enabling GC logging even in a production environment. By enabling GC logging, you can provide a wealth of data while minimizing the impact on performance.
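Before a long stress test, it can be worth confirming that the JVM was really started with the logging options listed above. The sketch below reads the startup arguments through the standard `RuntimeMXBean`; the class name `GcLogFlagCheck` and the method `missing` are hypothetical names chosen for this example.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class GcLogFlagCheck {
    // The GC logging options named in this article; -Xloggc: is matched as a prefix.
    static final List<String> EXPECTED = Arrays.asList(
            "-XX:+PrintGCTimeStamps", "-XX:+PrintGCDetails", "-Xloggc:");

    /** Returns the expected options that do not appear in the given JVM arguments. */
    static List<String> missing(List<String> jvmArgs) {
        List<String> result = new ArrayList<>();
        for (String flag : EXPECTED) {
            boolean found = false;
            for (String arg : jvmArgs) {
                if (arg.startsWith(flag)) {
                    found = true;
                    break;
                }
            }
            if (!found) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("Missing GC logging options: " + missing(jvmArgs));
    }
}
```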

The full GC log is required. If no full GC log is available, use a monitoring tool to force a full GC, or use the following command to trigger one.

jmap -histo:live pid

If the full GC is activated during the stable period, the following information can be obtained.

From the GC log mentioned above, you can get a rough estimate of the heap usage and GC time for the entire application during a full GC. To get a more accurate estimate, collect the information multiple times and calculate the mean. Alternatively, it can be estimated using the longest FullGC.

In the figure above, 93168KB (about 93MB) of the old generation space is occupied after full GC. Consider this amount of data as active data in the old generation space.

Other heap spaces are allocated according to the following rules.

From the above rules and the FullGC information in the previous figure, the application heap space can be planned as follows.

Java heap space: 373MB = 93168KB (old generation space) x 4

Young generation space: 140MB = 93168KB (old generation space) x 1.5

Permanent generation space: 5MB = 3135KB (Permanent generation space) x 1.

Old generation space: 233MB = 373MB (heap space) - 140MB (young generation space)

Corresponding application launch parameters

java -Xms373m -Xmx373m -Xmn140m -XX:PermSize=5m -XX:MaxPermSize=5m
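The heap arithmetic above can be reproduced in a few lines. This is only a sketch: the class `HeapPlan` and its methods are illustrative names, it covers only the heap and young generation multipliers (4 and 1.5) used above, and it converts KB to MB by dividing by 1000 and rounding, which is the assumption that matches the figures in this article.

```java
class HeapPlan {
    /** Converts KB to MB the way the figures in this article do: divide by 1000 and round. */
    static long kbToMb(double kb) {
        return Math.round(kb / 1000.0);
    }

    /** Total Java heap: 4 x the active data in the old generation. */
    static long heapMb(long activeOldKb) {
        return kbToMb(activeOldKb * 4.0);
    }

    /** Young generation: 1.5 x the active data in the old generation. */
    static long youngMb(long activeOldKb) {
        return kbToMb(activeOldKb * 1.5);
    }

    /** Old generation: whatever remains of the heap after the young generation. */
    static long oldMb(long activeOldKb) {
        return heapMb(activeOldKb) - youngMb(activeOldKb);
    }

    /** Heap-related launch parameters for the planned sizes. */
    static String flags(long activeOldKb) {
        long heap = heapMb(activeOldKb);
        return "-Xms" + heap + "m -Xmx" + heap + "m -Xmn" + youngMb(activeOldKb) + "m";
    }

    public static void main(String[] args) {
        long activeOldKb = 93168; // old generation usage after full GC, from the log
        System.out.println("old generation: " + oldMb(activeOldKb) + "MB");
        System.out.println(flags(activeOldKb));
    }
}
```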

Latency tuning

After determining the active data size for your application, you need to tune the latency. At this point, the heap size and latency may not yet meet your application's requirements, so you need to keep tuning based on the application's actual requirements.

During this phase, you may need to re-optimize the heap size configuration, evaluate the GC duration and frequency, and decide if you need to switch to another garbage collector.

System latency requirements

Before tuning, you need to know what the latency requirements of your system are and which metrics can be tuned for latency.

--Acceptable average downtime (pause time) for the application: This is compared with the measured duration of a minor GC.
--Acceptable minor GC frequency: The measured minor GC frequency is compared with this value.
--Maximum acceptable pause time: This is compared with the worst-case full GC duration.
--Maximum acceptable pause frequency: This is basically the frequency of full GCs.

Of the metrics mentioned above, pay particular attention to average downtime and maximum pause time. These two metrics are very important to the user experience.

Based on the above requirements, you need to get the following data:

--Minor GC duration
--Number of minor GCs
--Maximum duration of full GC
--Full GC frequency in the worst case
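Besides the GC log, collection counts and accumulated collection times can also be read from inside the JVM. The sketch below uses the standard `GarbageCollectorMXBean`; the class `GcStats` and the helper `averageMs` are illustrative names. It reports only cumulative counts and approximate total times, so the maximum full GC duration still has to come from the GC log. The bean names depend on the collector; with the parallel collector they are typically reported as "PS Scavenge" for the young generation and "PS MarkSweep" for the old generation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** Average time per collection in ms; 0 when nothing was collected or the values are undefined (-1). */
    static double averageMs(long count, long totalMs) {
        return (count <= 0 || totalMs < 0) ? 0.0 : (double) totalMs / count;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();  // collections so far, or -1 if undefined
            long totalMs = gc.getCollectionTime(); // approximate accumulated time in ms, or -1
            System.out.printf("%s: collections=%d, total=%d ms, average=%.1f ms%n",
                    gc.getName(), count, totalMs, averageMs(count, totalMs));
        }
    }
}
```

Sampling these values at the start and end of the stable phase and taking the difference gives the number of collections and the average duration for that period only.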

Optimize the size of Young Generation

For example, in the preceding GC log, the average duration of the minor GC is 0.069 seconds, and the minor GC occurs once every 0.389 seconds.

The target average downtime is 50 ms, so the current duration (69 ms) is clearly too long and needs to be adjusted.

The larger the young generation space, the longer each minor GC lasts and the less frequently minor GCs occur.

To reduce the duration, you need to reduce the size of the young generation space.

To reduce the frequency, you need to increase the size of the young generation space.

To minimize the impact of resizing the young generation space on other sections, when resizing the young generation space, keep the size of the old generation space if possible.

For example, if you reduce the size of the young generation space by 10%, do not change the size of the old generation space and the permanent generation space. The parameters after optimization in this step are as follows.

java -Xms359m -Xmx359m -Xmn126m -XX:PermSize=5m -XX:MaxPermSize=5m

The size of the young generation is changed from 140 MB to 126 MB; the heap size is changed accordingly; the old generation has no changes at this point.
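The resizing step can be written out as a short calculation. This is a sketch with illustrative names (`YoungGenResize`, `shrink`, `flags`); it keeps the old generation fixed and derives the new heap size from it, as described above.

```java
class YoungGenResize {
    /** Reduces the young generation by the given percentage, rounded to whole MB. */
    static long shrink(long youngMb, int percent) {
        return Math.round(youngMb * (100 - percent) / 100.0);
    }

    /** Heap flags for a fixed old generation and the given young generation size. */
    static String flags(long oldMb, long youngMb) {
        long heapMb = oldMb + youngMb; // the old generation stays unchanged
        return "-Xms" + heapMb + "m -Xmx" + heapMb + "m -Xmn" + youngMb + "m";
    }

    public static void main(String[] args) {
        long youngMb = shrink(140, 10); // 10% smaller young generation
        System.out.println(flags(233, youngMb));
    }
}
```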

Optimize the size of Old Generation

As with the previous step, we need to get some data from the GC log before optimization. In this step, we will focus on the duration and frequency of FullGC.

The following information can be obtained from the previous figure.

The average FullGC frequency is 1 FullGC every 5.8s.

The average FullGC duration is 0.14s.

(This is only a test. FullGC lasts longer in real projects.)

Object promotion rate

Can the full GC frequency be estimated without a full GC log? Yes: you can use the object promotion rate.

For example, in the startup parameter above, the size of the old generation is 233MB.

How long it takes to occupy this 233MB of free space depends on the rate of promotion from young generation to old generation.

Old generation usage after a minor GC = Java heap usage after the minor GC - young generation usage after the minor GC

Promoted amount per minor GC = increase in old generation usage between two consecutive minor GCs

Object promotion rate = average promoted amount per minor GC / average interval between minor GCs

With the object promotion rate, you can calculate how many minor GCs it takes to fill the old generation and the approximate interval between full GCs.

Let me give you an example.

In the figure above, it is described as follows.

After the first minor GC, the usage of the old generation space is 8 KB (13740 KB - 13732 KB).

After the second minor GC, the usage of the old generation space is 4489 KB (22394 KB - 17905 KB).

After the third minor GC, the usage of the old generation space is 16822 KB (34739 KB - 17917 KB).

After the fourth minor GC, the usage of the old generation space is 30230 KB (48143 KB - 17913 KB).

After the fifth minor GC, the usage of the old generation space is 44195 KB (62112 KB - 17917 KB).

Amount promoted to the old generation between consecutive minor GCs:

Between the second and the first minorGCs: 4481 KB

Between the third and the second minorGCs: 12333 KB

Between the fourth and the third minorGCs: 13408 KB

Between the fifth and the fourth minorGCs: 13965 KB

After the calculation, you can get the following information.

The average amount promoted per minor GC is about 11047 KB (about 11 MB), the mean of the four values above.

In the preceding figure, a minor GC happens once every 213 ms on average.

Promotion rate = 11047 KB / 213 ms = about 52 KB/ms

It takes about 4.6 s (233 * 1024 / 52 = about 4588 ms) to fully occupy the 233 MB of old generation space.
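The same estimate can be computed directly from the per-GC numbers. The following sketch uses illustrative names (`PromotionRate` and its methods) and takes as input the heap and young generation usage after each minor GC listed above, the average minor GC interval, and the old generation size.

```java
class PromotionRate {
    /** Old generation usage after each minor GC = heap usage - young generation usage (KB). */
    static long[] oldUsageKb(long[] heapAfterKb, long[] youngAfterKb) {
        long[] old = new long[heapAfterKb.length];
        for (int i = 0; i < old.length; i++) {
            old[i] = heapAfterKb[i] - youngAfterKb[i];
        }
        return old;
    }

    /** Average growth of old generation usage between consecutive minor GCs (KB). */
    static double averagePromotedKb(long[] oldKb) {
        long total = 0;
        for (int i = 1; i < oldKb.length; i++) {
            total += oldKb[i] - oldKb[i - 1];
        }
        return (double) total / (oldKb.length - 1);
    }

    /** Time in ms until the old generation is full at the given promotion rate. */
    static double msToFill(long oldGenMb, double avgPromotedKb, double minorGcIntervalMs) {
        double rateKbPerMs = avgPromotedKb / minorGcIntervalMs;
        return oldGenMb * 1024.0 / rateKbPerMs;
    }

    public static void main(String[] args) {
        // Usage after each of the five minor GCs, in KB, as listed above.
        long[] heapAfterKb = {13740, 22394, 34739, 48143, 62112};
        long[] youngAfterKb = {13732, 17905, 17917, 17913, 17917};

        double avgKb = averagePromotedKb(oldUsageKb(heapAfterKb, youngAfterKb));
        System.out.printf("average promoted per minor GC: %.0f KB%n", avgKb);
        System.out.printf("promotion rate: %.1f KB/ms%n", avgKb / 213);
        System.out.printf("time to fill 233 MB: %.0f ms%n", msToFill(233, avgKb, 213));
    }
}
```

Because the rate is not rounded before the division, the result can differ slightly from a calculation done by hand with a rounded rate.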

The worst-case full GC frequency can be estimated using either of the two methods described above. You can adjust the full GC frequency by resizing the old generation. If a full GC takes too long to meet the application's latency requirements, you need to switch to a different garbage collector, for example the concurrent mark sweep (CMS) collector. The next article details how to make that switch. CMS tuning is a little different.

Throughput tuning

After going through the tuning steps mentioned above, we are finally in the final tuning step. In this step, you will perform a throughput test on the previous results and make fine adjustments.

Throughput tuning is primarily based on the throughput requirements of your application. The application must have a comprehensive throughput metric derived from the overall application requirements and tests. Tuning can be terminated when the application throughput reaches or exceeds the expected throughput target.

If you still cannot reach your application's throughput goals after optimization, you should review your throughput requirements and evaluate how large the gap is between your current throughput and your goals. If the gap is around 20%, you can change the parameters to increase memory and go through the tuning steps again. If the gap is too large, you should consider whether your design and throughput goals match and reassess your throughput goals from an application-wide perspective.

For the garbage collector, the goal of throughput tuning is to reduce or avoid full GCs and stop-the-world CMS collections, both of which reduce application throughput. Try to reclaim as many objects as possible during minor GC so that they are not promoted to the old generation too quickly.

Conclusion

Plumbr investigated the usage of specific garbage collectors based on 84,936 cases. Of the cases where the garbage collector is explicitly specified, the concurrent mark sweep (CMS) collector is the most frequently used in 13% of cases. However, in most of these cases, the optimal garbage collector has not been selected. This majority case accounts for about 87%.

Tuning the JVM is a systematic and complex task. At the moment, auto-tuning under the JVM is very good, and setting basic initial parameters can ensure stable operation for common applications. Some teams may not prioritize application performance. In this case, the default garbage collector is usually sufficient to meet the desired requirements. Tuning should be based on your own situation.


JVM Performance Tuning: What is Tuning and How to Make a Good Plan