[LINUX] Defense Techniques When You Have to Fight the Performance of Unfamiliar Applications (Part 2)

Situation of fighting the performance of an unfamiliar application (repost)

I didn't design or develop it, and I could barely divide the whole thing by participating in the review, but I don't have time to look at the source from now on, and the person who developed it is not aware of the performance aspect. ** How did you deal with the dark situation where you suddenly blew fire in terms of performance and wanted to do something in a couple of hours after being summoned **? Let's write it in% **.

The first part was an introduction to tools (OS). Defense Techniques When You Have to Fight the Performance of Unfamiliar Applications (Part 1) https://qiita.com/nfujita55a/items/3760cd099ca890f5a4d4

** This second part is an introduction to tools (Java) **, and the second part is an introduction to tools (PostgreSQL). I wrote ** what happened after the process that was identified in the first part ** was identified as Java **.

…… I think there are some strange things, so I'm looking forward to pointing out.

Assumed environment (repost)

The work pattern and the assumed environment are very legacy. It can be in the cloud (AWS EC2) or on-premise

[Web server Linux Apache]
|
[AP server Linux Java+Tomcat+WEB application, Java+Batch program]
|
[DB server Linux PostgrteSQL]

It is assumed that the configuration is.

1. Examine the Java process

Thread dump

** If you are having performance issues with your Java app, get this first. ** Since the status and stack trace of all threads are output to the Java process, how each thread is called and what is being executed at the moment of acquisition is output.

Such output is obtained for all threads in the process. This is an example output from a thread called HogeWorker. The state of the thread is written as "State: RUNNABLE".

"HogeWorer" daemon prio=10 tid=0x00007f5c2706d900 nid=0xdfb runnable [0x00007f5c3d760000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
    at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:282)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1741)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
    - locked <0x0000000529000980> (a org.postgresql.core.v3.QueryExecutorImpl)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:304)
    at com.example.a.b.c.DddEee.foo(DddEee.java:234)
    at com.example.a.b.c.DddEee.bar(DddEee.java:345)
    (Abbreviation com.The following 20 lines are lined up in a row)
    at com.example.h.i.j.KkkLll.foo(KkkLll.java:123)
    at com.example.h.i.j.KkkLll.baz(KkkLll.java:12)
    at java.lang.Thread.run(Thread.java:745)

Acquisition method

** Oh, I don't hesitate to get it when it's overloaded or the processing is slow (important). ** Acquire multiple times (3 to 5 times) with an interval of 15 seconds. Take it multiple times to see if they all show the same tendency.

If the thread dump cannot be obtained by the following method (it is not recorded or the command is not returned), there are many cases where the Java process itself is close to OutOfMemoryError and nothing can be done. In such a case, I think it is better to stop the Java process and then recover it. (In many cases, it can only be stopped by # kill -KILL process ID)

** 1. kill -3 process ID **

This is easy if you know the standard main redirect file for the Java process (typical example: Tomcat's catalina.out). However, do not mistake ** -3 **. If you forget this or do -HUP or -TERM or -KILL, the Java process will stop.

# ps auxww | grep java
For the PID of the Java process identified in
# kill -3 Process ID

Get as. If any user of the process runs as the root user, a thread dump will be added to the standard main redirect file of the Java process.

** 2. jstack command **

Some people may feel that the standard output of this batch is somewhere, or that the kill command is a bit ..., so in that case you can also get it with the jstack command. (I feel this is legitimate)

# ps auxww | grep java
Look at the PID and Java path of the Java process identified in, and the execution user of that process.
# su -User running the process
Switch with
$ /path/to/java_home/bin/jps
And check if the PID you were looking at with the ps command is displayed
(In Tomcat, 12345 BootStrap appears, so confirm that 12345 is the process you want to see.)
$ /path/to/java_home/bin/jstack process id

The stack trace is output as a command execution result.

If the Java execution user cannot switch to nobody (running with nobody is better for security than running as root or a user with a shell)

# ps auxww | grep java
Look at the PID and Java path of the Java process identified in, and the execution user of that process.
# su - -m Execution user of the process-c '/path/to/java_home/bin/jps'
And check if the PID you were looking at with the ps command is displayed
# su - -m Execution user of the process-c '/path/to/java_home/bin/jstack process id'

There is also a way to output the stack trace to the command execution result. Since the terminal is filled with the stack trace, it may be better to redirect such as ">" ">>" and write it in the file.

** 3. (Extra edition) Tomcat running as a service on Windows **

Tomcat running on Windows services

You can't get a thread dump in either case, so you can easily get one by pressing Thread Dump from the Tomcat icon in the system tray.

apache2.png

Then, it will be output to the Tomcat installation destination \ logs \ tomcatX-stdout today's date.log.

View

When looking at thread dumps (especially if you're not familiar with the target system but need to look it up)

I will see it in two stages.

** 1. (Just in case) Is there enough memory in the JVM? **

At the end of the retrieved thread dump, there is output about such Java heap memory.

Heap
 PSYoungGen      total 94208K, used 25568K [0x000000094aa90000, 0x000000094da80000, 0x0000000800000000)
  eden space 90112K, 30% used [0x000000094aa90000,0x000000094b800310,0x000000094d680000)
  from space 4096K, 46% used [0x000000094d880000,0x000000094d970000,0x000000094da80000)
  to   space 4096K, 0% used [0x000000094d680000,0x000000094d680000,0x000000094d880000)
 ParOldGen       total 732160K, used 587892K [0x0000000680000000, 0x0000000696580000, 0x000000094aa90000)
  object space 732160K, 80% used [0x0000000680000000,0x0000000691f0ea58,0x0000000696580000)
 Metaspace       used 2425K, capacity 4498K, committed 4864K, reserved 1056768K
  class space    used 262K, capacity 386K, committed 512K, reserved 1048576K

For typical web applications and batch processing, Java's OutOfMemoryError is a shortage of Old space. "Maximum allocation of heap memory by specifying -Xmx of Java process x -XX: NewRatio ÷ (-XX: NewRatio + 1)" is the maximum value of the Old area. (Example: If -Xmx is 4G and -XX: NewRatio is the default 2 of ServerVM, 4GB x 2/3 = 2.66GB)

** Check if ParOldGen> used (used 587892K above) is approaching the maximum value of this Old area. If it's close and the CPU is busy with vmstat, this Java is close to his OutOfMemoryError and determines that he hasn't turned the GC process and devoted his power to the original process **. In such a case, I think it is better to stop the Java process once (add memory to the OS in some cases), increase the maximum Java heap value Xmx, and re-execute. It's awkward to handle half-finished processing situations, but ** it just wastes time waiting for it to improve. ** **

If there seems to be no problem with the memory heap, proceed to investigate the thread.

** 2. Survey of target threads **

In the case of WEB application such as Tomcat (this category because Tomcat is also included in Spring Boot), many threads are output. First of all

  1. If the thread status is WAITING or TIMED_WAITING
  2. The thread status is RUNNABLE, but it is actually waiting for the Tomcat Worker.

Can be ignored. This is a typical thing to ignore.

"http-bio-8080-exec-19" daemon prio=10 tid=0x00008d13e0003800 nid=0x4894 waiting on condition [0x00008d13fc2b7000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x00000009217dee78> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
	at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:104)
	at org.apache.tomcat.util.threads.TaskQueue.take(TaskQueue.java:32)
	at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

"ajp-bio-8009-exec-123" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00008d13d5ad5000]
   java.lang.Thread.State: RUNNABLE
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.read(SocketInputStream.java:152)
	at java.net.SocketInputStream.read(SocketInputStream.java:122)
	at org.apache.coyote.ajp.AjpProcessor.read(AjpProcessor.java:312)
	at org.apache.coyote.ajp.AjpProcessor.readMessage(AjpProcessor.java:367)
	at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:118)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:620)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:316)
	- locked <0x000000092f13a8c0> (a org.apache.tomcat.util.net.SocketWrapper)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.lang.Thread.run(Thread.java:745)

What to look for in order to investigate

(I think you'll understand it as soon as you get used to it ...). The following example, which is the stack trace given earlier, is subject to investigation because it satisfies three conditions (com.example is illustrated as a project-specific package name).

"HogeWorer" daemon prio=10 tid=0x00007f5c2706d900 nid=0xdfb runnable [0x00007f5c3d760000]
   java.lang.Thread.State: RUNNABLE
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:152)
    at java.net.SocketInputStream.read(SocketInputStream.java:122)
    at org.postgresql.core.VisibleBufferedInputStream.readMore(VisibleBufferedInputStream.java:143)
    at org.postgresql.core.VisibleBufferedInputStream.ensureBytes(VisibleBufferedInputStream.java:112)
    at org.postgresql.core.VisibleBufferedInputStream.read(VisibleBufferedInputStream.java:71)
    at org.postgresql.core.PGStream.ReceiveChar(PGStream.java:282)
    at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1741)
    at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:255)
    - locked <0x0000000529000980> (a org.postgresql.core.v3.QueryExecutorImpl)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:561)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:419)
    at org.postgresql.jdbc2.AbstractJdbc2Statement.executeQuery(AbstractJdbc2Statement.java:304)
    at com.example.a.b.c.DddEee.foo(DddEee.java:234)
    at com.example.a.b.c.DddEee.bar(DddEee.java:345)
    (Abbreviation com.The following 20 lines are lined up in a row)
    at com.example.h.i.j.KkkLll.foo(KkkLll.java:123)
    at com.example.h.i.j.KkkLll.baz(KkkLll.java:12)
    at java.lang.Thread.run(Thread.java:745)

** It's an unfamiliar system, so it's no use looking at the contents of the project-specific com.example. ** Instead, when you follow from the top, look for the project-specific com.example and see the above process from the first place (fffGgg.java:234 in the above example).

On the stack trace (* unless also used by JNI)

Since it is arranged in the order of **, the processing content is inferred from "java or javax package / general library" **. The above example is very easy to understand

In other words, specify the processing content as to whether the select ... SQL is thrown to the DB and waiting for a return. If the thread dump is filled with such threads, the investigation target will move to the DBMS. (Judge that Java process is waiting for DB)

Also, if you look closely at the stack trace above, you won't see the connection pooling library (Commns-DBCP or Tomcat's DBCP). We are also looking at what happens to TCP between AP and DB (DBMS may be up and up with the establishment of a new TCP).

"Java is waiting for the return of the DB" is a typical example that you will come across when dealing with the performance of a WEB application, but I will write two more examples to note.

Supplementary case 1) When you are stuck exclusively

If the stack trace contains the project-specific package name as shown below, the status of the ** thread is BLOCKED **, and there are many such threads, it is exclusively packed.

"ajp-bio-8009-exec-124" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00009d13d5ad5000]
   java.lang.Thread.State: BLOCKED (on object monitor)
	at com.example.zzz.YyyXxx.foo(YyyXxx.java:345)
	- waiting to lock <0x0000000123456789> (a java.lang.Object)
        at com.example.zzz.YyyXxx.bar(YyyXxx.java:567)
(Hereafter)
    at java.lang.Thread.run(Thread.java:745)

In such a case, if you look for 0x0000000123456789 of "waiting to lock <0x0000000123456789>" in the thread dump, there is a thread that is locked <0x0000000123456789> somewhere (except for deadlock), so check that thread. .. If the thread status is RUNNABLE, investigate its processing. (This is the beginning of the traffic jam). If the status of that thread is also BLOCKED, it is waiting for exclusion in a chain, so if you check it in the same way, you will reach the thread of RUNNABLE which is the head of the traffic jam somewhere. Either you can manage the reason why the leading thread cannot gain performance, or you can modify the program so that it is not exclusive.

Supplementary case 2) Regular expression

This is a terrible pattern that reduces appetite the moment you see it, but as shown below, when the thread status is RUNNABLE and the stack trace starts with regex **, there is a performance problem with the regular expression. .. ** Regular expressions are a combination of strings that match a pattern, and a single match can take a tremendous amount of time **. I've seen that it took 3 hours to match when I put a specific string (80 characters) in a regular expression (about 20 characters) that seems to be nothing. Be especially careful when putting web requests into regular expressions. It's a situation close to ReDos.

At the same time, if there is no other CPU-consuming process in the stack trace and the us is heavily loaded in the vmstat of the AP server that acquired the stack trace, the cause of the problem is certain. This is the area where the program has to be refurbished. (Although there is an idea to cut a request that would be terrible with WAF etc.)

"ajp-bio-8009-exec-123" daemon prio=10 tid=0x00008d13ec06d000 nid=0x3dfc runnable [0x00008d13d5ad5000]
   java.lang.Thread.State: RUNNABLE
    at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
    at java.util.regex.Pattern$Curly.match1(Pattern.java:4287)
    at java.util.regex.Pattern$Curly.match(Pattern.java:4236)
    at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
    (Abbreviation)
    at com.example.zzz.YyyXxx.foo(YyyXxx.java:345)
    (Abbreviation)

2. Tomcat access log

(Everyone loves it) I'll write two settings that are easy to use for unfamiliar systems running on Tomcat and are effective when dealing with performance issues. ** Wake up in the middle of the night and it's hard to set this up, so everyone should write it down. ** (Remarks without knowing the destination)

Add an item to the access log

The access log setting of Tomcat is like this by default

xml:Tomcat_8.5.61_server.xml


    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
          prefix="localhost_access_log" suffix=".txt"
          pattern="%h %l %u %t &quot;%r&quot; %s %b" />

This is the output.

192.168.0.1 - - [19/Dec/2020:22:01:02 +0900] "GET /foo/1 HTTP/1.1" 200 12345
192.168.0.1 - - [19/Dec/2020:22:01:05 +0900] "GET /bar/2 HTTP/1.1" 200 23456
192.168.0.1 - - [19/Dec/2020:22:01:07 +0900] "GET /baz/3 HTTP/1.1" 200 34567
192.168.0.1 - - [19/Dec/2020:22:01:14 +0900] "GET /foo/4 HTTP/1.1" 200 11234
192.168.0.1 - - [19/Dec/2020:22:02:09 +0900] "GET /bar/5 HTTP/1.1" 200 22345
192.168.0.1 - - [19/Dec/2020:22:02:12 +0900] "GET /baz/6 HTTP/1.1" 200 33456

I want to add ** "session ID", "processing time (milliseconds)", and "thread name" **, so I'll do this.

xml:Tomcat_8.5.61_server.xml(After change)


    <Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
          prefix="localhost_access_log" suffix=".txt"
          pattern="%h %l %u %t &quot;%r&quot; %s %b %S %D %I" />

This

192.168.0.1 - - [19/Dec/2020:22:01:02 +0900] "GET /foo/1 HTTP/1.1" 200 12345 0123456789ABCDEF0123456789ABCDEF 254 ajp-bio-8009-exec-123
192.168.0.1 - - [19/Dec/2020:22:01:05 +0900] "GET /foo/1 HTTP/1.1" 200 23456 ABCDEF0123456789ABCDEF0123456789 10245 ajp-bio-8009-exec-112
192.168.0.1 - - [19/Dec/2020:22:01:07 +0900] "GET /baz/3 HTTP/1.1" 200 34567 0123456789ABCDEF0123456789ABCDEF 547 ajp-bio-8009-exec-157
192.168.0.1 - - [19/Dec/2020:22:01:14 +0900] "GET /foo/4 HTTP/1.1" 200 11234 0123456789ABCDEF0123456789ABCDEF 286 ajp-bio-8009-exec-114
192.168.0.1 - - [19/Dec/2020:22:02:09 +0900] "GET /baz/3 HTTP/1.1" 200 22345 ABCDEF0123456789ABCDEF0123456789 34578 ajp-bio-8009-exec-118
192.168.0.1 - - [19/Dec/2020:22:02:12 +0900] "GET /baz/6 HTTP/1.1" 200 33456 0123456789ABCDEF0123456789ABCDEF 125 ajp-bio-8009-exec-129

It changes to.

Then, when you look at the access log

  1. Note the second ** response from the right that takes the largest number of milliseconds ** (for example, lines 2 and 5). ** Is there a problem with a specific URL **?
  2. You can find out what kind of URL transition the session that is taking a long time to respond by searching with ** session ID of the request that took the response ** (excluding changing the session ID at login due to security etc.) → If the response is large in the transition starting from a specific route, that route may be temporarily blocked ...
  3. If there is a timing to match the thread dump, ** you can find the URL corresponding to the thread that took a long time in the thread dump **. (For example, the fifth request from the top is processed between 22:01:35 and 22:02:09. If there is a thread dump at 22:02:00, ① processing contents from the thread dump, ② You can see the access history before and after the request URL and ③ session ID)

Such actions are possible.

Set StuckThreadDetectionValve

That doesn't mean you can always hit a thread dump, so if you run into problems ** set StuckThreadDetectionValve to that Tomcat ** (it's a personal recommendation to have it in before you run into problems). ).

StuckThreadDetectionValve is a great thing from Tomcat 7.0.13 that ** automatically outputs a stack trace of requests that are taking a long time to process **.

In server.xml, change it as follows (because I want to put it inside the Host element), 20 is the number of seconds of the threshold to consider "processing is taking a long time".

However, every 10 seconds, Tomcat checks if "processing is taking a long time", so if you write 20, a stack of requests that takes a minimum of 20 seconds and a maximum of 30 seconds or more. The trace is output, and becomes.

Reference) Detecting request processing delay with Tomcat https://qiita.com/tamura__246/items/dae31a6a7e24c756b756

xml:Tomcat_8.5.61_server.xml


      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">
          <!--Abbreviation-->
      </Host>

Change like this

xml:Tomcat_8.5.61_server.xml(After change)


      <Host name="localhost"  appBase="webapps"
            unpackWARs="true" autoDeploy="true">
          <!--Abbreviation-->
          <Valve className="org.apache.catalina.valves.StuckThreadDetectionValve" threshold="20" />
      </Host>

Then, the stack trace is output with the thread name. Because it is the WARN level of Tomcat

It will be output around. From this stack trace, we will investigate what the problem handling is doing.


In the second part, we will finally move on to PostgreSQL.

Recommended Posts

Defense Techniques When You Have to Fight the Performance of Unfamiliar Applications (Part 2)
A memo when you want to clear the time part of the calendar
When you want to change the MySQL password of docker-compose
When you want to use the method outside
[rails] After option useful when you want to change the order of DB columns
The part I was addicted to in "Introduction to Ajax in Java Web Applications" of NetBeans
When the hover of Eclipse is hard to see
Gemfile.lock doesn't say the version of gem you have installed
I want to expand the clickable part of the link_to method
Have you ever thought about the definition of "bad design"?
[PostgreSQL] The story that you have to cut the session properly
[Swift] When you want to know if the number of characters in a String matches a certain number ...
Comparison of version strings (Java implementation) when you want to branch the process between two versions
When you want to check whether the contents of a property can be converted to a specific type