Follow a mysterious event where the log level suddenly changes-A day of an OSS support engineer

Introduction

** There is a problem with the following Java code. Do you know what the problem is? ** **

    Logger.getLogger("test").setLevel(Level.OFF);

This is the code intended to suppress the log output of the logger named test.

The Logger used here is the Java standard java.util.logging.Logger, and getLogger () returns a logger instance with the name given in the argument if it has already been created. Is a newly generated method. setLevel () sets the log level for that logger. Since the argument is Level.OFF, it disables log output.

This article is a true story about the problem caused by the above code.

Problems caused by this code

One day, an urgent investigation request came to me. The content was that "the restart of Tomcat caused the web application to return a normal response without a body". q1.png Although the event did not reproduce with the subsequent restart, I would like you to investigate the cause in preparation for the future.

Find out

So, the investigation started immediately. If the reported event is true, you should see a line similar to the following in the Tomcat access log:

192.169.1.8 ...(Abbreviation)・ ・ ・ HTTP/1.1" 200 - 379 PostmanRuntime/7.1.3

The "line like this" means that after the HTTP version ("HTTP / 1.1" in this example), "200" indicates the status code of the normal response and the size of the response body is 0. This is the line followed by "-" [^ 1].

When I grep the access log under this condition, there were certainly many applicable lines. All of them are only access logs to a specific URL between the restart of Tomcat where the occurrence of the event was confirmed and the next restart. q2.png Upon further investigation, it was found that a NullPointerException also occurred at the same time as the event occurred. The place where NullPointerException occurred was the following method of class LogFilter of Restlet [^ 2] used internally by the Web application. As you can see, it's just a method that outputs ʻINFO` level logs.

java:org.restlet.engine.log.LogFilter


protected void afterHandle(Request request, Response response) {

	if ((request.isLoggable()) && (this.logLogger.isLoggable(Level.INFO))) {
		//★★ NPE was occurring on the next line ★★
		long startTime = ((Long)request.getAttributes().get("org.restlet.startTime")).longValue();
		int duration = (int)(System.currentTimeMillis() - startTime);
		this.logLogger.log(Level.INFO, this.logService.getResponseLogMessage(response, duration));
	}
}

As a result of the investigation, it was found that the occurrence of NullPointerException and the resulting normal response without a body are caused by the Restlet bug [^ 3]. We also found that the above method is always called when accessing the URL where the event occurred. q4.png However, one thing I didn't understand remained.

The mystery left

That is, ** the code in the above if statement is effectively dead code in a web application **. In other words, it cannot be included in this branch condition.

This is because the source code of the Web application has been implemented so that the conditional expression this.logLogger.isLoggable (Level.INFO)) is always false. In Web applications, it has long been a problem that a large amount of Restlet logs are output to catalina.out (Tomcat log), and measures have been taken to disable the output.

The countermeasures were as follows.

ServiceEndpointApplication (the class that is the starting point for the Restlet call of the web application)


     final static String RESTLET_LOGGER_NAME = "org.restlet.Component.LogService";

     static {
         Logger logger = Logger.getLogger(RESTLET_LOGGER_NAME);
         logger.setLevel(Level.OFF);
     }

The same log output invalidation process as described at the beginning is implemented in the static initializer of the class ServiceEndpointApplication of the Web application.

Restlet's LogFilter holds a logger instance for Restlet in the field, so in ServiceEndpointApplication which is always called just before this class is called, set the log level to ʻOFF` and suppress log output. The intention of this measure is to do it. q3.png In the Web application, I only used Restlet in a limited way, so I think it was a judgment that there is no problem even if the Restlet log is not output.

With this countermeasure, the log level is ʻOFF, so it should not enter the branch of the if statement earlier, but in reality it entered the branch and a NullPointerException` occurred. Even more strangely, this has never happened, even though I've restarted Tomcat many times in the past. And, by restarting Tomcat after the event occurred, the event did not reproduce at all.

q7.png

Why the log level has changed

The first thing to suspect is that some files were mixed in that changed the log level on reboot. For example, it is possible that the war file of an older version of the web application has been deployed, or that a property file has been added to change the log level.

However, when I checked with the person in charge, the answer was that the restart process was automated and very unlikely. Certainly, considering that the event did not reproduce after restarting after the event occurred, it seems unlikely that some file was mixed in (if the file is not restored, the event will occur after that). Because I will continue).

In any case, it is difficult to investigate if the event cannot be reproduced, so I decided to check if it could be reproduced in the local environment. Since I was able to identify the request (URL) where the event occurs, I sent it to the Web application and checked if it could be reproduced, but it did not reproduce at all. Even in the environment where the event occurred, the incidence rate is such that it occurred only after restarting one day, so it should not be easily reproduced, and it is thought that there are some conditions.

And after trial and error, I was finally able to reproduce the event. When I used JMeter to send a large number of requests in multiple threads immediately after restarting, it was reproduced with a probability of about once every five times. In other words, there is something wrong with the above-mentioned Web application countermeasures.

Malfunction in multithreading?

I suspect a multithreaded malfunction, but before that I need to know why the logs are not output. There are two first possible reasons for this.

  1. Somewhere after this static initializer, setLevel () is called
  2. In the first place, setLevel () is not called

First, I grep the source code of the web application and Restlet with the keyword "setLevel" to verify that the first reason is correct. However, I couldn't find any code that could cause it. It's unlikely that you're calling setLevel () in a web application or source code other than Restlet, so this doesn't seem to be the reason.

Next, in order to verify that the second reason is correct, I decided to add a System.out.println () statement to this static initializer and check the operation.

In other words, this is what it is.

ServiceEndpointApplication (the class that is the starting point for the Restlet call of the web application)


     final static String RESTLET_LOGGER_NAME = "org.restlet.Component.LogService";

     static {
         System.out.println("ServiceEndpointApplication#clinit start");
         Logger logger = Logger.getLogger(RESTLET_LOGGER_NAME);
         logger.setLevel(Level.OFF);
         System.out.println("ServiceEndpointApplication#clinit end");
     }

This will tell you if this class is not loaded or if there are no exceptions (errors) in the static initializer for multithreading.

However, the result of the output when the event was reproduced was as follows.

ServiceEndpointApplication#clinit start
ServiceEndpointApplication#clinit end

In other words, it is certain that it is set to Level.OFF. Neither the first reason nor the second reason seems to be correct.

Running in a different class loader?

If so, Restlet may not be using a logger instance (with invalid log levels) pre-generated by your web application, for example because it was initialized by a different class loader.

So, I made the following corrections and tried to confirm the reproduction again. The name and ID of the logger instance and the loaded class loader information are output in the place where the logger instance is first generated and the place where it is acquired and output to the log.

ServiceEndpointApplication (the class that is the starting point for the Restlet call of the web application)


     final static String RESTLET_LOGGER_NAME = "org.restlet.Component.LogService";

     static {
        Logger logger = Logger.getLogger(RESTLET_LOGGER_NAME);
        logger.setLevel(Level.OFF);
		System.out.println("ServiceEndpointApplication#clinit");
		System.out.println("-------------------------------------");
		System.out.println("Logger Name: " + logger.getName());
		System.out.println("Logger Instance ID: " + logger);
		System.out.println("Class Loader: " + Thread.currentThread().getContextClassLoader());
     }

java:org.restlet.engine.log.LogFilter


protected void afterHandle(Request request, Response response) {

	System.out.println("LogFilter#afterHandle");
	System.out.println("-------------------------------------");
	System.out.println("Logger Name: " + logger.getName());
	System.out.println("Logger Instance ID: " + logger);
	System.out.println("Class Loader: " + Thread.currentThread().getContextClassLoader());
	if ((request.isLoggable()) && (this.logLogger.isLoggable(Level.INFO))) {
		long startTime = ((Long)request.getAttributes().get("org.restlet.startTime")).longValue();
		int duration = (int)(System.currentTimeMillis() - startTime);
		this.logLogger.log(Level.INFO, this.logService.getResponseLogMessage(response, duration));
	}
}

The output looks like this:

ServiceEndpointApplication#clinit
-------------------------------------
Logger Name: org.restlet.Component.LogService
Logger Instance ID: java.util.logging.Logger@1051f7a6
Class Loader: WebappClassLoader
  context: /Webapp1
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
org.apache.catalina.loader.StandardClassLoader@2d3b52e

LogFilter#afterHandle: 
-------------------------------------
Logger Name: org.restlet.Component.LogService
Logger Instance ID: java.util.logging.Logger@b5270f5
Class Loader: WebappClassLoader
  context: /Webapp1
  delegate: false
  repositories:
    /WEB-INF/classes/
----------> Parent Classloader:
org.apache.catalina.loader.StandardClassLoader@2d3b52e

The result shows that the logger class is loaded from the same class loader but is a different instance. This may mean that the once created logger instance has been destroyed and newly created.

Other possibilities?

So, instead of the beginning, Javadoc of java.util.logging.Logger.getLogger () I decided to check .html # getLogger-java.lang.String-). And I found the following description.

Note: LogManager can only hold weak references to newly created loggers. It is important to understand that a logger previously created with the specified name can be garbage collected at any time in the absence of a strong reference to the logger. In particular, this is two consecutive calls, such as getLogger ("MyLogger"). Log (...), and "MyLogger" if there is no strong reference to a logger named "MyLogger" anywhere in the program. This means that another logger object named may be used.

In other words, if GC occurs in a short time (even 1ms) between the call of the ServiceEndpointApplication staic initializer and the call ofLogFilter # afterHandle (), thenLogger.getLogger ()Can be another instance, and the log level you should have set will be the default.

q6.png

If GC does not occur within this short time, the logger instance will be retained (= strongly referenced) in the Restlet class after that, so it will not be subject to GC forever (unless it is restarted). .. That's why the restarts that I had done so many times caused the event only once.

q7.png

Check the operation

If you try to reproduce this operation with a simple program, it will be as follows.

	public static void main(String[] args) {
		Logger.getLogger("test").setLevel(Level.OFF);
		//If GC occurs here,
		System.gc();
		//After that, getLevel()Returns null.
		System.out.println(Logger.getLogger("test").getLevel());
	}

The result will be null. If you comment out System.gc (); and no GC occurs during that time, ʻOFF` will be output.

I often see code that sets a logger to a field variable like this, for a reason (prevents GC from losing the attributes you set in your instance):

    public static Logger logger = Logger.getLogger("xxx.xxx");

So let's modify the source code. Would it be as follows if only the minimum correction was made?

ServiceEndpointApplication (the class that is the starting point for the Restlet call of the web application)


     final static String RESTLET_LOGGER_NAME = "org.restlet.Component.LogService";

     static Logger;
     static {
         logger = Logger.getLogger(RESTLET_LOGGER_NAME);
         logger.setLevel(Level.OFF);
     }

Actually, after applying this fix, I tried to check the operation 20 times with the reproduction procedure I found, but this event did not occur. Since the recall rate was about once every 5 times, it can be said that there is no problem if it does not reproduce even if it is done 20 times ($ \ scriptsize {(1 --0.8 ^ {20}) * 100 ≒ 99 \ %} $ I'm sure).

Finally

I will answer the first question again.

    Logger.getLogger("test").setLevel(Level.OFF);

This code does not guarantee to suppress the log output of the logger named test. In other words, the chances of the next ʻinfo ()method printingtest!` are non-zero.

    Logger.getLogger("test").setLevel(Level.OFF);
    //If GC occurs here, Logger.getLogger()Returns another instance
    Logger.getLogger("test").info("test!");

[^ 1]: To be exact, it depends on the output format of the access log of Tomcat. In this environment, it was the default output format. [^ 2]: Restlet is an OSS framework for building RESTfull web applications. [^ 3]: This commit has been fixed. How to fix it is not good, but ...

Recommended Posts

Follow a mysterious event where the log level suddenly changes-A day of an OSS support engineer
Pursuing the mystery that the number of DB connections in Tomcat increases to only 8-A day of an OSS support engineer
About the log level of java.util.logging.Logger
28th day of an engineer who will become a full-fledged person in 100 days