The story that standard output also fatally changes the behavior of the program

I think everyone has the experience of debugging by adding standard output processing (such as ʻecho and System.out.print () `) to scripts and programs that do not work properly. I think that the addition of standard output processing is used in various situations, from simple program operation verification to investigation in an environment where a debugger cannot be used due to restrictions.

However, adding ** 1 line of standard output processing can also fatally change the behavior of the program **.

In the past, the application responded as a result of adding a line of System.out.println (...); to the Java program as a workaround for the problem that the batch stopped due to ** ʻecho`. There were times when I couldn't return **. I would like to explain the memory at that time with Sample Program.

The annual batch has stopped!

About 10 years ago, one day, I received an email from me, who was in charge of maintenance of a certain system, saying, "The annual batch has stopped, so please hurry to investigate." When I read the text further, it says that the program is stopped at the line of the ʻecho instruction in the batch file. ![s1.png](https://qiita-image-store.s3.amazonaws.com/0/43869/914c319b-6923-1178-02d9-55180da8a20d.png) When I first heard it, I doubted the reporter's misunderstanding, but it is true that a batch file containing 4 consecutive lines of ʻecho (Sample here There is k-tamura / stdout-problems / blob / master / src / main / resources / annual.bat)),

・ ・ ・(abridgement)・ ・ ・
echo %date% %time% ==================================================>> logs/result.log
echo %date% %time%  [Step4]                                          >> logs/result.log
echo %date% %time%  Pre-Import Processing                            >> logs/result.log
echo %date% %time%                                                   >> logs/result.log
・ ・ ・(abridgement)・ ・ ・

Looking at the log it output (result.log), I only saw two lines of ʻecho` messages.

・ ・ ・(abridgement)・ ・ ・
2007/09/28 12:04:44.72 ==================================================
2007/09/28 12:04:44.72  [Step4]                                          

(No log has been output since then)

I thought, "That's not possible", but just in case, I decided to build a simple test environment and execute the batch file. The result is as expected. The event was not reproduced and the process was completed normally. I had no idea other than killing the batch process from the outside.

After that, I was able to contact the reporter by phone, so when I heard the details, it was said that this batch was called from a Java program.

s2.png

The Java program I got from the reporter included the following main () method and started the batch file with java.lang.ProcessBuilder ([Sample here](https: //). github.com/k-tamura/stdout-problems/blob/master/src/main/java/org/t246osslab/stdoutproblems/batch/BatchImpl.java)).

import java.io.*;
import org.springframework.core.io.*;

public class BatchImpl implements BatchInterface {

    public static void main(String[] args) {
        System.out.println("Result: " + new BatchImpl().executeBatch());
    }

    public int executeBatch() {
        try {
            Resource resource = new ClassPathResource("annual.bat");
            File batFile = resource.getFile();
            ProcessBuilder pb = new ProcessBuilder(batFile.getAbsolutePath());
            Process process = pb.start();
            return 0;
        } catch (IOException e) {
            e.printStackTrace();
        }
        return -1;
    }
}
* The source code is much simpler than the actual one (at that time), omitting exception handling that has nothing to do with the issue of this story. ..

Immediately, when I actually executed the Java program, the phenomenon was unexpectedly reproduced. Certainly, I didn't see the "Pre-Import Processing" message after the " [Step 4] `" message.

So, if I remove ʻecho, will it work? I've tried it and it works fine. Apparently, if there are more than a certain number of ** ʻechos, an event will occur **.

Find out

After confirming that the event can be reproduced, the investigation is started. After a lot of research on the net, I found one cause that seems to be likely. That is, ** If the output stream of the process (java.lang.Process) started by the Java program is not read, the output buffer held internally will be full and the process will stop **. That is.

process_stop.png

So, I added a line of code that reads the output stream with Process.getInputStream () and outputs it to the standard output with System.out.println ().

public int executeBatch() {

    //・ ・ ・(abridgement)・ ・ ・

    Process process = pb.start();
    System.out.println(IOUtils.toString(process.getInputStream())); //Add this one line
    return 0;

    //・ ・ ・(abridgement)・ ・ ・
}

Then the batch started working fine without stopping. The problem that was directly caused by the large number of batch standard output processing (ʻecho) could be solved by adding Java standard output processing (System.out.println ()`). Will be.

* As a supplement, you can output to the log without outputting to the standard output with `System.out.println ()`, and further In addition, it ignores the return value, but you only need to add ʻIOUtils.toString (process.getInputStream ());`.
Regarding the problem that the output buffer is full, I think that [this page](https://www.jpcert.or.jp/java-rules/fio07-j.html) is easy to understand.

Now that we know the cause, we need to consider how to deal with it, but we decided to leave it to the "great people" who are meeting on this matter and go home. I thought, "This solved this problem."

No annual batch response!

When I went to work the next day, there was a report that "the response of the annual batch did not return" which should have been fixed yesterday. s3.png Isn't the method of correction different? I thought, but it wasn't different from what I expected.

And I found another new fact. That is, ** the Java program and annual batch that I fixed yesterday are called from a web application on another host via RMI as follows **. seq1.png

* RMI (Java Remote Method Invocation): For Java programs to communicate with each other over a network and call methods of objects (remote objects) on different computers. Technology.

The web application has a screen similar to the one that calls this batch, and when you click the Run Batch button, it uses RMI to run the remote batch asynchronously. The web application then displays the message "Batch executed" to the user without waiting for the batch to complete.

Screenshot from 2019-01-28 13-12-19.png

The web application then sends a periodic completion confirmation request, and after a few hours the batch completes, the RMI server returns a completion response with the message "Batch completed".

Screenshot from 2019-01-28 13-12-1987.png

However, with this fix, the message "Batch executed" was not output immediately after clicking the "Batch execution" button, and no response was returned. wait1.gif (That is, wait for a response like this in the browser became) Given this new fact and the modifications we made yesterday, it didn't take long to figure out the cause.

Before adding one line of System.out.println (・ ・ ・);, the Java program (= Java module called from the RMI server) returned a response immediately, but after adding it, The Java program was supposed to return a response after waiting for the batch to complete. The reason is that ** Java programs now execute in synchronization with the batch by reading the standard output of the batch from time to time **.

seq2.png

This time, take measures that will not cause any problems

What to do then?

Since the output amount of ʻecho in the batch file is large, reducing ʻecho is also considered as a tentative plan. This is also one of the correct choices, as it is an urgent issue to address, but not logging is not very good. In addition, I would like to take measures that take into consideration the future if possible with the minimum correction considering degradation (someone may add ʻecho` for debugging purposes etc. Since there is no). s4.png When I thought about how to do something good, the conclusion was simple. ** Add the following line to the beginning of the batch file **.

@echo off

In other words, I should have prevented the batch from outputting to standard output.

As you can see by executing the batch file at the command prompt, the information (= standard output) displayed at the command prompt after execution is ** only the executed command ** as shown below.

C:\batch>mkdir logs

C:\batch>echo 2007/09/28 12:04:44.72 ==================================================>> logs/result.log

C:\batch>echo 2007/09/28 12:04:44.73  [Start]                                          >> logs/result.log

C:\batch>echo 2007/09/28 12:04:44.74                                                   >> logs/result.log

C:\batch>echo 2007/09/28 12:04:44.75  Initialize                                      >>logs/result.log

・ ・ ・ (Omitted below) ・ ・ ・

All the information that the batch wants to output is redirected to the log file (>> logs / result.log), and the standard output is unnecessary information.

5.png

* For the operation of `@ echo off`, see [This page](https://www.adminweb.jp/command/bat/index2.html). I think the explanation is easy to understand.

So, I added @echo off to the beginning of the batch file, verified that there was no problem, and the countermeasure was completed for the time being. s5.png

* The implementation of the Java program presented above does not mean that there is no room for review.

There is a reflection that the work of investigating all the callers of the corrected part was omitted, but it was an event that I thought that it would be difficult to fix a bug that requires urgent action. And it was a rare experience that I used the standard output as a countermeasure for the event that the standard output became a problem, and there was another problem there as well.

Finally

It may be better to keep in mind that "a line of code that does not seem to have an adverse effect at first glance like standard output processing can fatally change the behavior of a program". Maybe.

[Supplement] Actually try

I made a Sample Program, so if you are interested, please try it. I made it based on Spring Boot based on the memory of about 10 years ago. The application server and RMI server are on the same host and are designed to communicate over localhost.

Since it is for operation check, the process of detecting the completion of the batch by the Web application is not implemented. In addition, batch processing such as DB access is not written, and batch operation is simulated by sleeping for a certain period of time.

Reference material

-Hishidama's technical memo page --Start external process

cci95logo.gif

Recommended Posts