Introduction

A small story.

Keyword monitoring of logs that are completed in one line, such as apache logs, can be achieved relatively easily. For example, the apache error log is complete with one line and is monitored by keywords such as [error] and [alert] [crit].

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

When you get a stack trace in Java, it will be multiple lines. This should be monitored with the keyword ERROR.

2017-2-01 13:33:21 [main] ERROR Unexpected Error
java.lang.NumberFormatException: For input string: "1.1"
        at java.lang.NumberFormatException.forInputString(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at java.lang.Integer.<init>(Unknown Source)
        at ExceptionPrintDemo.formatInt(ExceptionPrintDemo.java:7)
        at ExceptionPrintDemo.main(ExceptionPrintDemo.java:14)

However, when monitoring the log with the monitoring tool, only one line can be recognized, so the following stack trace cannot be notified by email. ** As a result of being notified by e-mail of the message "2017-2-01 13:33:21 [main] ERROR Unexpected Error", "Oh ..." and see the log every time on the production machine. Did you go? ** **

In addition to **, you may want to monitor not only the keywords on the first line but also the keywords on the second and subsequent lines. ** ** Also, ** on older systems, I sometimes see dates on the first line, error codes on the second line, and error messages on the third line **, which makes keyword monitoring cry.

20170201 11:12:34
ERR001
○○ The code is invalid.
20170201 11:12:34
ERR002
Unable to connect to database

This time I tried to solve that area using fluentd. The monitoring tool uses mackerel, but I think it is the same for other monitoring tools.

solution

Use fluentd to convert to a single line of json and monitor.

Setup steps

Install fluentd

Install in one line. If you touch it after a long time, installation will be extremely easy ...

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

fluentd settings

This time, I am using fluentd's tail input plugin.
The tail input plugin supports multiple lines, and the previous log can be rewritten like this.

`/etc/td-agent/td-agent.conf`


<source>
  type tail
  path /var/log/app/app.log
  pos_file /var/log/td-agent/app/app-log.pos
  tag app.log
  multiline_flush_interval 10s
  format multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\](?<level>[^\s]+)(?<message>.*)/
</source>

<match app.log>
  type file
  path /var/log/td-agent/app/app.java.log
  time_slice_wait 10m
</match>

Write a regular expression that represents one json break in format_firstline.
In format1, time, thread, and message are parsed with regular expressions. Since it is * (? . *), Read to the end of the stack trace.
multiline_flush_interval 10s
Every time a log flows, fluentd runs and analyzes it, but initially it is buffered in memory.
Then, after the analysis of the next log, the buffered one will be flushed. Flush this parameter every 10 seconds even if it is not extruded. It becomes the parameter.
The analyzed log is formatted as follows.

`/var/log/td-agent/app/app.java.log`


2017-02-01T13:33:21+09:00       app.log {"thread":"main","level":"ERROR","message":" Unexpected Error\njava.lang.NumberFormatException: For input string: \"1.1\"\n        at java.lang.NumberFormatException.forInputString(Unknown Source)\n        at java.lang.Integer.parseInt(Unknown Source)\n        at java.lang.Integer.<init>(Unknown Source)\n        at ExceptionPrintDemo.formatInt(ExceptionPrintDemo.java:7)\n        at ExceptionPrintDemo.main(ExceptionPrintDemo.java:14)"}

mackerel settings

Introduction to monitoring services (Mackerel, Datadog, New Relic)! has instructions on how to get started, so please refer to that.
Set the log monitoring settings like this. Normal usage is OK.

[plugin.checks.app-log]
command = '''
  check-log \
    --file-pattern '/var/log/td-agent/app/app.java.log.*.log' \
    --pattern '\"level\":\"ERROR\"' \
    --return
'''

--return prints an error line and sends its contents to Mackerel
You should also monitor that there are two td-agent processes up.

[plugin.checks.process_td-agent]
command = "check-procs --pattern td-agent -C 2"

Other settings

It's easy to forget to rotate td-agent.log. Let's set https://github.com/treasure-data/td-agent/blob/master/td-agent.logrotate

result

The contents of the stack trace are also output properly.

in conclusion

If you use fluentd like this time, it is easier to introduce and recommended. Isn't it very important in operational design whether or not such a small thing can be devised?

Let's improve the operation a little by monitoring and notifying the multi-line log by keyword