Let's improve the operation a little by monitoring and notifying the multi-line log by keyword

Introduction

A small story.

Keyword monitoring of logs that are completed in one line, such as apache logs, can be achieved relatively easily. For example, the apache error log is complete with one line and is monitored by keywords such as [error] and [alert] [crit].

[Wed Oct 11 14:32:52 2000] [error] [client 127.0.0.1] client denied by server configuration: /export/home/live/ap/htdocs/test

When you get a stack trace in Java, it will be multiple lines. This should be monitored with the keyword ERROR.

2017-2-01 13:33:21 [main] ERROR Unexpected Error
java.lang.NumberFormatException: For input string: "1.1"
        at java.lang.NumberFormatException.forInputString(Unknown Source)
        at java.lang.Integer.parseInt(Unknown Source)
        at java.lang.Integer.<init>(Unknown Source)
        at ExceptionPrintDemo.formatInt(ExceptionPrintDemo.java:7)
        at ExceptionPrintDemo.main(ExceptionPrintDemo.java:14)

However, when monitoring the log with the monitoring tool, only one line can be recognized, so the following stack trace cannot be notified by email. ** As a result of being notified by e-mail of the message "2017-2-01 13:33:21 [main] ERROR Unexpected Error", "Oh ..." and see the log every time on the production machine. Did you go? ** **

In addition to **, you may want to monitor not only the keywords on the first line but also the keywords on the second and subsequent lines. ** ** Also, ** on older systems, I sometimes see dates on the first line, error codes on the second line, and error messages on the third line **, which makes keyword monitoring cry.

20170201 11:12:34
ERR001
○○ The code is invalid.
20170201 11:12:34
ERR002
Unable to connect to database

This time I tried to solve that area using fluentd. The monitoring tool uses mackerel, but I think it is the same for other monitoring tools.

solution

Use fluentd to convert to a single line of json and monitor.

Setup steps

Install fluentd

curl -L https://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh

fluentd settings

/etc/td-agent/td-agent.conf


<source>
  type tail
  path /var/log/app/app.log
  pos_file /var/log/td-agent/app/app-log.pos
  tag app.log
  multiline_flush_interval 10s
  format multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\](?<level>[^\s]+)(?<message>.*)/
</source>

<match app.log>
  type file
  path /var/log/td-agent/app/app.java.log
  time_slice_wait 10m
</match>

/var/log/td-agent/app/app.java.log


2017-02-01T13:33:21+09:00       app.log {"thread":"main","level":"ERROR","message":" Unexpected Error\njava.lang.NumberFormatException: For input string: \"1.1\"\n        at java.lang.NumberFormatException.forInputString(Unknown Source)\n        at java.lang.Integer.parseInt(Unknown Source)\n        at java.lang.Integer.<init>(Unknown Source)\n        at ExceptionPrintDemo.formatInt(ExceptionPrintDemo.java:7)\n        at ExceptionPrintDemo.main(ExceptionPrintDemo.java:14)"}

mackerel settings

[plugin.checks.app-log]
command = '''
  check-log \
    --file-pattern '/var/log/td-agent/app/app.java.log.*.log' \
    --pattern '\"level\":\"ERROR\"' \
    --return
'''
[plugin.checks.process_td-agent]
command = "check-procs --pattern td-agent -C 2"

Other settings

It's easy to forget to rotate td-agent.log. Let's set https://github.com/treasure-data/td-agent/blob/master/td-agent.logrotate

result

The contents of the stack trace are also output properly.

image

in conclusion

If you use fluentd like this time, it is easier to introduce and recommended. Isn't it very important in operational design whether or not such a small thing can be devised?

reference

Recommended Posts

Let's improve the operation a little by monitoring and notifying the multi-line log by keyword
[Enum] Let's improve the readability of data by using rails enum