[PYTHON] I want to identify the alert email: is that x a wildcard?

Hello. I'm Tanaka from NETS1.

This is a continuation of "I want to identify the alert email: XXXXX is a wildcard!".

Nobody seemed to have signed up for the last day, so I decided to write this one.

An email that isn't judged correctly

One day I received an alert email like this.

Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 app001 ERROR ERROR002 ecnxeci-1349 Great thing happened. sugoi[10223]Error in Oh, what!

Looking it up in the list, it is clearly an alert that requires contact by telephone.

| Error code | Error message | Action |
|---|---|---|
| ERROR002 | app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in | Contact by telephone |
| ERROR002 | Great thing happened. XXXXX[YYYYY]Error in | Contact by email |
| ERROR002 | Great thing happened. | Ignore |

But look at the result from the code in the previous article...

(Image: result2.png, the comparison result from the previous code)

ecn was judged as delete. Ideally the whole of ecnxeci-1349 should be judged as replace, but difflib mistook the x inside ecnxeci-1349 for a match with the xxxxx on the list side.
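You can reproduce the same split in isolation by passing only the two fragments to difflib. This is a minimal sketch: comparing fragments rather than whole lines is an assumption made for illustration, since the article compares whole lines. SequenceMatcher takes the single shared x as the longest matching block. As a result, ecn becomes delete, x becomes equal, and eci-1349 is paired with the remaining xxxx as replace.

```python
import difflib

mail_part = "ecnxeci-1349"   # fragment from the alert email
manual_part = "xxxxx"        # wildcard fragment from the manual (list)

seq = difflib.SequenceMatcher(None, mail_part, manual_part)
for tag, i1, i2, j1, j2 in seq.get_opcodes():
    # Print each opcode as "tag : mail side | manual side"
    print(f"{tag} : {mail_part[i1:i2]} | {manual_part[j1:j2]}")
```

This prints three opcodes: delete (ecn), equal (x | x), and replace (eci-1349 | xxxx). That matches the middle of the full-line diff in this article.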

Working around it

Approach

If x is what matches, make it something other than x!

That's easy to say: you just replace the x that matched xxxxx with a different character. But how do you tell that an x matched a wildcard by accident...

Wildcard conditions

Unless we define what a wildcard is, we can't decide whether something is one.

I assumed that when people write a wildcard so that it is easy to spot, it looks like this: a single x or X, or the same character repeated (xxxxx, XXXXX, YYYYY). A part is treated as a wildcard when difflib evaluates it as replace and the string on the manual (list) side meets these conditions.
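The conditions can also be written as a single regular expression. This sketch takes the conditions to be what the article's is_replacement function checks: a lone x or X, or one character repeated two or more times. The name `looks_like_wildcard` is hypothetical and is not part of the article's code.

```python
import re

# A lone x/X, or one character repeated two or more times (xxxxx, XXXXX, YYYYY, ...)
_WILDCARD = re.compile(r"[xX]|(.)\1+", re.DOTALL)


def looks_like_wildcard(text: str) -> bool:
    """Return True if the manual-side text looks like a wildcard."""
    return _WILDCARD.fullmatch(text) is not None


for s in ["x", "X", "xxxxx", "YYYYY", "a", "xy", ""]:
    print(repr(s), looks_like_wildcard(s))
```

The first four strings are reported as wildcards, and "a", "xy" and the empty string are not. Like is_replacement, the pattern also accepts runs such as "11" or two spaces. That is a consequence of the "same character repeated" rule, not a deliberate feature.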

Determining if a wildcard is an unexpected match

To decide, I use difflib's evaluation value (ratio).

  1. For each part judged as a wildcard, overwrite the manual side with the mail-side text and compute the evaluation value.
  2. Check whether any part evaluated as equal meets the wildcard conditions.
  3. If one does, replace that part of the mail side with different characters and evaluate again.
  4. If the re-evaluated value is higher than the value from step 1, the original match is judged to be unexpected.
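The four steps can be sketched on just the fragment pair from the example, ecnxeci-1349 against xxxxx. This is a simplified, hypothetical version. The names fill_wildcards, mask_suspicious and score are mine, not the article's. Unlike the article's diff_analyzer, this version re-evaluates only once instead of recursing.

```python
import difflib
import re

# Wildcard conditions: a lone x/X, or one character repeated two or more times
_WILDCARD = re.compile(r"[xX]|(.)\1+", re.DOTALL)


def is_wildcard(text):
    return _WILDCARD.fullmatch(text) is not None


def fill_wildcards(mail, manual):
    """Step 1: overwrite wildcard 'replace' parts of the manual with the mail text."""
    pieces = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, mail, manual).get_opcodes():
        if tag == "replace" and is_wildcard(manual[j1:j2]):
            pieces.append(mail[i1:i2])    # wildcard: take the mail text
        else:
            pieces.append(manual[j1:j2])  # everything else: keep the manual text
    filled = "".join(pieces)
    return difflib.SequenceMatcher(None, mail, filled).ratio()


def mask_suspicious(mail, manual):
    """Steps 2-3: shift mail characters that were 'equal' to wildcard-like manual text."""
    pieces = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, mail, manual).get_opcodes():
        part, man = mail[i1:i2], manual[j1:j2]
        if tag == "equal" and ((len(man) == 1 and man != " ") or is_wildcard(man)):
            # Same crude trick as the article: move each code point up by 100
            part = "".join(chr(ord(c) + 100) for c in part)
        pieces.append(part)
    return "".join(pieces)


def score(mail, manual):
    """Step 4: keep the masked result only if it scores higher."""
    base = fill_wildcards(mail, manual)
    masked = mask_suspicious(mail, manual)
    if masked != mail:
        retry = fill_wildcards(masked, manual)
        if retry > base:
            return retry, True   # the original match was unexpected
    return base, False


print(score("ecnxeci-1349", "xxxxx"))
```

With these fragments the first pass scores 18/21 (about 0.857), because the accidental x pins the alignment. After masking, the whole fragment becomes a single replace against xxxxx and scores 1.0, so score returns (1.0, True).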

In this example, the diff looks like this:

 : Dec 14 12:59:12 app001 2020/12/14 12:59:12.449  |
equal : app00 | app00
replace : 1 | x
equal :  ERROR ERROR002  |  ERROR ERROR002
delete : ecn |
equal : x | x
replace : eci-1349 | xxxx
equal :Great thing happened.|Great thing happened.
replace : sugoi | XXXXX
equal : [10223]Error in| [10223]Error in
 :Oh, what!|

From this result, step 1 builds the following evaluation string and computes its evaluation value. This string is the manual side with its wildcards overwritten by the mail text.

app001 ERROR ERROR002 xeci-1349 Great thing happened. sugoi[10223]Error in

Here, of course, xeci-1349 does not match the mail side, so the evaluation value falls short of the maximum of 1.0. Next, in steps 2 and 3, the x in equal : x | x is replaced with a different character, giving the following mail-side string. (The unnecessary beginning and end of the email have already been removed during preprocessing.)

app001 ERROR ERROR002 ecnieci-1349 Great thing happened. sugoi[10223]Error in

Re-evaluating from step 1 with this mail string, step 1 now builds the following evaluation string and computes its evaluation value:

app001 ERROR ERROR002 ecnieci-1349 Great thing happened. sugoi[10223]Error in

This time the re-evaluation value is 1.0, the maximum. Because the evaluation value went up, we can tell that the x was an unexpected match.
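Why exactly 1.0? The difflib documentation defines ratio() as 2.0*M/T. M is the number of matched characters and T is the total number of characters in both strings, so the ratio reaches 1.0 only when the two strings are identical. Below is a minimal check on fragment-level stand-ins for the two evaluations. Using fragments is an assumption, since the article evaluates the whole trimmed lines. The check recomputes the ratio by hand with a hypothetical helper, ratio_by_hand.

```python
import difflib


def ratio_by_hand(a, b):
    """Recompute SequenceMatcher.ratio() as 2 * M / T from the matching blocks."""
    blocks = difflib.SequenceMatcher(None, a, b).get_matching_blocks()
    matched = sum(block.size for block in blocks)  # M: matched characters
    return 2.0 * matched / (len(a) + len(b))      # T: total length of both strings


# First evaluation: "ecn" on the mail side stays unmatched
before = ratio_by_hand("ecnxeci-1349", "xeci-1349")      # M = 9,  T = 21
# Re-evaluation: the manual side has become a copy of the mail side
after = ratio_by_hand("ecnieci-1349", "ecnieci-1349")    # M = 12, T = 24
print(before, after)
```

The first pair gives 18/21 (about 0.857). Once the manual side is an exact copy of the mail side, M covers every character and the ratio is exactly 1.0.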

Implementation

The main part is the same as in the previous article, so it is omitted here.

import difflib

def search_space(message):
    '''Return the positions just after each space (position 0 is included for convenience)'''
    space_pos = [0]
    for index, c in enumerate(message):
        if c == ' ':
            space_pos.append(index + 1)
    return space_pos

def is_replacement(string):
    # A single character counts as a wildcard only if it is x or X
    if len(string) <= 1:
        return string.lower() == 'x'

    # Longer strings count only if every character is the same (xxxxx, XXXXX, YYYYY, ...)
    first_char = string[0]
    for char in string:
        if char != first_char:
            return False
    return True

def diff_analyzer(skip_seek, msg, man_msg):
    fix = 0  # how much the manual side has grown or shrunk from wildcard fills
    fix_man_msg = man_msg
    fix_msg = msg
    opcodes = [('', 0, skip_seek, 0, 0)]
    finish_seek = skip_seek  # fallback in case get_opcodes() yields nothing

    seq = difflib.SequenceMatcher(None, msg, man_msg)
    ratio = seq.ratio()

    for tag, i1, i2, j1, j2 in seq.get_opcodes():
        # Positions of this opcode on the already-modified manual side
        fj1 = j1 + fix
        fj2 = j2 + fix

        if tag == 'replace':
            # Wildcard: overwrite the manual side with the mail text and tag it 'fix_equal'
            if is_replacement(fix_man_msg[fj1:fj2]):
                fix_man_msg = fix_man_msg[:fj1] + msg[i1:i2] + fix_man_msg[fj2:]
                fix = fix + i2 - i1 - (fj2 - fj1)
                tag = 'fix_equal'
        elif tag == 'equal':
            # Assume random text on the mail side may happen to match an x on the manual side.
            # Any single equal character other than a space is masked unconditionally
            # (if masking it was wrong, the re-evaluation should score lower).
            # On a match, replace the mail side with different characters.
            if (fj2 - fj1 == 1 and fix_man_msg[fj1:fj2] != ' ') or is_replacement(fix_man_msg[fj1:fj2]):
                # Shift each code point by 100 to get a different character (very crude)
                replace_msg = ''.join(chr(ord(letter) + 100) for letter in msg[i1:i2])
                fix_msg = fix_msg[:i1] + replace_msg + fix_msg[i2:]

        opcodes.append((tag, skip_seek + i1, skip_seek + i2, j1, j2))
        finish_seek = skip_seek + i2

    # If wildcards were filled in, re-evaluate only the ratio
    if fix_man_msg != man_msg:
        ratio = difflib.SequenceMatcher(None, msg, fix_man_msg).ratio()

    # If there may be an unexpected match, re-evaluate with the masked mail text.
    # If the re-evaluation does not score higher, discard fix_msg.
    if fix_msg != msg:
        f_seek, f_opcodes, f_ratio = diff_analyzer(skip_seek, fix_msg, man_msg)
        print(f_ratio, ':', fix_msg)  # debug output: masked version
        print(ratio, ':', msg)        # debug output: original version
        if ratio < f_ratio:
            finish_seek = f_seek
            opcodes = f_opcodes
            ratio = f_ratio
        else:
            fix_msg = msg

    return (finish_seek, opcodes, ratio)

def check_message_by_difflib(manual, message):

    space_pos = search_space(message)

    ratio = 0
    # Try every position just after a space as the start of the message
    for i in space_pos:
        msg = message[i:]
        delete_flag = False

        # If the diff ends with a delete, cut that tail off before evaluating
        tag, i1, i2, j1, j2 = difflib.SequenceMatcher(None, msg, manual).get_opcodes()[-1]
        if tag == 'delete':
            msg = msg[:i1]
            delete_flag = True

        finish_seek, tmp_opcodes, tmp_ratio = diff_analyzer(i, msg, manual)

        # Keep the best-scoring start position (later positions win ties)
        if ratio <= tmp_ratio:
            if delete_flag:
                tmp_opcodes.append(('', finish_seek, len(message), 0, 0))
            ratio = tmp_ratio
            opcodes = tmp_opcodes

    return opcodes, ratio

Test

...(snip)...

 : Dec 14 12:59:12 app001 2020/12/14 12:59:12.449  |
equal : app00 | app00
fix_equal : 1 | x
equal :  ERROR ERROR002  |  ERROR ERROR002
fix_equal : ecnxeci-1349 | xxxxx
equal :Great thing happened.|Great thing happened.
fix_equal : sugoi | XXXXX
equal : [10223]Error in| [10223]Error in
 :Oh, what!|

ecnxeci-1349 is now evaluated as a wildcard part (fix_equal), which looks right. But I wasn't sure it catches only genuine wildcards, so I also compared strings like these:

Mail side: app001 ERROR fix_data.sh error
Manual side: app00x ERROR boxdata.sh error

Execution result

0.8813559322033898 : app001 ERROR fiÜ_data.sh error
0.9152542372881356 : app001 ERROR fix_data.sh error
0.7307692307692307 : ERROR fiÜ_data.sh error
0.7692307692307693 : ERROR fix_data.sh error
0.5652173913043478 : fiÜ_data.sh error
0.6086956521739131 : fix_data.sh error
app001 ERROR fix_data.sh error

 :  |
equal : app00 | app00
fix_equal : 1 | x
equal :  ERROR  |  ERROR
replace : fi | bo
equal : x | x
delete : _ |
equal : data.sh error | data.sh error

The x is still correctly judged as equal: x | x. The first two lines of the output show that the evaluation value drops after the replacement (0.881 versus 0.915), so the masked version is rejected as intended. Looks promising.

Letting the evaluation value decide

Finally, I changed the main routine so that the result is the manual entry with an evaluation value of 1.0 and the longest match. Like this:

mail = 'Dec 14 12:59:12 app001 2020/12/14 12:59:12.449 app001 ERROR ERROR002 ecnxeci-1349 Great thing happened. sugoi[10223]Error in Oh, what!'
manual1 = 'app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in'
manual2 = 'Great thing happened. XXXXX[YYYYY]Error in'
manual3 = 'Great thing happened.'
manuals = [manual1, manual2, manual3]

max_match_length = 0
result = 'Not applicable'
for manual in manuals:
    opcodes, ratio = check_message_by_difflib(manual, mail)
    # Number of mail-side characters covered by equal or fix_equal opcodes
    match_length = sum(opcode[2] - opcode[1] for opcode in opcodes if opcode[0] in ('equal', 'fix_equal'))

    # Among exact matches (ratio 1.0), keep the manual entry with the longest match
    if ratio == 1 and max_match_length < match_length:
        max_match_length = match_length
        result = manual

print('Result: ' + result)

Execution result

Result: app00x ERROR ERROR002 xxxxx Great thing happened. XXXXX[10223]Error in

In closing

I did my best to make the judgment reasonably mechanical, but in the end it still comes down to a visual check. The tool may have bugs, and it doesn't support a rule like "xxx must have the same number of characters". ~~(In fact, the manual side has a typo.)~~ Don't trust the machine too much, or you may misjudge the alerts that really matter...

Next, I'd like to be able to correct misjudgments by comparing the opcodes with the actual judgment results.
