[PYTHON] (For lawyers) Extract the behavior of Office software from .evtx files

About a year ago I wrote an article, Analyzing .evtx files with Python. Roughly speaking, this post is an application of it.

Whether or not lawyers would actually use this to prove working hours in court, there may be times when you want to extract data from an EVTX (Windows XML Event Log) file, which records Windows system logs. (The EVTX files I looked at are in the Windows 7 format, and details may differ between OS versions, but I think the approach stays much the same.) For a PC that never leaves the office, the time from startup to shutdown can probably be counted as work. If the PC may be taken home, however, it is not always clear whether a startup on a holiday was for work or for private use. If Office software was in use, though, it was plausibly work. So, continuing from the previous article, let's use Python-Evtx to extract the event logs that involve Office software.

The idea is almost the same as in the previous article, so please refer to that. When I wrote a script to list which programs appear in `Data[@Name='ProcessName']` or `Data[@Name='NewProcessName']`, it looked as though events whose value contains the string `Office` come from running Office software. So, with rough code like the following, running `$ python ExtractOffice.py EventLog.evtx` writes out a file called `events_office.tsv`.

ExtractOffice.py


import Evtx.Evtx as evtx
from lxml import etree

schema = "http://schemas.microsoft.com/win/2004/08/events/event"

def main():
    f = open("events_office.tsv", "w")  # Output file name is hard-coded
    import argparse
    
    parser = argparse.ArgumentParser(
        description="Extract Office-related events from an EVTX file into a TSV file.")
    parser.add_argument("evtx", type=str,
                        help="Path to the Windows EVTX event log file")
    args = parser.parse_args()
    
    #Working with EVTX files
    with evtx.Evtx(args.evtx) as log:
        counter = 0 #For progress report
        for record in log.records():
            elm = record.lxml()
            #progress report
            counter += 1
            if counter % 1000 == 0:
                print("Now on record:"+str(counter))
            
            pn = elm.xpath("//event:Data[@Name='ProcessName']", namespaces={"event":schema})
            npn = elm.xpath("//event:Data[@Name='NewProcessName']", namespaces={"event":schema})
            pnt = ""   # ProcessName text; defaults to an empty string
            npnt = ""  # NewProcessName text; same default
            try:  # pn may be empty or its text may be None, so guard the lookup
                if "Office" in pn[0].text:  # substring search happens here
                    pnt = pn[0].text
            except (IndexError, TypeError):
                pnt = ""
            try:
                if "Office" in npn[0].text:
                    npnt = npn[0].text
            except (IndexError, TypeError):
                npnt = ""
            
            if ( len(pnt) or len(npnt) ):
                print(
                    elm.xpath("//event:EventID", namespaces={"event":schema})[0].text
                    +"\t"+
                    elm.xpath("//event:TimeCreated", namespaces={"event":schema})[0].get("SystemTime")
                    +"\t"+pnt
                    +"\t"+npnt
                , file=f)
        print(counter)  # When finished, print the total number of records read
    f.close()


if __name__ == "__main__":
    main()

Then open the resulting `events_office.tsv` in Excel and format it as needed (→ for example, like this (separate article)). Once it is tidied up, it should give a reasonable picture.
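If you would rather stay in Python than open Excel, a first pass over the TSV could look like the sketch below. It only assumes the column order the script above writes (EventID, SystemTime, ProcessName, NewProcessName) and that the timestamp begins with a `YYYY-MM-DD` date; the function name `first_last_per_day` and the sample rows are made up for illustration. Timestamps are compared as plain strings, which works only while they all share one format.

```python
import csv
import io


def first_last_per_day(tsv_file):
    """Map each date (first 10 chars of the timestamp) to (earliest, latest) timestamp."""
    days = {}
    for row in csv.reader(tsv_file, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        stamp = row[1]
        day = stamp[:10]  # assumption: the timestamp starts with YYYY-MM-DD
        first, last = days.get(day, (stamp, stamp))
        days[day] = (min(first, stamp), max(last, stamp))
    return days


# Made-up rows in the same column order as events_office.tsv (not real log data)
sample = io.StringIO(
    "4688\t2020-01-04 01:15:00.000000\t\tC:\\Office\\EXCEL.EXE\n"
    "4689\t2020-01-04 03:40:00.000000\tC:\\Office\\EXCEL.EXE\t\n"
    "4688\t2020-01-05 02:00:00.000000\t\tC:\\Office\\WINWORD.EXE\n"
)

summary = first_last_per_day(sample)
for day in sorted(summary):
    first, last = summary[day]
    print(day, first, last, sep="\t")
```

For a real run, replace the `io.StringIO` sample with `open("events_office.tsv", newline="")`. Note that `SystemTime` is normally recorded in UTC, so the day boundaries may need shifting to local time before drawing any conclusion about working hours.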

Some supplementary explanation

An EVTX file holds one XML document per event (record), and the Data elements under `//Event/EventData/` vary with the type of `Event`. However, when either `ProcessName` or `NewProcessName` is present, you can usually tell which program produced the log entry, so I pick up whichever of the two exists. Since the same processing is applied to ProcessName and NewProcessName, I know it would be better to refactor it into a function ...
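As a rough idea of what that refactoring might look like, here is a minimal sketch. The helper name `office_process_name` is hypothetical, and the sample record is hand-written for illustration rather than taken from a real log. It uses `find()` instead of `xpath()` so that the same function runs on both the `lxml` elements returned by `record.lxml()` and the standard library's `xml.etree.ElementTree` elements used here.

```python
import xml.etree.ElementTree as ET

SCHEMA = "http://schemas.microsoft.com/win/2004/08/events/event"
NS = {"event": SCHEMA}


def office_process_name(elm, data_name, keyword="Office"):
    """Return the text of the first Data[@Name=data_name] if it contains keyword, else ""."""
    node = elm.find(".//event:Data[@Name='%s']" % data_name, namespaces=NS)
    if node is None or node.text is None:
        return ""  # element missing or empty: same result as the try/except version
    return node.text if keyword in node.text else ""


# Hand-written sample record, for illustration only
SAMPLE = """
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>4688</EventID>
    <TimeCreated SystemTime="2020-01-01 09:00:00.000000"/>
  </System>
  <EventData>
    <Data Name="NewProcessName">C:\\Program Files\\Microsoft Office\\Office14\\WINWORD.EXE</Data>
  </EventData>
</Event>
"""

elm = ET.fromstring(SAMPLE)
pnt = office_process_name(elm, "ProcessName")      # not present in this record
npnt = office_process_name(elm, "NewProcessName")  # contains "Office"
print(repr(pnt), repr(npnt))
```

In the script above, the two `try` blocks would then shrink to two calls to this helper, and the explicit `None` checks replace the exception handling.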
