[Python] Parsing randomly generated XML [ElementTree]

Library import

Import the library ElementTree required for XML parsing. glob gets the XML Path in the folder Pandas are stored in an array. Export from pandas to csv and save to MySQL.

from xml.etree import ElementTree
import glob
import pandas as pd

XML Analyze XML data like this. offset stores the time in seconds. 86400 seconds is 24 hours

-<Day>
 -<EventCollection>
  -<Event>
   <Offset>0</Offset>
   <Value>70</Value>
  </Event>
  -<Event>
   <Offset>86400</Offset>
   <Value>69</Value>
  </Event>
   -<Event>
   <Offset>172800</Offset>
   <Value>73</Value>
  </Event>
<Day>

Get XML from folder and parse

Gets and parses the path of the XML downloaded in the folder.

xmlfile = glob.glob("C:/Users/user/*") #Store XML file path in file

file = len(xmlfile) #Count the number of XML files

i_file = 0 #Number to specify the XML file in order from the top_0 is the very first

for i in range(file):
    XMLFILE = xmlfile[i_file]
    i_file += 1
    tree = ElementTree.parse(XMLFILE)  #Read XML file
    root = tree.getroot() #Get the contents of XML

    #prepare list
    Day = [] 
    Night = []

    #Store all numbers in list using for
    for e in root.findall('.//Day/EventCollection/Event/Value'):
        Day.append(e.text)
    for e in root.findall('.//Night/EventCollection/Event/Value'):
        Night.append(e.text)

    print(Day)
    [70,69,73]

Store List in Pandas and then save in csv


#Combine list into one
listData = [Day, Night]

#Convert listData to DataFrame..T swaps rows and columns
df = pd.DataFrame(listData).T

#Add column name
df.columns = ['Day', 'Night']

#Set csv path and save pandas DataFram to csv
filename = 'C:/Users/user/csv/AAA.csv'
df.to_csv(filename, index=False)

Save DataFrame in MySQL

Save the DataFrame output by csv to MySQL as well.

Import library

from sqlalchemy import create_engine
url = 'mysql+mysqlconnector://[user]:[pass]@[host]:[port]/sampleDB'
engine = create_engine(url, echo=True)

#df.to_with sql'sampleDB_table'Save the data stored in df
df.to_sql('sampleDB_table', engine, index=False, if_exists='append')

I run this from FileMaker, but if it's Windows, if you run it regularly with a task scheduler etc., the data will be automatically stored in the database and you can use it for later analysis. In the future, I would like to work on data visualization (BI tool) using PHP, javascript, etc., and then machine learning (AI).

Recommended Posts

[Python] Parsing randomly generated XML [ElementTree]
Speed comparison of Python XML parsing
Compare xml parsing speeds with Python and Go
Parse XML in Python
[Python] Rewrite past mistakes (XML)
Generate XML (RSS) with Python
Read Namespace-specified XML in Python
Automatically generated catch phrase [Python]
Process feedly xml with Python.