I had the chance to aggregate time series data at work and did it in Python, so here is a summary.
As the title says, this post aggregates time-series data while walking through operations that come up often in data analysis.
・ Aggregate time series data
In practice, I exported the records from FileMaker as an Excel file, tidied it up a little, and loaded it into Python.
Data recorded per date, such as the number of inspections or operations, can be aggregated by month or by year in an instant.
Changing the input data should make various other aggregations just as easy.
-Python 3.8.5
For this post, I prepared a csv file for a certain inspection covering 2015/1/1 to 2020/12/1.
It is set up so that one inspection was performed every month.
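If you do not have such a file at hand, a stand-in can be generated along the following lines. This is only a sketch: the column names 'date' and 'type' and the value 'Inspection' follow the code later in this post, while the variable name sample and the output file name are placeholders of my own.

```python
import pandas as pd

# One row per month, dated the 1st, from 2015-01-01 through 2020-12-01
# ("MS" means month start).
dates = pd.date_range(start="2015-01-01", end="2020-12-01", freq="MS")

# Store the dates as text, the way they would appear in a csv file.
sample = pd.DataFrame({"date": dates.strftime("%Y/%m/%d"), "type": "Inspection"})

# Optional: write the file so the later steps can read it with pd.read_csv.
# The file name here is a placeholder.
# sample.to_csv("sample_inspections.csv", index=False)

print(sample.head())
print(len(sample))
```

Six years of monthly rows gives 72 records in total.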
First, import the usual libraries. (Some of them are not used this time.)
Computation and visualization libraries are needed in almost every analysis, so it's handy to keep these imports saved somewhere and copy and paste them.
#Computation
import pandas as pd
import numpy as np
#Visualization
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline
Either csv or Excel works.
Here the data is read into a DataFrame, which is intuitive to work with (think of it as an Excel-like spreadsheet).
For Excel, the sheet_name argument selects the sheet by number (starting from 0) or by name.
#Read csv file
df = pd.read_csv('/Path name/file name.csv')
#Read Excel file (sheet_name takes a sheet number such as 0, or a name such as 'Sheet name')
df = pd.read_excel('/Path name/file name.xlsx', sheet_name=0)
#Show the first 5 rows
df.head()
#Check the data type
df.dtypes
Display result
The first column holds the date and the second column holds the type (here, inspection).
It would also be possible to count occurrences of a given string, but the aggregation used here sums numeric values, so the 'Inspection' entries are replaced with the integer 1.
df.replace('Inspection', 1, inplace=True)
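For reference, the string-counting alternative mentioned above could look roughly like this. It is a sketch on a tiny made-up table, not the data used in this post; the 'Operation' row and the names df_demo, is_inspection and per_year are hypothetical.

```python
import pandas as pd

# Tiny made-up table; the 'Operation' row is only there to show
# that other types are not counted.
df_demo = pd.DataFrame({
    "date": pd.to_datetime(["2015-01-01", "2015-02-01", "2016-01-01", "2016-02-01"]),
    "type": ["Inspection", "Inspection", "Inspection", "Operation"],
})

# True where the row is an inspection; summing booleans counts the True values.
is_inspection = df_demo["type"] == "Inspection"

# Group by calendar year and count the inspections in each year.
per_year = is_inspection.groupby(df_demo["date"].dt.year).sum()
print(per_year)
```

This leaves the original strings untouched, which may be preferable when the column holds several different types.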
#Library import
import datetime
#Use the first column as a datetime-type index
#Convert the date column to datetime type
df['date']=pd.to_datetime(df['date'])
#Assign date column to index
df.set_index('date', inplace=True)
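As a side note, when reading from csv the same result can probably be had in one step with the parse_dates and index_col arguments of pd.read_csv. The sketch below reads from an in-memory string instead of a file, so csv_text and df_demo are made-up names for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for a csv file with the same two columns.
csv_text = "date,type\n2015/01/01,Inspection\n2015/02/01,Inspection\n"

# parse_dates converts the column to datetime; index_col makes it the index.
df_demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], index_col="date")

print(type(df_demo.index))
print(df_demo)
```

Either way, the point is that the index ends up as a DatetimeIndex, which resample needs.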
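The aggregated result is stored in a new DataFrame so that it can be exported as an Excel file at the end.
#Aggregated by year (pandas 2.2 and later prefer rule='YE' over 'Y')
#.head() is not applied here because it would keep only the first 5 years and drop 2020
df2 = df.resample(rule='Y').sum()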
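Monthly or quarterly totals work the same way with a different rule. Here is a small sketch on made-up data; the dates and the names demo, monthly and quarterly are hypothetical. It uses 'MS' (month start) and 'QS' (quarter start), which label each bin by its first day.

```python
import pandas as pd

# Made-up data: two inspections in January 2015, one in February, one in April.
dates = pd.to_datetime(["2015-01-01", "2015-01-15", "2015-02-01", "2015-04-01"])
demo = pd.DataFrame({"type": [1, 1, 1, 1]}, index=dates)

# Monthly totals; a month with no rows (March here) sums to 0.
monthly = demo.resample("MS").sum()

# Quarterly totals, with quarters starting in January, April, July, October.
quarterly = demo.resample("QS").sum()

print(monthly)
print(quarterly)
```

Note that empty months still appear in the result with a total of 0, which is usually what you want for a graph.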
#Graph creation
fig = plt.figure(figsize=(10,5))
plt.title('Title')
plt.ylabel('Y Label')
plt.xlabel('Year')
plt.plot(df2['type'])
plt.show()
Since this data set assumes one inspection per month, the graph shows 12 cases for every year. (The X axis looks odd, though.)
result
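The odd X axis is probably because the yearly index consists of year-end timestamps rather than plain years. One possible workaround, sketched below under that assumption, is to group by the year number so the index becomes integers such as 2015, and to put a tick on each of them. The data is regenerated here so the snippet runs on its own; demo and yearly are made-up names, and the bar chart is my own choice rather than the plot used above.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Regenerate the one-inspection-per-month data so this runs on its own.
dates = pd.date_range("2015-01-01", "2020-12-01", freq="MS")
demo = pd.DataFrame({"type": 1}, index=dates)

# Grouping by the year number gives an integer index (2015, 2016, ...).
yearly = demo.groupby(demo.index.year).sum()

fig, ax = plt.subplots(figsize=(10, 5))
ax.bar(yearly.index, yearly["type"])
ax.set_xticks(yearly.index)  # one tick per year
ax.set_xlabel("Year")
ax.set_ylabel("Number of inspections")
# In a notebook with %matplotlib inline the figure is displayed automatically.
```

With integer years on the axis, each bar sits directly on its year label.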
If you want a more elaborate graph, or want to share and save the aggregated results, it is best to save them as an Excel file, which most people can open and use.
import openpyxl
#Export df2 with to_excel
df2.to_excel('Arbitrary file name.xlsx')
Save the visualized image. This is convenient when creating materials using images.
fig.savefig("Arbitrary file name.png")
・ Replace the values of elements in pandas.DataFrame and Series / note.nkmk.me "https://note.nkmk.me/python-pandas-replace/"
・ Calculate the total and average of time series data by day of the week, month, quarter, and year with pandas / note.nkmk.me "https://note.nkmk.me/python-pandas-time-series-multiindex/"