This article is for super beginners.

I had the opportunity to aggregate time series data at work, so I did it in Python, so I will summarize it.

Aggregate time-series data while performing operations that are likely to be used in data analysis, such as the title.

Purpose of this article

・ Aggregate time series data

Actually, I exported it as an Excel file of the record from FileMaker, modified it a little, and loaded it into Python.

Data created for each date, such as the number of inspections and operations, can be aggregated monthly and yearly in an instant.

I think that various aggregations can be easily performed by changing the data to be read.

environment

-Python 3.8.5

test data

This time, we have prepared a csv file for a certain inspection from 2015/1/1 to 2020/12/1.

I felt like I had one inspection every month.

Library import

First, import your favorite libraries. (A mixture of libraries not used this time)

Calculations and visualizations are almost always used when analyzing data, so it's a good idea to save them in something and copy and paste them.


#Computational system
import pandas as pd
import numpy as np

#Visualization system
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline

Data reading

I think either csv or Excel is fine.

This time, it is read in the form of DataFrame (spreadsheet like Excel as an image) that is familiar to the senses.

In the case of Excel, if you specify sheet_name as an argument, you can specify it by sheet number (starting from 0) or sheet name.

#loading csv
df = pd.read_csv('/Path name/file name.csv')

#Read Excel file
df = pd.read_excel('/Path name/file name.xlsx' , sheet_name=Sheet number or'Sheet name')

Confirmation of reading and confirmation of data type

When using with jupyter etc., execute it line by line.

#Show first line
df.head()

#Check the data type
df.dtypes

Display result スクリーンショット 2020-12-14 8.56.24.png

The date is entered in the first column and the type (inspection this time) is entered in the second column.

Replace the specified string with the integer 1

In order to total, it is possible to count the specified character string, but since the method of totaling this time seems to count the numerical value, it is necessary to replace the inspection column with the integer "1".

df.replace('Inspection', 1, inplace=True)

Convert date column to datetime type and specify in index

#Library import
import datetime
#Set the first column to index with datetime type
#Make the day column a datetime type
df['date']=pd.to_datetime(df['date'])

#Assign date column to index
df.set_index('date', inplace=True)

Aggregate

Finally, I am creating a new DateFrame to export as an Excel file.

#Aggregated by year
df2 = df.resample(rule='Y').sum().head()

Visualization

#Graph creation
fig = plt.figure(figsize=(10,5))
plt.title('Title')
plt.ylabel('Y Label')
plt.xlabel('Year')
plt.plot(df2['type'])
plt.show()

This time, since it is a body data set that one test was performed every month, 12 cases every year will be a graph like this. (Although the X axis is strange)

result

Export Excel file

If you want to create an elaborate graph, or if you want to share and save the aggregated results, it is better to save it in an Excel file that most people can see and use.

import openpyxl

#to_Export df with excel
df2.to_excel('Arbitrary file name.xlsx')

bonus

Save the visualized image. This is convenient when creating materials using images.

fig.savefig("Arbitrary file name.png ")

References / Sites

-Replace/note.nkmk.me that replaces the value of the element of pandas.DataFrame, Series "https://note.nkmk.me/python-pandas-replace/"

・ Calculate the total and average of time series data by day of the week, month, quarter, and year with pandas /note.nkmk.me "https://note.nkmk.me/python-pandas-time-series-multiindex/"

Reading, summarizing, visualizing, and exporting time series data to an Excel file with Python