Reading, summarizing, visualizing, and exporting time series data to an Excel file with Python

I had the opportunity to aggregate time series data at work, so I did it in Python, so I will summarize it.

Aggregate time-series data while performing operations that are likely to be used in data analysis, such as the title.

Purpose of this article

・ Aggregate time series data

Actually, I exported it as an Excel file of the record from FileMaker, modified it a little, and loaded it into Python.

Data created for each date, such as the number of inspections and operations, can be aggregated monthly and yearly in an instant.

I think that various aggregations can be easily performed by changing the data to be read.

environment

-Python 3.8.5

test data

This time, we have prepared a csv file for a certain inspection from 2015/1/1 to 2020/12/1.

I felt like I had one inspection every month.

Library import

First, import your favorite libraries. (A mixture of libraries not used this time)

Calculations and visualizations are almost always used when analyzing data, so it's a good idea to save them in something and copy and paste them.


#Computational system
import pandas as pd
import numpy as np

#Visualization system
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
%matplotlib inline

Data reading

I think either csv or Excel is fine.

This time, it is read in the form of DataFrame (spreadsheet like Excel as an image) that is familiar to the senses.

In the case of Excel, if you specify sheet_name as an argument, you can specify it by sheet number (starting from 0) or sheet name.

#loading csv
df = pd.read_csv('/Path name/file name.csv')

#Read Excel file
df = pd.read_excel('/Path name/file name.xlsx' , sheet_name=Sheet number or'Sheet name') 

Confirmation of reading and confirmation of data type

#Show first line
df.head()

#Check the data type
df.dtypes

Display result スクリーンショット 2020-12-14 8.56.24.png

The date is entered in the first column and the type (inspection this time) is entered in the second column.

Replace the specified string with the integer 1

In order to total, it is possible to count the specified character string, but since the method of totaling this time seems to count the numerical value, it is necessary to replace the inspection column with the integer "1".

df.replace('Inspection', 1, inplace=True)

Convert date column to datetime type and specify in index

#Library import
import datetime
#Set the first column to index with datetime type
#Make the day column a datetime type
df['date']=pd.to_datetime(df['date'])

#Assign date column to index
df.set_index('date', inplace=True)

Aggregate

Finally, I am creating a new DateFrame to export as an Excel file.

#Aggregated by year
df2 = df.resample(rule='Y').sum().head()

Visualization

#Graph creation
fig = plt.figure(figsize=(10,5))
plt.title('Title')
plt.ylabel('Y Label')
plt.xlabel('Year')
plt.plot(df2['type'])
plt.show()

This time, since it is a body data set that one test was performed every month, 12 cases every year will be a graph like this. (Although the X axis is strange)

result nenbetu_kensasu.png

Export Excel file

If you want to create an elaborate graph, or if you want to share and save the aggregated results, it is better to save it in an Excel file that most people can see and use.

import openpyxl

#to_Export df with excel
df2.to_excel('Arbitrary file name.xlsx')

bonus

Save the visualized image. This is convenient when creating materials using images.

fig.savefig("Arbitrary file name.png ")

References / Sites

-Replace/note.nkmk.me that replaces the value of the element of pandas.DataFrame, Series "https://note.nkmk.me/python-pandas-replace/"

・ Calculate the total and average of time series data by day of the week, month, quarter, and year with pandas /note.nkmk.me "https://note.nkmk.me/python-pandas-time-series-multiindex/"

Recommended Posts

Reading, summarizing, visualizing, and exporting time series data to an Excel file with Python
Reading Note: An Introduction to Data Analysis with Python
Create an Excel file with Python3
How to read an Excel file (.xlsx) with Pandas [Python]
Quickly create an excel file with Python #python
Convert Excel data to JSON with python
Reading OpenFOAM time series data and sets data
[Python] How to read excel file with pandas
Format and display time series data with different scales and units with Python or Matplotlib
How to measure mp3 file playback time with python
Type after reading an excel file with pandas read_excel
[Python] Plot time series data
I made a package to filter time series with python
Try to operate an Excel file using Python (Pandas / XlsxWriter) ①
Try to operate an Excel file using Python (Pandas / XlsxWriter) ②
How to generate exponential pulse time series data in python
Graph time series data in Python using pandas and matplotlib
[Python Kivy] How to create an exe file with pyinstaller
[Python] How to output a pandas table to an excel file
Get additional data to LDAP with python (Writer and Reader)
"Measurement Time Series Analysis of Economic and Finance Data" Solving Chapter End Problems with Python
An introduction to statistical modeling for data analysis (Midorimoto) reading notes (in Python and Stan)
How to extract features of time series data with PySpark Basics
Python: Time Series Analysis: Preprocessing Time Series Data
[Python] Write to csv file with Python
Output to csv file with Python
Python CSV file reading and writing
Reading and writing NetCDF with Python
About time series data and overfitting
Reading and writing CSV with Python
Move data to LDAP with python Change / Delete (Writer and Reader)
I tried to open the latest data of the Excel file managed by date in the folder with Python
Challenge to create time axis list report with Toggl API and Python
How to get the date and time difference in seconds with python
Try to make foldl and foldr with Python: lambda. Also time measurement
Plot CSV of time series data with unixtime value in Python (matplotlib)
Open an Excel file in Python and color the map of Japan
Read the file with python and delete the line breaks [Notes on reading the file]
Read CSV file with Python and convert it to DataFrame as it is
Error due to UnicodeDecodeError when reading CSV file with Python [For beginners]
Data pipeline construction with Python and Luigi
Predict time series data with neural network
How to convert Python to an exe file
Fractal to make and play with Python
Convert FX 1-minute data to 5-minute data with Python
Reading and writing JSON files with Python
[Easy Python] Reading Excel files with openpyxl
Create Excel file with Python + similarity matrix
[Easy Python] Reading Excel files with pandas
Compress python data and write to sqlite
Importing and exporting GeoTiff images with Python
How to handle time series data (implementation)
Convert csv, tsv data to matrix with python --using MovieLens as an example
I want to write an element to a file with numpy and check it.
[Python] Concatenate a List containing numbers and write it to an output file.
For those who want to learn Excel VBA and get started with Python
Introduction to Time Series Analysis ~ Seasonal Adjustment Model ~ Implemented in R and Python
Create an animated time series map of coronavirus infection status with python + plotly
Get time series data from k-db.com in Python
How to read a CSV file with Python 2/3
Scraping tabelog with python and outputting to CSV