Even if you want to save money, you don't know if you are increasing or decreasing cash in the first place.
It may be possible to visualize it using Zaim or Moneyforward, but analysis requires becoming a premium member, and it is troublesome to enter each time, so it is discouraging to manage.
Since I want to visualize the cash deposit / withdrawal status, I implemented a script that visualizes the increase / decrease of money in the deposit account using Python's Seaborn
.
Most online banking can export the deposit / withdrawal history for about one year with csv. Use this csv as an input to visualize the cash deposit / withdrawal history. There are many ways to look at visualization of money flow. Here, we will roughly visualize two things, "** stock " and " flow **". If you compare a deposit account to a bathtub and cash to water, stock and flow can be explained as follows.
** (1) Stock ** The amount of water in the bathtub at some point. That is, it represents the deposit balance of a certain month. If you compare the stocks of different months, you can see the ** increasing / decreasing trend of the deposit balance **.
** (2) Flow ** The amount of water that has flowed in a certain period. Furthermore, it can be divided into (2a) inflow and (2b) outflow, and represents the income and expenditure of a certain month. If you compare the inflows and outflows in the same month, you can see the ** profit and loss amount ** in a certain month.
Each section is supposed to be written in a notebook such as Jupyter Notebook
. Once you write it in your notebook, even if the deposit is updated later, you can replace the csv file and execute it sequentially, so it's easy. Version should work as long as you use Python3
.
In addition, the data used is the one in which a value like that is randomly generated. ~~ It's more than my own take ~~.
import numpy as np
import pandas as pd
import codecs
import seaborn as sns
sns.set(font='Hiragino Sans')
import matplotlib.pyplot as plt
If you are using ʻanaconda, you should be able to import everything without any problems. In the case of mac, seaborn does not recognize Japanese as it is, so specify a font that supports Japanese with
sns.set ()`.
code
#Read deposit / withdrawal history file
with codecs.open("../input/20200517212253.csv", "r", "Shift-JIS", "ignore") as file:
df = pd.read_table(file, delimiter=",")
#Preprocessing
"""
Implement the preprocessing of the read df.
df uses "transaction date" as an index, and has the following four columns.
1.Transaction name: Has two values, "payment" or "payment".
2.Amount: Transaction amount (flow)
3.Post-transaction balance: Deposit balance (stock) at that time
"""
When reading csv, specify the character code to avoid garbled characters. The preprocessing code is omitted because it depends on the item output by the net banking used. As described in the code, the index should have the transaction date, and also have the "transaction name", "amount (= flow)", and "post-transaction balance (= stock)".
** df sample **
By the way, in the case of net banking, the output result column may be divided into "year", "month", and "day", but you can combine them into one column with datetime
type as follows.
df["Transaction date"] = [pd.Timestamp(year=int(row["Handling date year"]),
month=int(row["Handling date month"]),
day=int(row["Handling date"])) for i, row in df.iterrows()]
Compare the deposit balances (stocks) of different months and check whether the balances are on an increasing trend or a decreasing trend.
Since the csv of the original data is inserted based on the deposit / withdrawal history, that is, the transaction occurrence base, it is not possible to do something like "Graph the monthly transition starting from the 1st of the month". Therefore, draw a line graph that plots all the columns of df "Balance after transaction".
code
#Generate a data frame for stock drawing
df4stock = df.copy()
#Visualization
fig, ax = plt.subplots(figsize=(20, 10))
sns.lineplot(x=df4stock.index, y="Post-transaction balance", data=df4stock, estimator=None);
fig.gca().ticklabel_format(style='plain', axis='y')
** Output result **
The blue circle marker indicates the occurrence of the transaction and the balance at that time. Salary income and card payment expenditure make up a monthly periodicity. The overall increase / decrease trend is estimated from the monthly maximum balance (payday).
In time series analysis, the overall trend of increase / decrease is defined as ** trend component **, and the increase / decrease with cycles such as monthly cycle is defined as ** seasonal component **. If you want to extract the trend component, you can insert the dummy data of the day when the transaction did not occur and use tsa.seasonal_decompose
of stats model
to separate the trend component and the seasonal component. Implementation is omitted in this article.
(Reference: Sakutto Trend Extraction: Introduction to Time Series Analysis with Python statsmodels)
Compare the amount of income and expenditure (flow) in a certain month and check how much the balance has increased or decreased.
The original data is grouped by year and month, and a new data frame with the amount of expenditure and income for each month is generated. Draw a bar graph that plots the balance and difference of each month based on the created data frame.
code
#Generate data frame for flow drawing
df4flow = df.groupby([df.index.year, df.index.month, "Transaction name"]).sum().reset_index("Transaction name")
df4flow
#Visualization
fig, ax = plt.subplots(figsize=(20, 10))
ax = sns.barplot(x=df4flow.index, y="Amount of money", hue="Transaction name", data=df4flow);
#Store the amount of each bar displayed by the value label in the list
height = [0 if np.isnan(i.get_height()) else i.get_height() for i in ax.patches]
for i, patch in enumerate(ax.patches):
diff = height[i//2] - height[i//2 + len(height)//2]
maxh = max(height[i//2], height[i//2 + len(height)//2])
#Displaying value labels
if i % 2 == 0:
colidx = i//2
ax.text(i//2 - patch.get_width()/2.0, height[colidx]/2.0, int(height[colidx]), color='black', ha="center")
ax.text(i//2, maxh+10000, int(diff), color='black', ha="center")
else:
colidx = i//2 + len(height)//2
ax.text(i//2 + patch.get_width()/2.0, height[colidx]/2.0, int(height[colidx]), color='black', ha="center")
** Output result **
The newly generated df4flow
data frame for the graph has a column with the total transaction amount by income / expenditure with the year and month in the MultiIndex.
In drawing, in addition to the bar graph, the value labels of each income / expenditure and the difference are drawn by the helper method ʻax.text () . The output value is obtained from the Rectangle object ʻax.patches
, which has information as a rectangle for each bar graph. Since the height of the rectangle is the amount as it is, get the height with the method get_height ()
of the same object. The position of the value label is calculated and placed so that it is in the middle of each bar.
code
for i in ax.patches:
print(i.get_height())
** Output result **
It can be seen that ʻax.pathces [0: 4]has income (blue) and ʻax.pathces [5: 9]
has expenditure (vermilion) bar information. Note that the output order is different from the data frame.
By visualizing the cash stock flow, it became possible to see whether the money is increasing or decreasing (I gave up buying a PC because it was decreasing). As mentioned above, if you put the above code together in Notebook, all you have to do is update the input data, which makes management easier.
It is surprisingly difficult to do graph output with Seaborn
or Matplotlib
like you do with Excel. It seems that we need to know more about the structure of the ʻArtist` object generated by the graph.
Recommended Posts