[PYTHON] Investigating the relationship between ice cream spending and temperature

As a data analysis beginner, I practiced pandas by examining the relationship between per-household ice cream spending and temperature. (I managed to get a result, more or less...) My references were "[Introduction to data analysis by Python](https://www.amazon.co.jp/Python%E3%81%AB%E3%82%88%E3%82%8B%E3%83%87%E3%83%BC%E3%82%BF%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%B1%B1%E5%86%85-%E9%95%B7%E6%89%BF/dp/4274222888/ref=sr_1_3?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%82%AB%E3%83%8A&keywords=Python+%E3%83%87%E3%83%BC%E3%82%BF%E8%A7%A3%E6%9E%90&qid=1583399806&sr=8-3)" and the following two sites.

Ice cream BIZ statistics (icecream.or.jp) and JMA Monthly Average Temperature (data.jma.go.jp)

I used the 2018 data posted on the above sites.

Extract data by web scraping

import pandas as pd
import math
import matplotlib.pyplot as plt
import statsmodels.api as sm
ice_url = 'http://www.icecream.or.jp/biz/data/expenditures.html'
temp_url = 'http://www.data.jma.go.jp/obd/stats/etrn/view/monthly_s3.php?%20prec_no=44&block_no=47662'
# read_html returns a list of every table on the page; [0] selects the first one
ice = pd.read_html(ice_url)[0]
temp = pd.read_html(temp_url)[0]
# Extract only the 2018 data and convert it to a numeric type
ice_2018 = ice.iloc[1:13, 5].astype(float)
temp_2018 = temp.iloc[144, 1:13].astype(float)
# Month numbers 0-12; row 0 is a placeholder so that the index lines up with months 1-12
month = pd.DataFrame(list(range(13)))

Now we have the 2018 ice cream spending per household and the average monthly temperature.
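
The two `iloc` calls above slice in different directions: the spending table is read down a column, while the temperature table is read across a row. A minimal sketch of why the two results still line up, using small made-up tables (the names `by_rows` and `by_cols` and all values are hypothetical, not the scraped data):

```python
import pandas as pd

# Hypothetical stand-ins for the scraped tables (all values are made up).
# Table A: one row per month, like the spending table.
by_rows = pd.DataFrame(
    [["month", "spend"], ["1", "500"], ["2", "450"], ["3", "600"]]
)
# Table B: one row per year and one column per month, like the temperature table.
by_cols = pd.DataFrame(
    [["year", "1", "2", "3"], ["2018", "5.0", "6.0", "10.0"]]
)

# Fix one column and slice the rows: the result is indexed by row label.
spend = by_rows.iloc[1:4, 1].astype(float)
# Fix one row and slice the columns: the result is indexed by column label.
temps = by_cols.iloc[1, 1:4].astype(float)

print(spend.index.tolist(), spend.name)  # [1, 2, 3] 1
print(temps.index.tolist(), temps.name)  # [1, 2, 3] 1

# Both indexes run 1..3, so concat places matching months side by side.
combined = pd.concat([spend.rename("spend"), temps.rename("temp")], axis=1)
print(combined)
```

Each Series takes its name from the position that was held fixed, which is presumably why the real data ends up in columns labeled `5` and `144`.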

Find the correlation

# Combine the month numbers with ice cream spending and average temperature, then drop placeholder row 0
icecream = pd.concat([month, ice_2018, temp_2018], axis=1)[1:]
# Column 144 holds the temperature and column 5 the spending (named after the positions they were sliced from)
x_data, y_data = icecream[144], icecream[5]
# Annual mean temperature (avetem) and mean ice cream spending (aveex)
avetem, aveex = x_data.mean(), y_data.mean()

The correlation coefficient $r$ is expressed in terms of the covariance $S_{xy}$ and the variances $S_{xx}$ and $S_{yy}$ as follows.

$$
r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}
$$
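
The post does not spell out the quantities in this formula. Assuming the usual definitions for $n$ paired observations $(x_i, y_i)$ with means $\bar{x}$ and $\bar{y}$ (here $x$ would be temperature and $y$ spending), they would read:

$$
S_{xy} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}), \qquad
S_{xx} = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2, \qquad
S_{yy} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\bar{y})^2
$$

The factor $1/n$ cancels between the numerator and the denominator of $r$, so plain sums of deviations give the same value.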

Following this formula, I computed the correlation coefficient $r$ by brute force as follows. (I suspect there is a better way.)

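# Running sums: cross-deviations, squared spending deviations, squared temperature deviations
extem = ex0 = tem0 = 0.0
for i in range(len(icecream)):
    ex = icecream.iloc[i, 1] - aveex    # spending deviation from its mean
    tem = icecream.iloc[i, 2] - avetem  # temperature deviation from its mean
    extem += ex * tem
    ex0 += ex ** 2
    tem0 += tem ** 2
extem0 = math.sqrt(ex0) * math.sqrt(tem0)
r = extem / extem0
# r = 0.8955143151163499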

The result, r ≈ 0.90, shows a strong positive correlation between ice cream spending and temperature.
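
As for the "better way" the author suspected: pandas has a built-in for this. A minimal sketch, using made-up monthly values rather than the scraped data (the variable names and numbers are hypothetical), that checks the manual formula against `Series.corr`:

```python
import math
import pandas as pd

# Hypothetical monthly values, made up for illustration (not the scraped data).
temp = pd.Series([5.0, 6.0, 10.0, 15.0, 20.0, 23.0,
                  27.0, 28.0, 24.0, 18.0, 12.0, 7.0])
spend = pd.Series([450.0, 400.0, 550.0, 700.0, 950.0, 1000.0,
                   1400.0, 1500.0, 1000.0, 700.0, 500.0, 550.0])

# Manual calculation, following r = S_xy / sqrt(S_xx * S_yy).
dx = temp - temp.mean()
dy = spend - spend.mean()
r_manual = (dx * dy).sum() / math.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Built-in equivalent: Series.corr computes the Pearson coefficient by default.
r_builtin = temp.corr(spend)

print(r_manual, r_builtin)
```

The two values should agree up to floating-point rounding, so with the real data something like `x_data.corr(y_data)` would likely replace the whole loop.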

Regression analysis

Finally, let's perform a regression analysis using OLS from statsmodels.

X = sm.add_constant(x_data)  # Adds a constant column so that the model fits an intercept
model = sm.OLS(y_data, X)
results = model.fit()
a, b = results.params.iloc[0], results.params.iloc[1]
# a: intercept, b: slope
plt.plot(x_data, a + b * x_data)
plt.scatter(x_data, y_data)

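[Figure: ice_temp.png, scatter plot of monthly ice cream spending against average temperature with the fitted regression line]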

As a result, a regression line as shown in the figure was obtained.
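
For a single explanatory variable, the fitted line can also be worked out by hand, which ties it back to the quantities used for the correlation coefficient. A minimal sketch with made-up numbers (not the scraped values), cross-checked against `numpy.polyfit` instead of statsmodels:

```python
import numpy as np

# Hypothetical data, made up for illustration (not the scraped values).
x = np.array([5.0, 10.0, 15.0, 20.0, 25.0, 30.0])           # temperature
y = np.array([400.0, 520.0, 610.0, 800.0, 950.0, 1100.0])   # spending

# Closed-form least squares for one variable:
# slope = S_xy / S_xx, intercept = mean(y) - slope * mean(x).
dx = x - x.mean()
slope = (dx * (y - y.mean())).sum() / (dx ** 2).sum()
intercept = y.mean() - slope * x.mean()

# Cross-check with numpy's degree-1 polynomial fit, which returns [slope, intercept].
slope_np, intercept_np = np.polyfit(x, y, 1)

print(slope, intercept)
print(slope_np, intercept_np)
```

Both approaches minimize the same squared error, so the coefficients should match up to rounding.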

So it seems people really do crave ice cream when it gets hot.
