A story about power approximation in Python

The story begins by drawing a power approximation curve in EXCEL

I had a decaying time series, and when I tried drawing approximation curves in EXCEL, the power approximation looked like a good fit. For work reasons I decided to reproduce it in Python instead of EXCEL. When I tried, the results did not match for some reason. While looking into why, I arrived at the article "Exponentiation by Python". That site answers most of the question, but I am writing this article as a memo for myself.

Difference between power approximation by curve_fit of scipy and power approximation by EXCEL

The power function is defined as follows.

$y = bx^{a}$

scipy's curve_fit performs a non-linear regression that returns the $a$ and $b$ that best fit the data under the above equation.

So what about EXCEL? The above equation can be transformed as follows.

$y = bx^{a}$
$\Rightarrow \ln y = \ln (bx^{a})$ ... (take the logarithm of both sides)
$\Rightarrow \ln y = a \ln x + \ln b$ ... (expand the right side)
$\Rightarrow Y = aX + B$ ... (replace the logarithmic terms with new variables $Y = \ln y$, $X = \ln x$, $B = \ln b$)

By taking the logarithm of the power function in this way, a linear regression of $Y = aX + B$ can be performed. In other words, the power approximation of EXCEL appears to output the result of **a logarithmic transformation followed by linear regression**.
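To make the difference concrete, here is a minimal sketch that runs both approaches on made-up data (not the data used later in this article). The helper names power_func and fit_both, the true values $a = -1$, $b = 40$, and the noise level are all assumptions chosen for illustration. np.polyfit is used for the linear regression on the log-transformed values.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_func(x, a, b):
    # Power model y = b * x**a (hypothetical helper for this sketch)
    return b * (x ** a)

def fit_both(x, y):
    # Non-linear least squares directly on y (the curve_fit approach)
    (a_nl, b_nl), _ = curve_fit(power_func, x, y, p0=(-1.0, 1.0), maxfev=10000)
    # Linear least squares on ln y = a * ln x + ln b (the log-transform approach)
    a_log, B = np.polyfit(np.log(x), np.log(y), 1)  # slope, intercept
    return (a_nl, b_nl), (a_log, np.exp(B))

x = np.arange(1, 16, dtype=float)

# Made-up exact power-law data: both approaches should recover a = -1, b = 40
y_exact = 40.0 * x ** -1.0
nl_exact, log_exact = fit_both(x, y_exact)

# Made-up noisy data: the two approaches minimize different errors,
# so the estimated parameters no longer coincide
rng = np.random.default_rng(0)
y_noisy = y_exact * np.exp(rng.normal(0.0, 0.2, size=x.size))
nl_noisy, log_noisy = fit_both(x, y_noisy)

print('exact  non-linear:', nl_exact, ' log-linear:', log_exact)
print('noisy  non-linear:', nl_noisy, ' log-linear:', log_noisy)
```

When the data follow the power function exactly, the two results agree; once noise is present they differ, which would explain why the Python result did not match EXCEL.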

Try it out in Python

Let's see what actually happens in Python. For the data, we use the changes in the number of infant deaths and the infant mortality rate from the "2011 White Paper on Children and Youth" published on the Cabinet Office website.

First read the data

read_data.py


import numpy as np
from scipy.optimize import curve_fit
import matplotlib.pyplot as plt
import pandas as pd

# encoding='cp932' prevents garbled characters when reading the Japanese CSV
df = pd.read_csv('Infant mortality rate.csv', encoding='cp932')
display(df)  # display() is available in Jupyter/IPython


Reshape the data because it has an odd structure

adjust_data.py


# Keep only the data rows and give the columns readable names
df = df.iloc[3:18, :].rename(columns={
    '1st-1-Figure 6 Changes in infant mortality and mortality rate': 'Annual',
    'Unnamed: 1': 'Infant mortality (persons)',
    'Unnamed: 2': 'Infant mortality (thousands)',
    'Unnamed: 3': 'Infant mortality rate'})
# Create a serial-number column to use as x in the fits later
rank = range(1, len(df) + 1)
df['rank'] = rank
# All columns are read as object dtype, so convert the rate to float
df['Infant mortality rate'] = df['Infant mortality rate'].astype(float)
# Strip the thousands separators, then convert (np.int was removed in NumPy 1.24, so use int)
df['Infant mortality (persons)'] = df['Infant mortality (persons)'].str.replace(',', '').astype(int)
display(df)

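The conversion of the comma-separated counts can be checked in isolation. This is a small sketch with made-up strings standing in for the CSV column; the values are not taken from the white paper.

```python
import pandas as pd

# Made-up strings with thousands separators, as they would arrive with object dtype
raw = pd.Series(['12,345', '6,789', '1,000'])

# Remove the commas literally (regex=False), then convert to the built-in int
cleaned = raw.str.replace(',', '', regex=False).astype(int)

print(cleaned.tolist())  # [12345, 6789, 1000]
```

pd.read_csv also accepts a thousands=',' argument, although whether it helps with this particular file depends on how the header rows are laid out.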

Plot the infant mortality rate

plot.py


x=df['Annual']
y=df['Infant mortality rate']
ax=plt.subplot(1,1,1)
ax.plot(x,y)
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality rate')
plt.show()

The infant mortality rate is much lower than it was 60 years ago. Medical progress is amazing. Now it's time to find the parameters of the approximation.

Nonlinear regression with scipy curve_fit

func.py


def exp_func(x, a, b):
    # Power function y = b * x**a (a power law, despite the "exp" in the name)
    return b*(x**a)

def exp_fit(val1_quan, val2_quan):
    # maxfev: maximum number of function calls
    # check_finite: if True, a ValueError is raised when the input contains NaN or inf
    l_popt, l_pcov = curve_fit(exp_func, val1_quan, val2_quan, maxfev=10000, check_finite=False)
    # Return the fitted values and the parameters (a, b)
    return exp_func(val1_quan, *l_popt),l_popt

Find the parameters $a$ and $b$ of exp_func using exp_fit.

culc_params.py


x=df['Annual']
x2=df['rank']
y=df['Infant mortality rate']
y_fit,l_popt=exp_fit(x2,y)

ax=plt.subplot(1,1,1)
ax.plot(x,y,label='obs')
ax.plot(x,y_fit,label='model')
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality rate')
plt.legend()
plt.show()
print('a : {},   b : {}'.format(l_popt[0],l_popt[1]))  # Check the fitted parameters a and b


The fit looks good.
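exp_fit discards the covariance matrix l_pcov that curve_fit returns. As a rough sketch of how it could be used, the square roots of its diagonal give one-standard-deviation error estimates for $a$ and $b$. The data below are made up for illustration, and power_func and the initial guess p0 are assumptions, not part of the code above.

```python
import numpy as np
from scipy.optimize import curve_fit

def power_func(x, a, b):
    # Same model as exp_func in the article: y = b * x**a
    return b * (x ** a)

# Made-up decaying data with a little multiplicative noise (not the article's data)
rng = np.random.default_rng(0)
x = np.arange(1, 16, dtype=float)
y = 60.0 * x ** -1.2 * np.exp(rng.normal(0.0, 0.02, size=x.size))

# p0 is an initial guess; a negative exponent suits decaying data
popt, pcov = curve_fit(power_func, x, y, p0=(-1.0, 50.0), maxfev=10000)

# One-standard-deviation errors of a and b from the covariance matrix
perr = np.sqrt(np.diag(pcov))

print('a = {:.3f} +/- {:.3f}'.format(popt[0], perr[0]))
print('b = {:.3f} +/- {:.3f}'.format(popt[1], perr[1]))
```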

Reproducing the EXCEL power approximation (logarithmic transformation and linear regression)

func2.py


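def exp_func_log(x, a, b):
    # Log-transformed power function: ln y = a * ln x + ln b
    return a*np.log(x) + np.log(b)

def exp_func_log_fit(val1_quan, val2_quan):
    # Fit the linear model to ln y; the returned fitted values are also in log scale
    l_popt, l_pcov = curve_fit(exp_func_log, val1_quan, np.log(val2_quan), maxfev=10000, check_finite=False)
    return exp_func_log(val1_quan, *l_popt),l_popt

def log_to_exp(x,a,b):
    # Convert back from log scale: exp(a * ln x + ln b) = b * x**a
    return np.exp(a*np.log(x) + np.log(b))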

Find the parameters $a$ and $b$ of exp_func_log using exp_func_log_fit. The $Y$ predicted from the fitted $a$ and $b$ is $\ln y$, so log_to_exp converts it back from the logarithmic scale.

culc_params2.py


x=df['Annual']
x2=df['rank']
y=df['Infant mortality rate']
y_fit,l_popt=exp_func_log_fit(x2,y)
y_fit=log_to_exp(x2,l_popt[0],l_popt[1])

ax=plt.subplot(1,1,1)
ax.plot(x,y,label='obs')
ax.plot(x,y_fit,label='model')
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality rate')
plt.legend()
plt.show()
print('a : {},   b : {}'.format(l_popt[0],l_popt[1]))  # Check the fitted parameters a and b

This also fits well, but to my eye the direct non-linear regression fit better.
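The log-transformed model is linear in $a$ and $\ln b$, so it can also be fitted with an ordinary closed-form linear regression instead of curve_fit. The following is a minimal sketch on made-up noise-free data rather than the article's CSV. It uses scipy.stats.linregress and fits the intercept $B = \ln b$ directly, which avoids evaluating np.log(b) for a non-positive trial value of b during optimization.

```python
import numpy as np
from scipy.stats import linregress

# Made-up noise-free data following y = 50 * x**-0.8 (not the article's data)
x = np.arange(1, 16, dtype=float)
y = 50.0 * x ** -0.8

# Ordinary least squares on (ln x, ln y): the slope is a, the intercept is ln b
res = linregress(np.log(x), np.log(y))
a, b = res.slope, np.exp(res.intercept)

# Fitted curve back in the original scale
y_fit = b * x ** a

print('a : {},   b : {}'.format(a, b))
```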

Summary

I'm not sure which of the two is the right approach. However, if you ever find yourself thinking "EXCEL can do it, but Python doesn't give the same result!", this article may be worth remembering.

Postscript (immediately after writing on 2020/02/28)

When the data values are large and fluctuate strongly, the non-linear regression tends to be pulled toward the large fluctuations. From the point of view of generalization, it may be better to apply the logarithmic transformation and then linear regression.
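One way to see why: the non-linear regression minimizes squared errors in the original scale, while the log-transformed regression minimizes squared errors in $\ln y$, which correspond to relative errors. The sketch below uses made-up numbers in which every prediction is off by the same factor of 2 and compares how much each point contributes to the two objectives.

```python
import numpy as np

# Made-up observations and predictions; every prediction is off by the same factor of 2
y_obs = np.array([10000.0, 1000.0, 100.0, 10.0])
y_model = y_obs / 2.0

# Squared residuals in the original scale (minimized by the direct non-linear fit)
sq_linear = (y_obs - y_model) ** 2
# Squared residuals in the log scale (minimized by the log-transformed linear fit)
sq_log = (np.log(y_obs) - np.log(y_model)) ** 2

print(sq_linear)  # the largest observation dominates the sum
print(sq_log)     # every point contributes the same amount, (ln 2)**2
```

In the original scale the largest value accounts for almost all of the total error, so a fit in that scale bends to match it. In the log scale every point carries equal weight.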

Insert dummy values into the infant mortality (persons) column

dummydata.py


df = pd.read_csv('Infant mortality rate.csv', encoding='cp932')
# Keep only the data rows and give the columns readable names
df = df.iloc[3:18, :].rename(columns={
    '1st-1-Figure 6 Changes in infant mortality and mortality rate': 'Annual',
    'Unnamed: 1': 'Infant mortality (persons)',
    'Unnamed: 2': 'Infant mortality (thousands)',
    'Unnamed: 3': 'Infant mortality rate'})
# Create a serial-number column to use as x in the fits later
rank = range(1, len(df) + 1)
df['rank'] = rank
# All columns are read as object dtype, so convert the rate to float
df['Infant mortality rate'] = df['Infant mortality rate'].astype(float)
# Strip the thousands separators, then convert (np.int was removed in NumPy 1.24, so use int)
df['Infant mortality (persons)'] = df['Infant mortality (persons)'].str.replace(',', '').astype(int)

#Insert dummy data
df2=df.copy()
df2.loc[df2['Annual']=='Heisei 2', 'Infant mortality (persons)']=60000
df2.loc[df2['Annual']=='13', 'Infant mortality (persons)']=40000
df2.loc[df2['Annual']=='15', 'Infant mortality (persons)']=20000
df2.loc[df2['Annual']=='18', 'Infant mortality (persons)']=10000
display(df2)

x=df2['Annual']
y=df2['Infant mortality (persons)']
ax=plt.subplot(1,1,1)
ax.plot(x,y)
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality (persons)')
ax.set_title('Dummy values inserted in Heisei 2 (1990), 13, 15, and 18')
plt.show()


Draw the approximation curves again using the functions defined in the main part of this article

dummydata.py


#Nonlinear regression
x=df2['Annual']
x2=df2['rank']
y=df2['Infant mortality (persons)']
y_fit,l_popt=exp_fit(x2,y)

ax=plt.subplot(1,1,1)
ax.plot(x,y,label='obs')
ax.plot(x,y_fit,label='model')
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality (persons)')
plt.legend()
plt.show()
print('a : {},   b : {}'.format(l_popt[0],l_popt[1]))

# Logarithmic transformation followed by linear regression
x=df2['Annual']
x2=df2['rank']
y=df2['Infant mortality (persons)']
y_fit,l_popt=exp_func_log_fit(x2,y)
y_fit=log_to_exp(x2,l_popt[0],l_popt[1])

ax=plt.subplot(1,1,1)
ax.plot(x,y,label='obs')
ax.plot(x,y_fit,label='model')
ax.set_xlabel('Annual')
ax.set_ylabel('Infant mortality (persons)')
plt.legend()
plt.show()
print('a : {},   b : {}'.format(l_popt[0],l_popt[1]))

Figure: Approximation by non-linear regression
Figure: Approximation by logarithmic transformation and linear regression

Clearly, the curve fitted by non-linear regression is pulled toward the inserted dummy values. It seems important to choose between the two methods depending on the situation.
