[PYTHON] Apply the error propagation formula to the standard error

Apply error propagation formula to standard error

There is something called an error propagation formula. Below is a description of the error propagation formula. Since COV (a, b) is a covariance, it will be zero if the errors of a and b are independent.

y(a,b)Then the error σ of y_{y}Is,Error σ of a_{a}And b error σ_{b}Will be as follows using.

σ_{y}^2 = (\frac{∂y}{∂a})^2 σ_{a}^2 +  (\frac{∂y}{∂b})^2 σ_{b}^2+ 2COV(a,b)\frac{∂y}{∂a}\frac{∂y}{∂b}

This time, I considered whether the error propagation formula can be applied to the standard error.

What is standard error?

Below is a textbook explanation of statistics about what standard error is in the first place.

Suppose you want to estimate the parameters of the population distribution from some sample data. (For example, if the population distribution is normal, the parameters are mean and standard deviation.) Here, we also consider a confidence interval using the standard error derived during parameter estimation.

By the way, I will omit the details about how to derive the standard error, When the parameters are estimated by the maximum likelihood method, the log-likelihood function can be tampered with (differentiate and take the expected value, etc.). You can derive the standard error.

Considering the 95% confidence interval (= estimated parameter +/- 2 x standard error) Repeat the above 95% confidence interval estimate 100 times with new sample data sampled from the population distribution.

After 100 trials, 95 include the true parameter of the population distribution in the 95% confidence interval, and 5 do not. Standard error is used in the above sense. Roughly speaking, it's like an indicator of how reliable the estimated parameters are.

A similar term is standard deviation σ, which refers to the distribution in which the data can actually be combined.

Difference between standard error and standard deviation

Standard error is an indicator of how reliable the data is, and standard deviation is the range of data. Now let's get back to the first title. I often used the error propagation formula for standard deviation (or error), but what I wondered this time was "Is it applicable to standard error?" I thought that there would be no problem if I applied it as it was, but I tried numerical simulation with care.

I tried a numerical simulation

As a flow, (1) Generate the data according to the following and estimate the parameters a, b, and c by the maximum likelihood method. Where ε is normally distributed random noise.

y = a+bx^2+cx^{-2} + ε

(2) After estimating b and c, calculate the parameter d = (c / b) ^ (1/4). (3) Calculate the standard error of d from the standard errors of b and c using the error propagation formula.   This ①②③ is repeated multiple times, and the variation in d calculated in ② is considered to be the standard error of d. If this matches the calculated value in (3), the error propagation formula can be applied to the standard error. Since multiple calculations in (3) are also performed, the average value is used.

Below is the actual python code and output I think that the variation in (2) and the calculation result in (3) match.

main.py


import numpy as np
from numpy.random import *
from scipy.optimize import curve_fit
import statistics
import math


def create_toydata(x_list):
    """
    x_with list as an argument,Y with error_create list
    :param x_list:List of x
    :return y_list:List of y that is a function of x
    :return true_d:True d
    """

    #Parameter true value Number is appropriate
    a = 2e2
    b = 2e-5
    c = 2e7

    y_list = [None] * len(x_list)

    for i, x in enumerate(x_list):
        #Add a normal distribution error to the data. I decided the number 2 appropriately
        noize = 2 * randn()
        y_list[i] = func(x, a, b, c) + noize

    # c,Compute d, which is a function of b
    true_d = (c/b)**(1/4)

    return y_list, true_d


def func(x, a, b, c):
    """Define a function to fit. The function itself was decided appropriately"""
    return a + b * x**2 + c * x**(-2)


def cal_se(C, B, C_se, B_se, COV):
    """
From the error propagation law when there is a correlation(C/B)^(1/4)Find the error of
    :param C:Parameter C
    :param B:Parameter B
    :param C_se:C standard error
    :param B_se:B standard error
    :param COV: C,B covariance
    :return: (C/B)^(1/4)Standard error
    """
    del_c = (1/4) * C**(-3/4) * B**(-1/4)
    del_b = (-1/4) * C**(1/4) * B**(-5/4)

    return math.sqrt((del_c* C_se)**2 + (del_b * B_se)**2 + 2 * COV * del_c * del_b)


if __name__ == '__main__':

    #Number of trials
    n = 1000

    a_list = [None] * len(range(n))
    b_list = [None] * len(range(n))
    c_list = [None] * len(range(n))
    d_list = [None] * len(range(n))
    d_se_list = [None] * len(range(n))

    # cnt:95%The number of true values within the confidence interval
    cnt = 0

    #Repeat the following trial n times
    for i in range(n):
        #From toydata, the maximum likelihood method is "y"= a + x^2 + c^-Coefficient of 2 ”a,b,Find c
        # b,Compute d, which is a function of c.
        # b,Calculate the standard error of d from the standard error of c.

        #Numbers are appropriate
        x_st = 500
        x_end = 2500
        x_num = 200

        x_list = np.linspace(x_st, x_end, x_num).tolist()
        y_list, true_d = create_toydata(x_list=x_list)

        popt, pcov = curve_fit(func, x_list, y_list)

        # #Calculate d
        d = (popt[2]/popt[1])**(1/4)
        d_list[i] = d

        #Standard error of d,b,Calculate from the standard error of c
        d_se = cal_se(popt[2], popt[1], math.sqrt(pcov[2][2]), math.sqrt(pcov[1][1]), pcov[1][2])
        d_se_list[i] = d_se

        # 95%Find the confidence interval
        d_low, d_up = d - 2 * d_se, d + 2 * d_se

        # 95%Find out how many true values are in the confidence interval
        if d_low < true_d and true_d < d_up:
            cnt += 1


    #Find the standard deviation of d tried n times(Consider this as the standard error of d)
    true_d_se = statistics.pstdev(d_list)
    print("Standard error of d:", true_d_se)

    #The standard error of d is calculated for each number of trials, so the final result is averaged.
    cal_d_se = statistics.mean(d_se_list)
    print("Mean of calculated standard errors of d:", cal_d_se)

    # 95%Find out the rate at which the standard error falls within the confidence interval.95%0 from the confidence interval definition.Should be 95
    print("95%Percentage of standard errors within the confidence interval:", cnt/n)

The following is the execution result. The first line is the variation of ② and the second line is the calculation result of ③.

Standard error of d: 2.2655785060979126
Mean of calculated standard errors of d: 2.2443534560700673
95%Percentage of standard errors within the confidence interval: 0.947

Recommended Posts

Apply the error propagation formula to the standard error
Examine the Lie-Trotter formula censoring error
Set the time zone to Japan Standard Time
Let's apply the brand image color to the matplotlib colormap!
Error back propagation method (back propagation)
The road to Pythonista
The road to Djangoist
Check what line caused the error with apply () (dataframe, Pandas)
How to debug the Python standard library in Visual Studio
Inherit the standard library to find the average value of Queue
Change the standard output destination to a file in Python