[PYTHON] Average estimation of capped data

Problem setting

- A sensor measures a certain time repeatedly.
- This time is known to follow an exponential distribution.
- When the sensor malfunctions, it may return a value smaller than the true time. However, you can tell when a malfunction has occurred.
- The goal is to estimate the mean of the true times from the measured values.
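To make the setting concrete, here is a minimal simulation sketch. It assumes, as the Python check later in this post does, that a malfunction behaves like an independent random upper limit (drawn uniformly from 2 to 10) that cuts the measurement short, and that the malfunction flag is recorded with each reading. The function name `simulate_measurements` is hypothetical. The sketch shows why the obvious estimates fail: averaging all readings, or only the correct ones, both underestimate the true mean of 3.

```python
import numpy as np

def simulate_measurements(n, true_mean, rng):
    """Hypothetical sensor model (an assumption, not from the original post):
    each reading is the true time unless an independent random cap is smaller,
    in which case the sensor returns the cap and the malfunction is flagged."""
    true_times = rng.exponential(true_mean, n)  # exponential true times (scale = mean)
    caps = rng.uniform(2, 10, n)                # assumed points where a malfunction cuts in
    readings = np.minimum(true_times, caps)     # a reading never exceeds the true time
    malfunction = caps < true_times             # the flag is observable
    return readings, malfunction

rng = np.random.default_rng(0)
readings, malfunction = simulate_measurements(100_000, 3.0, rng)

naive_all = readings.mean()                # treats cut-short readings as exact
naive_ok = readings[~malfunction].mean()   # simply drops the malfunctioned readings
print(f"malfunction rate:         {malfunction.mean():.3f}")
print(f"mean of all readings:     {naive_all:.3f}")
print(f"mean of correct readings: {naive_ok:.3f}")
```

Dropping the malfunctioned readings is biased as well, because long true times are exactly the ones most likely to be cut short.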

Formula

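The mean can be estimated by maximum likelihood, that is, by choosing the value that maximizes the probability of obtaining the observed data. This gives the following formula. Here, a measurement cut short by a malfunction is treated as having reached an upper limit: the true time is only known to be at least the measured value.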

$\text{estimated original mean} = \frac{\text{sum of all data}}{\text{number of data} - \text{number of data that reached the limit}}$
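Where the formula comes from, as a short derivation: assume the true times are exponential with mean $\mu$, and that the point at which a measurement is cut short is independent of the true time (independent censoring). Write $x_1, \dots, x_n$ for the measured values, $U$ for the indices of correct readings, $C$ for those that reached the limit, and $m = |C|$. A correct reading contributes the density $\frac{1}{\mu}e^{-x_i/\mu}$ to the likelihood; a reading that reached the limit only says the true time is at least $x_i$, so it contributes the survival probability $e^{-x_i/\mu}$.

$$
\log L(\mu) = \sum_{i \in U} \left( -\log\mu - \frac{x_i}{\mu} \right) + \sum_{i \in C} \left( -\frac{x_i}{\mu} \right)
= -(n-m)\log\mu - \frac{1}{\mu}\sum_{i=1}^{n} x_i
$$

$$
\frac{d \log L}{d\mu} = -\frac{n-m}{\mu} + \frac{1}{\mu^2}\sum_{i=1}^{n} x_i = 0
\quad\Longrightarrow\quad
\hat{\mu} = \frac{\sum_{i=1}^{n} x_i}{n-m}
$$

The second derivative at $\hat{\mu}$ equals $-(n-m)/\hat{\mu}^2 < 0$, so this stationary point is the maximum.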

Check with Python

```python
import numpy as np

np.random.seed(1)
n = 100000                        # number of data points
a = np.random.exponential(3, n)   # original (true) times, mean 3
print('Mean of the original distribution %.3f' % a.mean())
b = np.random.uniform(2, 10, n)   # random upper limit for each measurement
c = np.minimum(a, b)              # observed values, capped at the upper limit
nn = (c == b).sum()               # number of measurements that reached the upper limit
print('Estimated mean of the original distribution %.3f' % (c.sum() / (n - nn)))

# Output:
# Mean of the original distribution 2.996
# Estimated mean of the original distribution 2.996
```
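As an extra sanity check (a sketch, not part of the original post), the closed-form estimate should coincide with the maximizer of the censored log-likelihood. The snippet below evaluates that log-likelihood on a fine grid of candidate means and compares the argmax with the formula. The helper name `censored_exp_loglik` and the grid range are illustrative choices, and it uses NumPy's newer `np.random.default_rng` API, so its numbers differ from the output above.

```python
import numpy as np

def censored_exp_loglik(mu, values, reached_limit):
    """Hypothetical helper: log-likelihood of exponential data with mean mu,
    where entries flagged in reached_limit are right-censored
    (the true time is only known to be at least the value)."""
    n_exact = np.count_nonzero(~reached_limit)
    return -n_exact * np.log(mu) - values.sum() / mu

rng = np.random.default_rng(1)
n = 100_000
true_times = rng.exponential(3, n)
caps = rng.uniform(2, 10, n)
values = np.minimum(true_times, caps)
reached_limit = caps < true_times  # observable malfunction flag

# Closed form: sum of all data / (number of data - number that reached the limit)
mu_closed = values.sum() / (n - reached_limit.sum())

# Brute force: evaluate the log-likelihood on a grid with step 1e-4
grid = np.linspace(1.0, 6.0, 50_001)
loglik = censored_exp_loglik(grid, values, reached_limit)
mu_grid = grid[np.argmax(loglik)]

print(f"closed form: {mu_closed:.4f}")
print(f"grid search: {mu_grid:.4f}")
```

Because the log-likelihood increases up to $\hat{\mu}$ and decreases after it, the grid argmax can only differ from the closed form by at most one grid step.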

That's all.
