[PYTHON] Selection of measurement data

What is this?

You are an inspection engineer at a manufacturer. You have 100 measurements obtained from a certain sensor. For various reasons, you would like to show that these measurements **"can vary"**, so you want to choose 10 of the 100 values such that their variance is maximized. However, you also want to claim that the sensor is working normally, so the selection must still satisfy **"the mean value is accurate"**.

Try it with Python

Measurement data creation

Create the measurement data from random numbers: 100 samples from a normal distribution with mean 50 and standard deviation 1.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(1)  # fix the seed so the results are reproducible
measurement_data = np.random.normal(50, 1, 100)  # mean 50, std 1, 100 samples
plt.hist(measurement_data)
print('Standard deviation', measurement_data.std())
>>>
Standard deviation 0.885156213832

(Figure: histogram of the 100 measurement values)

Solve with mathematical optimization

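Maximize the variance of the selected values. Normally this is hard to solve, because the variance depends on the mean of the selection, which makes the problem a quadratic integer optimization. If we assume the mean is accurate (fixed at 50), however, each $(value - mean)^2$ is a constant, so the objective is linear in the 0-1 selection variables and the model becomes a mixed integer linear optimization.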
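To spell out this reasoning, here is a sketch of the model in symbols. The notation is introduced here for illustration only and does not appear in the code: $d_i$ are the $n$ measurements, $x_i$ is 1 if measurement $i$ is selected and 0 otherwise, $k$ is the number of selections, $\mu$ is the target mean, and $\varepsilon$ is the tolerance on the mean.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{i=1}^{n} (d_i - \mu)^2 \, x_i \\
\text{subject to}\quad & \sum_{i=1}^{n} x_i = k, \\
& \mu - \varepsilon \le \frac{1}{k} \sum_{i=1}^{n} d_i \, x_i \le \mu + \varepsilon, \\
& x_i \in \{0, 1\} \quad (i = 1, \dots, n).
\end{aligned}
```

Writing $\bar{d}_S$ for the mean of the selected values and $\mathrm{Var}_S$ for their variance, the identity $\frac{1}{k} \sum_i (d_i - \mu)^2 x_i = \mathrm{Var}_S + (\bar{d}_S - \mu)^2$ holds, so under the mean constraint the linear objective (divided by $k$) differs from the true variance of the selection by at most $\varepsilon^2$.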

from pulp import LpMaximize, LpProblem, LpStatus, lpDot, lpSum, value
from ortoolpy import addbinvars

number_of_selections = 10
eps = 0.0001  # tolerance on the mean

m = LpProblem(sense=LpMaximize)
x = addbinvars(len(measurement_data))  # x[i] = 1 if measurement i is selected
m += lpDot((measurement_data - 50) ** 2, x)  # objective: sum of squared deviations from 50
m += lpSum(x) == number_of_selections  # select exactly 10 values
e = lpDot(measurement_data, x) / number_of_selections  # mean of the selection
m += 50 - eps <= e
m += e <= 50 + eps
%time m.solve()  # solve
r = np.vectorize(value)(x).astype(int)  # result: 1 for selected values, 0 otherwise
print(LpStatus[m.status])
>>>
Wall time: 181 ms
Optimal
print('Average', measurement_data[r > 0].mean())
print('Standard deviation', measurement_data[r > 0].std())
>>>
Average 49.9999119632
Standard deviation 1.82811635001

The mean is accurate (within eps of 50), and the standard deviation of the 10 selected values is more than double that of the original data (about 1.83 versus 0.89).
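As a cross-check of the idea, the same model can be solved by exhaustive search when the data set is tiny. The following is a minimal sketch using only the standard library, not the solver-based method above: the function name `best_subset` and the seven sample values are hypothetical, chosen so that the result can be verified by hand. Exhaustive search is not practical for choosing 10 out of 100, which is why a solver is used there.

```python
import itertools
import statistics


def best_subset(values, k, target, eps):
    """Pick k values maximizing the sum of squared deviations from target,
    subject to the mean of the selection lying within eps of target."""
    best, best_score = None, -1.0
    for combo in itertools.combinations(values, k):
        if abs(sum(combo) / k - target) > eps:
            continue  # mean constraint violated
        score = sum((v - target) ** 2 for v in combo)
        if score > best_score:
            best, best_score = combo, score
    return best  # None if no subset satisfies the mean constraint


# Hypothetical toy data: 47 is the most extreme value, but no selection
# of four values that contains it has a mean of 50.
values = [47, 48, 49, 49.5, 50.5, 51, 52]
chosen = best_subset(values, k=4, target=50, eps=0.01)
print('Selected', chosen)
print('Average', statistics.fmean(chosen))
# pstdev is the population standard deviation, like numpy's default .std()
print('Standard deviation', statistics.pstdev(chosen))
```

On this toy data the search selects 48, 49, 51 and 52: the mean is exactly 50 and the standard deviation is about 1.58. The outlier 47 is left out because including it would break the mean constraint, which is the same trade-off the optimization model makes on the 100 measurements.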

That's all.
