[Python] Analysis of measurement data ② -Histograms and fitting, a recommendation for lmfit-

Preface

At a meeting one day. Professor: "T-kun, can't you fit this graph a little better? Can't you change the fit range, or add weighting?" T: "... I'll give it a try."

So the fitting needs to be revised. Requests to change the analysis procedure on a whim come up often, and they are hard to deal with. I want to be able to handle them on the spot.

This time, instead of scipy.optimize.curve_fit, I will introduce a library called lmfit. Fitting with scipy was covered in Analysis of measurement data ① -A memorandum on scipy fitting-. The coding is done on Jupyter, and Jupyter is recommended.

I hope this serves as a decent introduction to lmfit.

The notebook and sample data are also available on GitHub. From here

About lmfit

lmfit is developed as an extension of scipy.optimize. It lets you constrain parameters and weight the residuals when fitting. The workflow is to attach various pieces of information to objects and finally run the fit command to optimize. Once you get used to it, I find it more convenient than scipy.

This experiment

Pulse data were obtained from a measuring instrument. First, we make a histogram of the pulse peak values. Then we fit it and evaluate the sharpness of the spectrum. Finally, we try to improve the sharpness using lmfit's weighting option.

Data reading

Read the pulse data dumped from the instrument with pandas. As before, use tkinter to get the file path.

.ipynb


import tkinter
from tkinter import filedialog
import numpy as np
import pandas as pd

root = tkinter.Tk()
root.withdraw()
file = filedialog.askopenfilename(
    title='file path_',
    filetypes=[("csv", "*.csv")])  # pattern must be "*.csv" to match csv files
if file:
    pulse = pd.read_csv(file, engine='python', header=0, index_col=0)

pulse.iloc[:,[0,1,2]].plot()

(Figure: plot of three pulse waveforms.) Only three of the pulses are plotted here. The data were recorded so that the pulse peak falls near the 50th sample.

Histogram creation

Create a histogram of the maximum peak values. When creating a histogram, you need to choose the bin width of the horizontal axis. This time, following Sturges' rule, the number of bins is set to k = 1 + log2(n).

* Note that np.histogram returns the bin **edges**, one more of them than the counts, so they must be collapsed to bin centers before plotting against the counts.
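As a quick illustration of this note (using synthetic peak values, not the measured data), the edge/count length mismatch and the bin-center trick look like this:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(70, 3, 1000)       # synthetic peak values for illustration

k = int(1 + np.log2(data.size))      # Sturges' rule: k = 1 + log2(n)
count, edges = np.histogram(data, bins=k)

# np.histogram returns k counts but k + 1 bin edges,
# so collapse adjacent edges to bin centers before plotting
centers = (edges[:-1] + edges[1:]) / 2

print(k, count.size, edges.size, centers.size)  # → 10 10 11 10
```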

.ipynb


bins = int(1 + np.log2(pulse.shape[1]))  # Sturges' rule
count, level = np.histogram(pulse.max(), bins=bins)
# np.histogram returns bin edges (one more than the counts),
# so collapse them to bin centers
level = (level[:-1] + level[1:]) / 2

import matplotlib.pyplot as plt
plt.plot(level, count, marker='.', linestyle='None')
plt.show()

image.png

There is a small bump on the right side. Let's separate out the data with peak values above 50 and rebuild the distribution for that right-hand bump.

.ipynb


mask = pulse.max() > 50              # boolean mask over the pulses
pulse = pulse.loc[:, mask]           # keep only the masked pulse data

bins = int(1 + np.log2(pulse.shape[1]))  # Sturges' rule
count, level = np.histogram(pulse.max(), bins=bins)
level = (level[:-1] + level[1:]) / 2     # bin centers
plt.plot(level, count, marker='o', linestyle='None')
plt.show()

Something like a normal distribution has appeared.

Fitting with lmfit

As the imports show, lmfit uses several class objects for fitting. The fitting function is a Gaussian, given below: the peak center is $\mu$, the spread of the distribution is $\sigma$, and $amp$ is the amplitude term.

$$f\left(x\right)=\frac{amp}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{\left(x-\mu \right)^2}{2\sigma^2}\right)$$

First, pass the fitting function to the Model class. Next, create a Parameters object and give it initial values. The Model class has a .make_params() method, which automatically builds a Parameters object from the arguments of the given function after the first one (the independent variable).
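Before handing the Gaussian to lmfit, a quick numerical sanity check (plain numpy, independent of lmfit, with arbitrarily chosen parameter values) confirms that $amp$ is the area under the curve rather than the peak height:

```python
import numpy as np

def gaussian(x, amp, mu, sigma):
    return amp / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))

# Numerically integrate with a simple Riemann sum over a wide range
x = np.linspace(-50, 50, 100001)
dx = x[1] - x[0]
area = np.sum(gaussian(x, amp=3.0, mu=5.0, sigma=2.0)) * dx
print(area)  # ~3.0: the integral recovers amp
```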

.ipynb


from lmfit import Model, Parameters, Parameter

def gaussian(x, amp, mu, sigma):
    out = amp / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
    return out

model = Model(gaussian)     # build a Model object from the fitting function
pars = model.make_params()  # generate the Parameters object
pars

The created Parameters object is a set of Parameter objects, and each parameter can be accessed like a dictionary entry. From there, the .set method lets you adjust, per parameter:

・ the initial value: value
・ upper and lower bounds: min, max (default -np.inf to np.inf)
・ whether the value may vary during the fit (constraint): vary=True or False

.ipynb


pars['mu'].set(value=70)
pars['amp'].set(value=1)
pars['sigma'].set(value=1)
pars

It's very rough, but I gave the initial values. Each parameter can be configured in detail, and the set is displayed as a DataFrame-like table. It's cool ...

Perform fitting

.ipynb


result = model.fit(data=count,params=pars,x=level)
result

The result is displayed as a fit report. Executing .fit creates a new ModelResult object, which has a .plot_fit() method; visualize the fit with it.

You can retrieve the optimized parameters as a dictionary from the .best_values attribute.

.ipynb


opt = result.best_values
opt['mu']/opt['sigma']

Sharpness was evaluated as $\mu/\sigma$. It came to 23.67.

I tried changing the bin width of the horizontal axis

・ Scott's rule $h = \frac{3.49\sigma}{ n^\frac{1}{3} }$ I tried defining the bin width with this formula. Since we went to the trouble of fitting, the fitted value is used for $\sigma$.

.ipynb


h = 3.49 * opt['sigma'] / (pulse.shape[1] ** (1 / 3))  # Scott's rule
r = pulse.max().max() - pulse.max().min()              # range of the peak values
bins = int(r / h)

count, level = np.histogram(pulse.max(), bins=bins)
level = (level[:-1] + level[1:]) / 2                   # bin centers
plt.plot(level, count, marker='o', linestyle='None')
plt.show()
plt.show()

image.png

The bins are narrower and the maximum count is smaller, but the shape of the distribution looks fine. Refit by swapping in only the new data and x values. Sharpness is ... 24.44, an improvement of 0.77.
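For intuition on why Scott's rule produced more bins here, a synthetic comparison (made-up normal data, not the measured pulses, and the sample standard deviation standing in for the fitted $\sigma$):

```python
import numpy as np

rng = np.random.default_rng(1)
peaks = rng.normal(70, 3, 1000)     # synthetic peak values

# Sturges' rule: bin count from the sample size alone
k_sturges = int(1 + np.log2(peaks.size))

# Scott's rule: bin width from the spread, converted to a bin count
h = 3.49 * peaks.std() / peaks.size ** (1 / 3)
k_scott = int((peaks.max() - peaks.min()) / h)

print(k_sturges, k_scott)  # Scott's rule gives noticeably more bins here
```

Sturges depends only on n, so it saturates for large samples, while Scott's width shrinks with n and adapts to the spread of the data.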

I tried to weight the error

This is where the actual request comes in.

He wanted to emphasize the range $\mu \pm \sigma$ and set the weights below 63 to 0. Weighting is done by passing the weights option to .fit.
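To sketch what weights do in a least-squares fit before applying them to the histogram, here is a plain numpy example (using np.polyfit's w argument as a stand-in for lmfit's weights; both multiply the unsquared residuals, so a weight of 0 removes a point from the fit entirely):

```python
import numpy as np

x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[7] = 50.0                    # inject one outlier (true value would be 15)

w = np.ones_like(x)
w[7] = 0.0                     # weight 0 -> the outlier is ignored by the fit

slope_plain, _ = np.polyfit(x, y, 1)
slope_weighted, intercept_weighted = np.polyfit(x, y, 1, w=w)

print(round(slope_weighted, 6), round(intercept_weighted, 6))  # → 2.0 1.0
```

The unweighted fit is dragged toward the outlier, while the weighted one recovers the underlying line exactly.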

.ipynb


s = (level > opt['mu'] - opt['sigma']) * (level < opt['mu'] + opt['sigma'])  # inside mu +/- sigma
weight = (np.ones(level.size) + 4 * s) * (level > 63)  # 5x weight inside the band, 0 below 63
result3 = model.fit(data=count, params=pars, x=level, weights=weight)

↓ Weights & result. It doesn't look much different, but looking closely the curve is pulled down slightly. The sharpness is ... 25.50, sharper by 1.06.

The fitted curve can be retrieved as an array from the .best_fit attribute, which was convenient for writing out a csv. Problem safely solved ...

Summary

This time the parameters were generated automatically with Model.make_params(), but you can also create an empty Parameters() object directly and add variables to it with the .add method.

I was confused at first because fitting involves several objects, and it was easy to lose track of which method belongs to which. But as you get used to it, you come to appreciate Parameter(s).

The scarcity of sites explaining lmfit in Japanese was another reason I wrote this article. Is there a better library ...? I'd appreciate any pointers. That way, future juniors won't struggle. It was the first time I wrote code while reading the official reference (laughs). With research results on the line, there was no choice but to do it.

Going forward, I'd like to learn a bit more technology that seems connected to everyday life, and to apply the skills I have across various fields. I want to be creative.

That's all for the python-related progress. I couldn't report it to my professor (no one else there uses python), so I'd like to take this opportunity to do so. I am deeply grateful to the professor for waiting patiently for my data analysis.
