Python / Automatic Lorentzian fitting of experimental data

1. Introduction

I wrote this to fit a large number of experimental data files in one go. Opening the files one by one, fitting each, and recording the resulting parameters by hand is a hassle. I use the Lorentzian as the example here, but the overall flow of building a file list and then fitting each file should have other applications as well.

2. A quick overview of the Lorentzian (feel free to skip)

The Lorentzian (Lorentz function) is the following function: $ f(x) = A\frac{d^2}{(x - x_0)^2 + d^2}$. Plotted, it is a peak that takes the extremum $ A $ at $ x = x_0 $. $ d $ is the half width at half maximum (HWHM) of the peak, so the full width at half maximum (FWHM) is $ 2d $. Since the resonance absorption curves of electromagnetic waves take exactly this form, it is widely used in physics and engineering.
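As a quick sanity check, here is a minimal sketch (the parameter values are arbitrary, not from the article) confirming that $f(x_0) = A$ and $f(x_0 \pm d) = A/2$:

python

def lorentzian(x, A, x0, d):
    # Lorentzian with peak height A, center x0, and HWHM d
    return A * d**2 / ((x - x0)**2 + d**2)

A, x0, d = 50, 0, 5                  # arbitrary example values
print(lorentzian(x0, A, x0, d))      # 50.0 -> the peak value A at x = x0
print(lorentzian(x0 + d, A, x0, d))  # 25.0 -> half of A, one HWHM from the center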

3. Sample data

I think it is easier to understand if you actually play with the data, so let's create some sample data first.

dataset.py


import numpy as np

# Parameter definition
intensity = 50  # peak intensity
HWHM = 5        # half width at half maximum
a = 10          # width of the random scatter added to Y

# Create the data files
for X0 in range(-30, 31, 10):
    filename = 'X0=' + str(X0) + '.dat'
    with open(filename, 'w') as file:
        file.write('X' + '\t' + 'Y' + '\n')
        for X in range(-100, 101):
            Y = intensity * HWHM**2 / ((X - X0)**2 + HWHM**2) + a * np.random.rand() - 5
            file.write(str(X) + '\t' + str(Y) + '\n')

Running this creates 7 dat files in the directory, with X0 from -30 to 30 in steps of 10. Each contains two columns of numbers (X, Y) separated by tabs, with the header strings X and Y on the first line. The content is the Lorentz function from above, Y = $f($X$)$, with a slight random scatter added to Y. As an example, plotting "X0=0.dat" looks like the figure below.

X0=0.png
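For reference, a plot like the one above can be produced with a minimal sketch along these lines (the output name 'X0=0_raw.png' is my own choice, so it does not collide with the fit images created later):

python

import numpy as np
import matplotlib.pyplot as plt

# Load the two tab-separated columns, skipping the 'X\tY' header line
X, Y = np.loadtxt('X0=0.dat', delimiter='\t', skiprows=1, unpack=True)

plt.plot(X, Y, 'o', alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.savefig('X0=0_raw.png')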

From here, I would like to Lorentzian-fit these 7 dat files in one go, producing a graph for each result and a dat file summarizing the converged parameters.

4. Batch fitting

Now for the main subject. Let's fit the 7 dat files we just created in one go and collect only the results. I will use Jupyter so that the procedure is easy to follow. The general flow from here is:

  1. Create a list of the files to fit
  2. Iterate over the list, fitting the files one by one
  3. Draw a graph for each file and write the converged fitting parameters to a separate file

4.1 Make a list of files to fit

Use the glob module to create `filelist`, the list of files to fit, for the iterator to process. Here the wildcard * picks up only the files named X0=○○.dat in the folder. So, working in Jupyter:

In[1]


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ptick
import scipy.optimize
import glob, re
from functools import cmp_to_key

In[2]


filelist = glob.glob('X0=*.dat')
filelist2 = []

for filename in filelist:
    match = re.match(r'X0=(.*)\.dat', filename)
    pair = (match.group(1), filename)   # (number as a string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    # Compare by the numeric value extracted from the filename
    return cmp(float(a[0]), float(b[0]))

filelist = [pair[1] for pair in sorted(filelist2, key=cmp_to_key(cmptuple))]
filelist

Out[2]


['X0=-30.dat',
 'X0=-20.dat',
 'X0=-10.dat',
 'X0=0.dat',
 'X0=10.dat',
 'X0=20.dat',
 'X0=30.dat']

Reference: Python: Sort by file name number (External site)

What `In[2]` does is create the list and sort the files in it. If you just collect the names with `glob`, they are sorted as strings, so they do not come out in ascending numerical order. That is not a big problem in itself, but it would bother me later, so I sort them in ascending order here.
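Incidentally, the same ordering can also be obtained more compactly with a plain key function instead of `cmp_to_key`; a sketch, assuming the same X0=<number>.dat naming:

python

filelist = sorted(glob.glob('X0=*.dat'),
                  key=lambda f: float(re.match(r'X0=(.*)\.dat', f).group(1)))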

4.2 Initial parameter estimation and fitting

The inside of the loop inevitably gets a bit long, but it looks like this (I will explain each part afterwards).

In[3]


result = open('fitting_result.dat', 'w')
result.write("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match(r'(.*)\.dat', filename)
    name = match.group(1)

    with open(filename, 'r') as file:
        X, Y = [], []

        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    #Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)

    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]

    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    #Initial value setting
    pini = np.array([intensity, X0, 1])

    #fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x-X0)**2 + HWHM**2)
    
    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    #View results
    print("initial parameter\toptimized parameter")
    for i, v  in enumerate(pini):
        print(str(v)+ '\t' + str(popt[i]) + ' ± ' + str(perr[i]))
    
    # Write the fitting result to the dat file
    result.write(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')
    
    # Create an array for drawing the fitting curve
    fitline = func(np.array(X), *popt)  # func does array arithmetic, so convert the list first

    #Result plot and save image
    plt.clf()
    plt.figure(figsize=(10,6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci',axis='y',scilimits=(0,0))
    plt.savefig(name + ".png")
    plt.close()
    
result.close()

Output of In[3]


X0=-30.dat
initial parameter	optimized parameter
52.8746388447	52.1363835425 ± 1.37692592137
-29.0	-30.2263312447 ± 0.132482308868
1.0	5.01691953143 ± 0.187432386079
...
(omitted)
...
X0=30.dat
initial parameter	optimized parameter
50.0825336071	50.5713312616 ± 1.54634448135
31.0	30.1170389209 ± 0.149532401478
1.0	4.89078858649 ± 0.211548017933

Let me explain each part. First, the part above the `# Initial value estimation` comment: we prepare the file into which the final results will be written (create the file and write a header line). Then the iteration over the list of data files starts at the `for` statement.

python


match = re.match(r'(.*)\.dat', filename)
name = match.group(1)

This part extracts the portion of the file name before the .dat extension into `name`, to be used for the image file name at the end.
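As an aside, the standard library's os.path.splitext does the same job without a regular expression; a sketch:

python

import os

name, ext = os.path.splitext(filename)  # 'X0=-30.dat' -> ('X0=-30', '.dat')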

Next comes `# Initial value estimation`. This fit needs initial values (`pini`) for three parameters: the peak intensity (`intensity`), the X value at the peak (`X0`), and the half width at half maximum (`HWHM`). It is hard to say how accurately they need to be estimated, but here I estimate them as follows:

  - `intensity`: take the maximum and minimum of Y and adopt whichever has the larger absolute value.
  - `X0`: the value of X at which the adopted `intensity` occurs.
  - `HWHM`: not estimated; simply set to 1 for the time being.

The first two estimates come out fairly accurate, so compromising like this seems fine.
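If you did want a data-driven guess for `HWHM` as well, one possible sketch (my own addition, not in the original, and not hardened against edge cases) is to measure how far you must move from the peak before |Y| falls below half the peak height:

python

import numpy as np

def guess_hwhm(X, Y, X0, intensity):
    # Rough HWHM guess: distance from the peak to the first point on its
    # right where |Y| drops below half the peak height; falls back to 1.
    X, Y = np.asarray(X), np.asarray(Y)
    below_half = X[np.abs(Y) < np.abs(intensity) / 2]
    right = below_half[below_half > X0]
    return right[0] - X0 if len(right) else 1.0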

Then comes `# fitting`: the fit is run with the estimated initial parameters, following the standard SciPy fitting procedure. The converged parameters are stored in `popt`. `perr` is the standard error of each parameter, obtained as the square root of the diagonal elements of the covariance matrix `pcov` of the fitting parameters.

References: Nonlinear function modeling in Python; scipy.optimize.curve_fit (SciPy official documentation)

The rest is `# View results`, `# Write the fitting result to the dat file`, `# Create an array for drawing the fitting curve`, and `# Result plot and save image`. The first two do exactly what their comments say. `# Create an array for drawing the fitting curve` evaluates the function defined with the converged parameters at the X values of the data, so the fit line is prepared with the same number of points as the original data. `# Result plot and save image` is also as described, except for one point:

python


plt.savefig(name + ".png")

Here `name`, the result of the earlier `re.match`, is used to build the image file name.
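One small tweak, not in the original: since the fitted function is analytic, the fit line can be evaluated on a denser grid than the data for a smoother curve. A sketch, reusing `func`, `X`, and `popt` from the loop above:

python

X_fine = np.linspace(min(X), max(X), 1000)  # denser grid than the raw data
plt.plot(X_fine, func(X_fine, *popt), 'r-', linewidth=2)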

5. Result

The image of the fitting result looks like this.

X0=0.png

Then the numerical results (the converged fitting parameters and their standard errors, tab-delimited) are written to fitting_result.dat.
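Since the summary file is plain tab-separated text with a header line, it is easy to load back for further analysis; a minimal sketch using numpy:

python

import numpy as np

# Columns: filename, A, A_err, X0, X0_err, HWHM, HWHM_err (tab-separated)
table = np.genfromtxt('fitting_result.dat', delimiter='\t',
                      names=True, dtype=None, encoding='utf-8')
print(table['X0'])    # fitted centers for all seven files
print(table['HWHM'])  # fitted half widths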

6. Conclusion

So, we fitted 7 files in one go. It was seven files here for simplicity, but in a real experiment you may have to fit dozens or, in some cases, more than 100 data sets, and that is where this approach pays off.

7. Supplement

Some readers may think: if the peak is this clear, can't you just fix all the initial parameters to reasonable constants? However, if an initial parameter is off at the order-of-magnitude level,

X0=30.png

this is the kind of result you actually get. For linear fitting you probably don't need to estimate initial values, but once the function becomes even slightly complicated, it seems better to estimate the initial values to some extent (estimating the order of magnitude is enough).
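A related safeguard, not used in this article: `scipy.optimize.curve_fit` accepts a `bounds` argument that constrains each parameter to a range, which can keep a fit with a rough initial guess from wandering off entirely. A sketch with arbitrary illustrative ranges, reusing `func`, `X`, `Y`, and `pini` from above:

python

# bounds = (lower_bounds, upper_bounds), one entry per parameter
# in the order (intensity, X0, HWHM); HWHM is kept non-negative here
popt, pcov = scipy.optimize.curve_fit(
    func, X, Y, p0=pini,
    bounds=([-100, min(X), 0], [100, max(X), 50]))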

If you are not familiar with Jupyter, here is the whole thing as a plain .py file.

lorentzian_fitting.py


import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
import glob, re
from functools import cmp_to_key


filelist = glob.glob('X0=*.dat')
filelist2 = []


'''
/////////////////////////////////
Sorting data files
/////////////////////////////////
'''
for filename in filelist:
    match = re.match(r'X0=(.*)\.dat', filename)
    pair = (match.group(1), filename)   # (number as a string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    return cmp(float(a[0]), float(b[0]))

filelist = [pair[1] for pair in sorted(filelist2, key=cmp_to_key(cmptuple))]


'''
/////////////////////////////////
Analysis (Lorentzian fitting)
/////////////////////////////////
'''
result = open('fitting_result.dat', 'w')
result.write("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match(r'(.*)\.dat', filename)
    name = match.group(1)

    with open(filename, 'r') as file:
        X, Y = [], []

        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    #Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)

    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]

    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    #Initial value setting
    pini = np.array([intensity, X0, 1])

    #fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x-X0)**2 + HWHM**2)

    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    #View results
    print("initial parameter\toptimized parameter")
    for i, v  in enumerate(pini):
        print(str(v)+ '\t' + str(popt[i]) + ' ± ' + str(perr[i]))

    # Write the fitting result to the dat file
    result.write(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')

    #Create an array for drawing a fitting curve
    fitline = []
    for v in X:
        fitline.append(func(v, popt[0], popt[1], popt[2]))

    #Result plot and save image
    plt.clf()
    plt.figure(figsize=(10,6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci',axis='y',scilimits=(0,0))
    plt.savefig(name + ".png")
    plt.close()

result.close()
