Python / Automatic Lorentzian fitting of experimental data

1. Introduction

I wrote this to fit a large number of experimental data files in one go. Opening the files one by one, fitting each, and recording the resulting parameters by hand is a hassle. I use the Lorentzian as the example here, but the overall flow of building a file list and then fitting each file should have other applications as well.

2. A quick overview of the Lorentzian (feel free to skip)

The Lorentzian (Lorentz function) is the following function: $ f(x) = A\frac{d^2}{(x - x_0)^2 + d^2}$. Plotted, it is a peak that takes the extremum $ A $ at $ x = x_0 $. $ d $ is the half width at half maximum (HWHM) of the peak, so the full width at half maximum (FWHM) is $ 2d $. Since the resonance absorption curves of electromagnetic waves take exactly this form, it is widely used in physics and engineering.
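As a quick sanity check, here is a minimal sketch (the parameter values are arbitrary, not from the article) confirming that $f(x_0) = A$ and $f(x_0 \pm d) = A/2$:

python

def lorentzian(x, A, x0, d):
    # Lorentzian with peak height A, center x0, and HWHM d
    return A * d**2 / ((x - x0)**2 + d**2)

A, x0, d = 50, 0, 5                  # arbitrary example values
print(lorentzian(x0, A, x0, d))      # 50.0 -> the peak value A at x = x0
print(lorentzian(x0 + d, A, x0, d))  # 25.0 -> half of A, one HWHM from the center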

3. Sample data

I think it is easier to understand if you actually play with the data, so let's create some sample data first.

dataset.py


import numpy as np

# Parameter definition
intensity = 50  # peak intensity
HWHM = 5        # half width at half maximum
a = 10          # width of the random scatter added to Y

# Create the data files
for X0 in range(-30, 31, 10):
    filename = 'X0=' + str(X0) + '.dat'
    with open(filename, 'w') as file:
        file.write('X' + '\t' + 'Y' + '\n')
        for X in range(-100, 101):
            Y = intensity * HWHM**2 / ((X - X0)**2 + HWHM**2) + a * np.random.rand() - 5
            file.write(str(X) + '\t' + str(Y) + '\n')

Running this creates 7 dat files in the directory, with X0 from -30 to 30 in steps of 10. Each contains two columns of numbers (X, Y) separated by tabs, with the header strings X and Y on the first line. The content is the Lorentz function from above, Y = $f($X$)$, with a slight random scatter added to Y. As an example, plotting "X0=0.dat" looks like the figure below.

X0=0.png
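For reference, a plot like the one above can be produced with a minimal sketch along these lines (the output name 'X0=0_raw.png' is my own choice, so it does not collide with the fit images created later):

python

import numpy as np
import matplotlib.pyplot as plt

# Load the two tab-separated columns, skipping the 'X\tY' header line
X, Y = np.loadtxt('X0=0.dat', delimiter='\t', skiprows=1, unpack=True)

plt.plot(X, Y, 'o', alpha=0.5)
plt.xlabel('X')
plt.ylabel('Y')
plt.savefig('X0=0_raw.png')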

From here, I would like to Lorentzian-fit these 7 dat files in one go, producing a graph for each result and a dat file summarizing the converged parameters.

4. Batch fitting

Now for the main subject. Let's fit the 7 dat files we just created in one go and collect only the results. I will use Jupyter so that the procedure is easy to follow. The general flow from here is:

  1. Create a list of the files to fit
  2. Iterate over the list, fitting the files one by one
  3. Draw a graph for each file and write the converged fitting parameters to a separate file

4.1 Make a list of files to fit

Use the glob module to create `filelist`, the list of files to fit, for the iterator to process. Here the wildcard * picks up only the files named X0=○○.dat in the folder. So, working in Jupyter:

In[1]


import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ptick
import scipy.optimize
import glob, re
from functools import cmp_to_key

In[2]


filelist = glob.glob('X0=*.dat')
filelist2 = []

for filename in filelist:
    match = re.match(r'X0=(.*)\.dat', filename)
    pair = (match.group(1), filename)   # (number as a string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    # Compare by the numeric value extracted from the filename
    return cmp(float(a[0]), float(b[0]))

filelist = [pair[1] for pair in sorted(filelist2, key=cmp_to_key(cmptuple))]
filelist

Out[2]


['X0=-30.dat',
 'X0=-20.dat',
 'X0=-10.dat',
 'X0=0.dat',
 'X0=10.dat',
 'X0=20.dat',
 'X0=30.dat']

Reference: Python: Sort by file name number (External site)

What `In[2]` does is create the list and sort the files in it. If you just collect the names with `glob`, they are sorted as strings, so they do not come out in ascending numerical order. That is not a big problem in itself, but it would bother me later, so I sort them in ascending order here.
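Incidentally, the same ordering can also be obtained more compactly with a plain key function instead of `cmp_to_key`; a sketch, assuming the same X0=<number>.dat naming:

python

filelist = sorted(glob.glob('X0=*.dat'),
                  key=lambda f: float(re.match(r'X0=(.*)\.dat', f).group(1)))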

4.2 Initial parameter estimation and fitting

The inside of the loop inevitably gets a bit long, but it looks like this (I will explain each part afterwards).

In[3]


result = open('fitting_result.dat', 'w')
result.write("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match(r'(.*)\.dat', filename)
    name = match.group(1)

    with open(filename, 'r') as file:
        X, Y = [], []

        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    #Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)

    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]

    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    #Initial value setting
    pini = np.array([intensity, X0, 1])

    #fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x-X0)**2 + HWHM**2)
    
    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    #View results
    print("initial parameter\toptimized parameter")
    for i, v  in enumerate(pini):
        print(str(v)+ '\t' + str(popt[i]) + ' ± ' + str(perr[i]))
    
    # Write the fitting result to the dat file
    result.write(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')
    
    # Create an array for drawing the fitting curve
    fitline = func(np.array(X), *popt)  # func does array arithmetic, so convert the list first

    #Result plot and save image
    plt.clf()
    plt.figure(figsize=(10,6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci',axis='y',scilimits=(0,0))
    plt.savefig(name + ".png")
    plt.close()
    
result.close()

Output of In[3]


X0=-30.dat
initial parameter	optimized parameter
52.8746388447	52.1363835425 ± 1.37692592137
-29.0	-30.2263312447 ± 0.132482308868
1.0	5.01691953143 ± 0.187432386079
...
(omitted)
...
X0=30.dat
initial parameter	optimized parameter
50.0825336071	50.5713312616 ± 1.54634448135
31.0	30.1170389209 ± 0.149532401478
1.0	4.89078858649 ± 0.211548017933

Let me explain each part. First, the part above the `# Initial value estimation` comment: we prepare the file into which the final results will be written (create the file and write a header line). Then the iteration over the list of data files starts at the `for` statement.

python


match = re.match(r'(.*)\.dat', filename)
name = match.group(1)

This part extracts the portion of the file name before the .dat extension into `name`, to be used for the image file name at the end.
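As an aside, the standard library's os.path.splitext does the same job without a regular expression; a sketch:

python

import os

name, ext = os.path.splitext(filename)  # 'X0=-30.dat' -> ('X0=-30', '.dat')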

Next comes `# Initial value estimation`. This fit needs initial values (`pini`) for three parameters: the peak intensity (`intensity`), the X value at the peak (`X0`), and the half width at half maximum (`HWHM`). It is hard to say how accurately they need to be estimated, but here I estimate them as follows:

  - `intensity`: take the maximum and minimum of Y and adopt whichever has the larger absolute value.
  - `X0`: the value of X at which the adopted `intensity` occurs.
  - `HWHM`: not estimated; simply set to 1 for the time being.

The first two estimates come out fairly accurate, so compromising like this seems fine.
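If you did want a data-driven guess for `HWHM` as well, one possible sketch (my own addition, not in the original, and not hardened against edge cases) is to measure how far you must move from the peak before |Y| falls below half the peak height:

python

import numpy as np

def guess_hwhm(X, Y, X0, intensity):
    # Rough HWHM guess: distance from the peak to the first point on its
    # right where |Y| drops below half the peak height; falls back to 1.
    X, Y = np.asarray(X), np.asarray(Y)
    below_half = X[np.abs(Y) < np.abs(intensity) / 2]
    right = below_half[below_half > X0]
    return right[0] - X0 if len(right) else 1.0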

Then comes `# fitting`: the fit is run with the estimated initial parameters, following the standard SciPy fitting procedure. The converged parameters are stored in `popt`. `perr` is the standard error of each parameter, obtained as the square root of the diagonal elements of the covariance matrix `pcov` of the fitting parameters.

References: Nonlinear function modeling in Python; scipy.optimize.curve_fit (SciPy official documentation)

The rest is `# View results`, `# Write the fitting result to the dat file`, `# Create an array for drawing the fitting curve`, and `# Result plot and save image`. The first two do exactly what their comments say. `# Create an array for drawing the fitting curve` evaluates the function defined with the converged parameters at the X values of the data, so the fit line is prepared with the same number of points as the original data. `# Result plot and save image` is also as described, except for one point:

python


plt.savefig(name + ".png")

Here `name`, the result of the earlier `re.match`, is used to build the image file name.
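One small tweak, not in the original: since the fitted function is analytic, the fit line can be evaluated on a denser grid than the data for a smoother curve. A sketch, reusing `func`, `X`, and `popt` from the loop above:

python

X_fine = np.linspace(min(X), max(X), 1000)  # denser grid than the raw data
plt.plot(X_fine, func(X_fine, *popt), 'r-', linewidth=2)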

5. Result

The image of the fitting result looks like this.

X0=0.png

Then the numerical results (the converged fitting parameters and their standard errors, tab-delimited) are written to fitting_result.dat.
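Since the summary file is plain tab-separated text with a header line, it is easy to load back for further analysis; a minimal sketch using numpy:

python

import numpy as np

# Columns: filename, A, A_err, X0, X0_err, HWHM, HWHM_err (tab-separated)
table = np.genfromtxt('fitting_result.dat', delimiter='\t',
                      names=True, dtype=None, encoding='utf-8')
print(table['X0'])    # fitted centers for all seven files
print(table['HWHM'])  # fitted half widths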

6. Conclusion

So, we fitted 7 files in one go. It was seven files here for simplicity, but in a real experiment you may have to fit dozens or, in some cases, more than 100 data sets, and that is where this approach pays off.

7. Supplement

Some readers may think: if the peak is this clear, can't you just fix all the initial parameters to reasonable constants? However, if an initial parameter is off at the order-of-magnitude level,

X0=30.png

this is the kind of result you actually get. For linear fitting you probably don't need to estimate initial values, but once the function becomes even slightly complicated, it seems better to estimate the initial values to some extent (estimating the order of magnitude is enough).
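A related safeguard, not used in this article: `scipy.optimize.curve_fit` accepts a `bounds` argument that constrains each parameter to a range, which can keep a fit with a rough initial guess from wandering off entirely. A sketch with arbitrary illustrative ranges, reusing `func`, `X`, `Y`, and `pini` from above:

python

# bounds = (lower_bounds, upper_bounds), one entry per parameter
# in the order (intensity, X0, HWHM); HWHM is kept non-negative here
popt, pcov = scipy.optimize.curve_fit(
    func, X, Y, p0=pini,
    bounds=([-100, min(X), 0], [100, max(X), 50]))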

If you are not familiar with Jupyter, here is the whole thing as a plain .py file.

lorentzian_fitting.py


import numpy as np
import matplotlib.pyplot as plt
import scipy.optimize
import glob, re
from functools import cmp_to_key


filelist = glob.glob('X0=*.dat')
filelist2 = []


'''
/////////////////////////////////
Sorting data files
/////////////////////////////////
'''
for filename in filelist:
    match = re.match(r'X0=(.*)\.dat', filename)
    pair = (match.group(1), filename)   # (number as a string, filename)
    filelist2.append(pair)

def cmp(a, b):
    if a == b: return 0
    return -1 if a < b else 1

def cmptuple(a, b):
    return cmp(float(a[0]), float(b[0]))

filelist = [pair[1] for pair in sorted(filelist2, key=cmp_to_key(cmptuple))]


'''
/////////////////////////////////
Analysis (Lorentzian fitting)
/////////////////////////////////
'''
result = open('fitting_result.dat', 'w')
result.write("filename\tA\tA_err\tX0\tX0_err\tHWHM\tHWHM_err\n")

for filename in filelist:
    print(filename)
    match = re.match(r'(.*)\.dat', filename)
    name = match.group(1)

    with open(filename, 'r') as file:
        X, Y = [], []

        for line in file.readlines()[1:]:
            items = line[:-1].split('\t')
            X.append(float(items[0]))
            Y.append(float(items[1]))

    #Initial value estimation (intensity and center value)
    max_Y = max(Y)
    min_Y = min(Y)

    if np.abs(max_Y) >= np.abs(min_Y):
        intensity = max_Y
        X0 = X[Y.index(max_Y)]

    else:
        intensity = min_Y
        X0 = X[Y.index(min_Y)]

    #Initial value setting
    pini = np.array([intensity, X0, 1])

    #fitting
    def func(x, intensity, X0, HWHM):
        return intensity * HWHM**2 / ((x-X0)**2 + HWHM**2)

    popt, pcov = scipy.optimize.curve_fit(func, X, Y, p0=pini)
    perr = np.sqrt(np.diag(pcov))

    #View results
    print("initial parameter\toptimized parameter")
    for i, v  in enumerate(pini):
        print(str(v)+ '\t' + str(popt[i]) + ' ± ' + str(perr[i]))

    # Write the fitting result to the dat file
    result.write(name + '\t' + '\t'.join([str(p) + '\t' + str(e) for p, e in zip(popt, perr)]) + '\n')

    #Create an array for drawing a fitting curve
    fitline = []
    for v in X:
        fitline.append(func(v, popt[0], popt[1], popt[2]))

    #Result plot and save image
    plt.clf()
    plt.figure(figsize=(10,6))
    plt.plot(X, Y, 'o', alpha=0.5)
    plt.plot(X, fitline, 'r-', linewidth=2)
    plt.xlabel('X')
    plt.ylabel('Y')
    plt.ticklabel_format(style='sci',axis='y',scilimits=(0,0))
    plt.savefig(name + ".png")
    plt.close()

result.close()
