如何使用 LMfit 将曲线拟合到双高斯/偏斜高斯

Posted

技术标签:

【中文标题】如何使用 LMfit 将曲线拟合到双高斯/偏斜高斯【英文标题】:How to fit curve to bi-gaussian/skewed gaussian using LMfit 【发布时间】:2020-03-25 16:32:04 【问题描述】:

我有一堆代码可以将质谱峰从光谱中分离出来,并将峰的值放入两个列表中。 Gaussian_x 和 gaussian_x。我现在需要拟合这条曲线,最好使用 Levenberg-Marquardt 或类似的算法,这样我就可以在拟合后计算出峰面积。

对于双高斯峰(在峰的前半部分和后半部分具有不同 sigma 值的峰)并没有太大帮助,所以我正在努力寻找一种方法来让它工作.

yvals = np.asarray(gaussian_y)
model = SkewedGaussianModel()

# set initial parameter values
params = model.make_params(amplitude=a, center=b, sigma=c, gamma=d)

# adjust parameters to best fit data.
result = model.fit(yvals, params, x=xvals)

print(result.fit_report())
plt.plot(xvals, yvals)
plt.plot(xvals, result.init_fit)
plt.plot(xvals, result.best_fit)
plt.show()

当我绘制此图时,图上只有一条直线 y=0,然后是两个列表的图。 我不知道为什么 yvalue 没有向 best_fit 和 init_fit 注册。

这是完整的代码:

import pymzml
import numpy as np
import matplotlib.pyplot as plt
from lmfit.models import SkewedGaussianModel
import filepath

TARGET_MASS = 152
MASS_WIDTH = 1

PEAK_CENTER = 0.9
PEAK_WIDTH = 0.1

gaussian_x = []
gaussian_y = []

a = 1
b = 400
c = 100
d = 280


def main(in_path):

    x_array = []
    y_array = []

    run = pymzml.run.Reader(in_path)
    for spectrum in run:

        if spectrum.ms_level == 1:
            mass_array = []
            intensity_array = []
            target_array = []

            # change tuple into array.
            my_tuple = spectrum.peaks("raw")
            my_array = np.asarray(my_tuple)

            # append the first element of each index to mass array and second element to intensity array.
            for i in my_array:
                mass_array.append(i[0])
                intensity_array.append(i[1])

            # for index and value in mass_array, absolute value of mass - target mass taken.
            # append the abs_mass values to a separate array.
            for i, mass in enumerate(mass_array):
                abs_mass = abs(mass - TARGET_MASS)
                target_array.append(abs_mass)

                # if the absolute mass values are less than selected mass_width, append the value in the intensity_array
                # at that same index same index
                if abs_mass < MASS_WIDTH:
                    y_array.append(intensity_array[i])
                    x_array.append(spectrum.scan_time_in_minutes())

    new_x = []
    new_y = []

    # loop through x values
    for i, x1 in enumerate(x_array):
        # temporary store for where there are multiple y values for the same x value
        y_temp = []

        # ensure we only run through a search for each UNIQUE x value once
        if x1 in new_x:
            # moves onto the next for loop interation without executing the below
            continue

        # add unique x value to new_x
        new_x.append(x1)

        # loop through the x values again, looking for where there are duplicates
        for j, x2 in enumerate(x_array):
            if x1 == x2:
                # where we find a duplicate entry, add the value of y to a temporary array
                y_temp.append(y_array[j])

        # after accumulating all values of y for the same x value,
        # add it to the new_y array
        new_y.append(max(y_temp))

    lower_bound = PEAK_CENTER - (PEAK_WIDTH / 2)
    upper_bound = PEAK_CENTER + (PEAK_WIDTH / 2)

    # for each index and value in retention time array
    for i, x in enumerate(new_x):
        # if x greater than the lower bound and smaller than the upper bound then append x and y value
        # to new lists
        if lower_bound < x < upper_bound:
            gaussian_x.append(x)
            gaussian_y.append(new_y[i])

        # if x greater than upper bound, stop function
        if x > upper_bound:
            break

    xvals = np.asarray(gaussian_x)
    yvals = np.asarray(gaussian_y)
    model = SkewedGaussianModel()

    # set initial parameter values
    params = model.make_params(amplitude=a, center=b, sigma=c, gamma=d)

    # adjust parameters to best fit data.
    result = model.fit(yvals, params, x=xvals)

    print(result.fit_report())
    plt.plot(xvals, yvals)
    plt.plot(xvals, result.init_fit)
    plt.plot(xvals, result.best_fit)
    plt.show()


if __name__ == "__main__":
    main(in_path)

适合报告说:

[[Model]]
    Model(skewed_gaussian)
[[Fit Statistics]]
    # fitting method   = leastsq
    # function evals   = 5
    # data points      = 22
    # variables        = 4
    chi-square         = 1.6213e+10
    reduced chi-square = 9.0074e+08
    Akaike info crit   = 457.197124
    Bayesian info crit = 461.561294
##  Warning: uncertainties could not be estimated:
    amplitude:  at initial value
    center:     at initial value
    sigma:      at initial value
    gamma:      at initial value
[[Variables]]
    amplitude:  1.00000000 (init = 1)
    center:     400.000000 (init = 400)
    sigma:      100.000000 (init = 100)
    gamma:      280.000000 (init = 280)
    height:     0.00398942 == '0.3989423*amplitude/max(2.220446049250313e-16, sigma)'
    fwhm:       235.482000 == '2.3548200*sigma'

【问题讨论】:

请发布一个完整的例子。当然,ab 等的值很重要——你的初始值是否合理?另外:合身报告说了什么? 示例不运行。它有一个非标准的阅读器和太多的代码解析和选择数据(或其他东西)。拟合结果表明您的起始值(独立于数据选择)距离太远,以至于拟合无法细化这些值。如果没有 mvce (***.com/help/asking),我们无法知道这是为什么。 【参考方案1】:

看起来你的初始参数不够好。请参阅以下讨论以更好地猜测您的参数; https://***.com/a/19207683/13131172

【讨论】:

我可以使用其他方法来拟合双高斯曲线吗?双高斯曲线似乎根本没有很多解决方案。

以上是关于如何使用 LMfit 将曲线拟合到双高斯/偏斜高斯的主要内容,如果未能解决你的问题,请参考以下文章

高斯曲线拟合

这是拟合从python中的高斯分布生成的数据的正确方法吗?

python 拟合曲线并求参

lmfit 模型拟合然后预测

将数据拟合到高斯轮廓

在 seaborn displot/histplot 函数中绘制高斯拟合直方图(不是 distplot)