Numpy: Numerical integration with integration limits

Posted 2023-03-24


[Title]: Numpy: Numerical integration with integration limits [Posted]: 2021-02-21 21:02:06 [Question]:

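I have measured peaks that I want to integrate over a specific range.

The data I want to integrate is a numpy array of wavenumbers and intensities: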

peakQ1_2500_smoothened =
array([[ 1.95594400e+04, -3.70074342e-17,  3.26000000e+00],
       [ 1.95594500e+04,  1.66666667e-03,  4.81500000e+00],
       [ 1.95594600e+04,  2.83333333e-02,  4.80833333e+00],
       [ 1.95594700e+04,  1.33333333e-02,  4.82166667e+00],
       [ 1.95594800e+04,  5.00000000e-03,  4.92416667e+00],
       [ 1.95594900e+04,  5.55555556e-04,  4.99305556e+00],
       [ 1.95595100e+04, -7.77777778e-03,  5.03972222e+00],
       [ 1.95595200e+04, -5.55555556e-03,  4.96888889e+00],
       [ 1.95595300e+04, -1.77777778e-02,  4.91333333e+00],
       [ 1.95595400e+04,  1.38888889e-02,  4.82500000e+00],
       [ 1.95595500e+04,  7.05555556e-02,  4.85722222e+00],
       [ 1.95595600e+04,  1.43888889e-01,  4.86638889e+00],
       [ 1.95595700e+04,  1.98888889e-01,  4.85138889e+00],
       [ 1.95595800e+04,  2.84444444e-01,  4.90694444e+00],
       [ 1.95595900e+04,  4.64444444e-01,  4.93611111e+00],
       [ 1.95596000e+04,  6.61111111e-01,  4.98166667e+00],
       [ 1.95596100e+04,  9.61666667e-01,  4.96722222e+00],
       [ 1.95596200e+04,  1.23222222e+00,  4.94388889e+00],
       [ 1.95596400e+04,  1.43555556e+00,  5.02166667e+00],
       [ 1.95596500e+04,  1.53222222e+00,  5.00500000e+00],
       [ 1.95596600e+04,  1.59833333e+00,  5.03666667e+00],
       [ 1.95596700e+04,  1.66388889e+00,  4.94555556e+00],
       [ 1.95596800e+04,  1.60111111e+00,  4.92777778e+00],
       [ 1.95596900e+04,  1.42333333e+00,  4.94666667e+00],
       [ 1.95597000e+04,  1.14111111e+00,  5.00777778e+00],
       [ 1.95597100e+04,  9.52222222e-01,  5.08555556e+00],
       [ 1.95597200e+04,  7.25555556e-01,  5.09222222e+00],
       [ 1.95597300e+04,  5.80555556e-01,  5.08055556e+00],
       [ 1.95597400e+04,  3.92777778e-01,  5.09611111e+00],
       [ 1.95597500e+04,  2.43222222e-01,  5.01655556e+00],
       [ 1.95597600e+04,  1.36555556e-01,  4.99822222e+00],
       [ 1.95597700e+04,  6.32222222e-02,  4.87044444e+00],
       [ 1.95597800e+04,  3.88888889e-02,  4.91944444e+00],
       [ 1.95597900e+04,  3.22222222e-02,  4.93611111e+00],
       [ 1.95598000e+04,  2.44444444e-02,  5.10277778e+00],
       [ 1.95598100e+04,  5.11111111e-02,  5.11277778e+00],
       [ 1.95598200e+04,  4.44444444e-02,  5.21944444e+00],
       [ 1.95598300e+04,  4.33333333e-02,  5.05333333e+00],
       [ 1.95598400e+04,  3.58333333e-02,  5.08750000e+00],
       [ 1.95598500e+04,  7.50000000e-03,  5.12750000e+00],
       [ 1.95598600e+04,  4.16666667e-03,  5.22916667e+00],
       [ 1.95598800e+04, -1.33333333e-02,  3.51000000e+00]])

I found that I can integrate over the whole array with:

def integratePeak(yvals, xvals):
    I = np.trapz(yvals, x = xvals)
    return I

But how do I integrate between x limits, e.g. from 19559.52 to 19559.78?

def integratePeak(yvals, xvals, xlower, xupper):
    '''integrate y over x from xlower to xupper'''
    return I

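Of course I could supply the x and y values by addressing the array elements explicitly, as peakQ1_2500_smoothened[7:33,0] and peakQ1_2500_smoothened[7:33,1]. But I do not want to reference array positions. I want to define the integration limits as wavenumbers, because peaks from different measurements have different array lengths.

The functions that reduce the data to one averaged point per wavenumber and then apply a running average: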

def averagePerWavenumber(data):
    wavenum, intensity, power = data[:,0], data[:,1], data[:,2]
    wavenum_unique, intensity_mean = npi.group_by(wavenum).mean(intensity)
    wavenum_unique, power_mean = npi.group_by(wavenum).mean(power)
    output = np.zeros(shape=(len(wavenum_unique), 3))
    output[:,0] = wavenum_unique
    output[:,1] = intensity_mean
    output[:,2] = power_mean
    return output

def smoothening(data, bins):
    output = np.zeros(shape=(len(data[:,0]), 3))
    output[:,0] = data[:,0]
    output[:,1] = np.convolve(data[:,1], np.ones(bins), mode='same') / bins
    output[:,2] = np.convolve(data[:,2], np.ones(bins), mode='same') / bins
    return output
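
One thing worth knowing about smoothening: with mode='same', np.convolve pads the signal with zeros, so the first and last points of the running average are pulled toward zero. This may be why the first and last rows of the array above show a third-column value of about 3.3 to 3.5 while the others sit near 5. A minimal sketch with a made-up constant signal (the names flat and smoothed are only for illustration):

```python
import numpy as np

bins = 3
flat = np.ones(5)  # made-up constant signal; an ideal smoother would return all ones

# Same running average as in smoothening(), applied to a single column
smoothed = np.convolve(flat, np.ones(bins), mode="same") / bins
print(smoothed)  # the two end values drop to 2/3, the interior stays at 1
```

If the integration limits come close to the ends of the array, it may be safer to discard the outermost bins // 2 points before integrating.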

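[Comments]:

Could you post the array as text instead of a photo? Also, how did you smooth it?

Use variables, peakQ1_2500_smoothened[start:end, 0], in place of peakQ1_2500_smoothened[7:33,0] and peakQ1_2500_smoothened[7:33,1].

@InyoungKim The point is that I want to give the integration limits as wavenumbers (my x axis), not as positions in the array, because a given x value is not always at the same position in the array. I would have thought this is a very common thing to do.

@anon01 I added the array. I wrote one function that leaves a single data point per wavenumber and another that takes a running average over the data points, and I added those too. They are not relevant to the question, though: how to do numerical integration with integration limits.

Find the indices of the points you want to integrate, then integrate the appropriate slice.

[Answer 1]: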

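Let's first look at what np.trapz actually does. The area of the i-th trapezoid is the average height times the width: 0.5 * (y[i + 1] + y[i]) * (x[i + 1] - x[i]). If you have a fixed dx instead of an x array, the last term is just a scalar. So let's rewrite your first function: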

def integrate_peak0(y, x):
    """ x can be an array of the same size as y, or a scalar spacing dx """
    x = np.asarray(x)  # so that a plain Python float also has .size
    dx = x if x.size <= 1 else np.diff(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * dx)
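
As a quick sanity check, the trapezoid sum can be compared with the library routine on a few made-up, unevenly spaced points. This is only a sketch: trapezoid_sum, x_demo and y_demo are illustrative names, and the getattr line is there because newer NumPy releases provide np.trapezoid as the replacement for np.trapz.

```python
import numpy as np

# Use np.trapezoid where it exists (NumPy >= 2.0), otherwise fall back to np.trapz
trapezoid = getattr(np, "trapezoid", None) or np.trapz

def trapezoid_sum(y, x):
    """Sum of trapezoid areas; x holds the sample positions."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

x_demo = np.array([0.0, 1.0, 3.0, 4.0])  # uneven spacing, made-up values
y_demo = np.array([0.0, 2.0, 2.0, 0.0])

manual = trapezoid_sum(y_demo, x_demo)   # 1 + 4 + 1
library = trapezoid(y_demo, x=x_demo)
print(manual, library)
```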

Now the hard part is handling the limits of integration. Since x is sorted, you can use np.searchsorted to convert the limits into indices into the data:

limits = np.array([xlower, xupper])
indices = np.searchsorted(x, limits)

If the limits always fall on exact values of x, you can use indices directly:

def integrate_peak1(y, x, xlower, xupper):
    indices = np.searchsorted(x, [xlower, xupper])
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])

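Since that will almost never be the case, you can try the next simplest thing: rounding to the nearest value. You can use fancy indexing to get a 2D array of the candidates for each bound, and apply np.argmin to it: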

candidates = x[np.stack((indices - 1, indices), axis=0)]
offset = np.abs(candidates - limits).argmin(axis=0) - 1
indices += offset

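candidates is a 2x2 array in which the columns correspond to the two bounds and the rows hold the smaller and larger candidate for each. offset is the amount by which each index must be shifted to reach the nearest neighbor. Here is a version of the integrator that picks the nearest bin for each integration limit: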

def integrate_peak2(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    candidates = x[np.stack((indices - 1, indices), axis=0)]
    indices += np.abs(candidates - limits).argmin(axis=0) - 1

    s = slice(indices[0], indices[1] + 1)
    return np.trapz(y[s], x[s])
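
The nearest-neighbor step can be traced on made-up numbers. The sketch below adds one thing that is not in the version above: an np.clip call, so that limits at or beyond the ends of x do not index out of range. nearest_indices and x_demo are illustrative names.

```python
import numpy as np

def nearest_indices(x, limits):
    """Index of the sample in sorted x closest to each limit."""
    x = np.asarray(x, dtype=float)
    limits = np.asarray(limits, dtype=float)
    indices = np.searchsorted(x, limits)
    # Assumption added here: keep both candidates (index - 1, index) in range
    indices = np.clip(indices, 1, len(x) - 1)
    candidates = x[np.stack((indices - 1, indices), axis=0)]
    # argmin is 0 when the lower candidate is closer, which shifts the index by -1
    return indices + np.abs(candidates - limits).argmin(axis=0) - 1

x_demo = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
print(nearest_indices(x_demo, [0.8, 2.4]))    # 0.8 rounds to x=1, 2.4 rounds to x=2
print(nearest_indices(x_demo, [-5.0, 10.0]))  # out-of-range limits snap to the ends
```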

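The final version interpolates the value of y at the limits. It can be implemented in one of two ways. You can either compute the target y values and pass them to np.trapz along with the appropriate x, or you can do the sum yourself with the function defined in integrate_peak0.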

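Given a point x[i] < xn <= x[i + 1], you can estimate yn = y[i] + (y[i + 1] - y[i]) * (xn - x[i]) / (x[i + 1] - x[i]). Here, x[i] and x[i + 1] are the values in candidates shown above, y[i] and y[i + 1] are the corresponding elements of y, and xn is one of the limits. So you can compute the interpolated values in a few different ways.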
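
For a single limit, the formula can be checked against np.interp, which performs the same piecewise-linear interpolation. A minimal sketch on made-up data (x_demo, y_demo and xn are illustrative):

```python
import numpy as np

x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = np.array([0.0, 10.0, 20.0, 15.0])
xn = 2.5  # a limit that falls between x_demo[2] and x_demo[3]

# searchsorted returns the upper neighbor, so the lower neighbor is one below
i = np.searchsorted(x_demo, xn) - 1
yn = y_demo[i] + (y_demo[i + 1] - y_demo[i]) * (xn - x_demo[i]) / (x_demo[i + 1] - x_demo[i])

yn_interp = np.interp(xn, x_demo, y_demo)
print(yn, yn_interp)
```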

One way is to adjust the inputs to trapz:

def integrate_peak3a(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    return np.trapz(np.r_[yn[0, 0], y[s], yn[0, 1]], np.r_[xlower, x[s], xupper])

Another way is to compute the contribution of the edge pieces manually:

def integrate_peak3b(y, x, xlower, xupper):
    limits = np.array([xlower, xupper])
    indices = np.searchsorted(x, limits)
    indices = np.stack((indices - 1, indices), axis=0)
    xi = x[indices]
    yi = y[indices]
    yn = yi[0] + np.diff(yi, axis=0) * (limits - xi[0]) / np.diff(xi, axis=0)

    indices = indices[[1, 0], [0, 1]]
    s = slice(indices[0], indices[1] + 1)
    # np.diff leaves a (1, 1) array here, so .item() unwraps it to a plain scalar
    return (np.trapz(y[s], x[s]) - 0.5 * np.diff((yn + y[indices]) * (x[indices] - limits))).item()

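Of course, you can also feed the inputs that integrate_peak3a passes to np.trapz into the "manual" computation from integrate_peak0.

In all of these cases, checking that the limits of integration are within the acceptable range and in the right order is left as an exercise for the reader.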
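
For what it's worth, one possible take on that exercise is sketched below. It is not one of the versions above: it swaps the searchsorted bookkeeping for np.interp and a boolean mask, and it assumes x is sorted in ascending order. integrate_between and the demo arrays are hypothetical names.

```python
import numpy as np

def integrate_between(y, x, xlower, xupper):
    """Trapezoid integral of y over [xlower, xupper], interpolating at the limits."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if xlower > xupper:
        raise ValueError("xlower must not exceed xupper")
    if xlower < x[0] or xupper > x[-1]:
        raise ValueError("limits must lie within the sampled range of x")

    inside = (x > xlower) & (x < xupper)  # samples strictly between the limits
    xs = np.concatenate(([xlower], x[inside], [xupper]))
    ys = np.concatenate(([np.interp(xlower, x, y)], y[inside],
                         [np.interp(xupper, x, y)]))
    return float(np.sum(0.5 * (ys[1:] + ys[:-1]) * np.diff(xs)))

x_demo = np.array([0.0, 1.0, 2.0, 3.0])
y_demo = 2.0 * x_demo  # y = 2x, so the exact integral from a to b is b**2 - a**2

area = integrate_between(y_demo, x_demo, 0.5, 2.5)
print(area)
```

Raising an error for out-of-range limits is a design choice; clamping the limits to the data range would be another reasonable option.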

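[Comments]:

Thank you for the thorough explanation. Using the indices directly is enough for me, because I can simply take the closest value, and since I apply a running average before integrating, there will always be an exact value. So all I really needed was np.searchsorted or np.argmin(). I tried googling every possible wording of "numpy numeric integration with limits" but, oddly, found nothing useful. What I found was either integration over specific indices or integration of analytic functions.

@Wulfram. I wasn't sure which case applied to you, so I gave you all the options. It's not so bad once you know which tools are out there. Also, FYI, none of the code here is tested :)

[Answer 2]: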

def integratePeak(yvals, xvals, xlower, xupper):
    '''integrate y over x from xlower to xupper.

    Use trapz to integrate over points closest to xlower, xupper.
    
    the +1 to idx_max is for numpy half-open indexing.
    '''
    idx_min = np.argmin(np.abs(xvals - xlower))
    idx_max = np.argmin(np.abs(xvals - xupper)) + 1
    result = np.trapz(yvals[idx_min:idx_max], x=xvals[idx_min:idx_max])
    return result

As an aside, you may benefit from using pandas for tabular data: it interoperates well with numpy arrays and, most importantly, lets you label your data:

import pandas as pd
df = pd.DataFrame(peakQ1_2500_smoothened, columns=["wave_num", "intensity", "col3"])

integratePeak(yvals=df.intensity, xvals=df.wave_num, xlower=19559.52, xupper=19559.78)

# 0.18853555549577536

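[Comments]:

Thanks. I know about pandas, but I haven't taken the time to get into it and have stuck with numpy so far. It's odd that when I searched for "numpy numeric integration with limits", no useful answer using np.argmin() or np.searchsorted() came up, even though this is one of the most common things people do.

@Wulfram Sure. As a side note: the error in your integral looks to be dominated first by your sample data, and then possibly by your smoothing technique. Until those are improved, I wouldn't complicate the integration algorithm for the sake of performance or accuracy.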
