Using numpy/scipy to identify slope changes in digital signals?

Posted 2023-03-12

Asked: 2018-05-11 04:55:00

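I am trying to come up with a generic way, in Python, to identify the pitch rotations that occur during a set of planned spacecraft maneuvers. You can think of it as a special case of a shift detection problem.

Let's consider the solar_elevation_angle variable in my set of measurements, which gives the elevation angle of the sun as measured from the spacecraft's instrument. For those who might want to play with the data, I saved the solar_elevation_angle.txt file here.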

import numpy as np
import matplotlib.pyplot as plt
from matplotlib import gridspec
from scipy.signal import argrelmax
# scipy.ndimage.filters is a deprecated namespace; import from scipy.ndimage directly
from scipy.ndimage import gaussian_filter1d

solar_elevation_angle = np.loadtxt("solar_elevation_angle.txt", dtype=np.float32)

fig, ax = plt.subplots()    
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
plt.show()

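The scanline is my time dimension. The four points where the slope changes identify the pitch rotations of the spacecraft.

As you can see, the evolution of the solar elevation angle outside the spacecraft maneuver regions is pretty much linear as a function of time, and that should always be the case for this particular spacecraft (except in case of a major failure).

Note that during each spacecraft maneuver the slope change is obviously continuous, even though it is discrete in my set of angle values. This means that, for each maneuver, it does not really make sense to try to locate the single scanline at which the maneuver took place. My goal is rather to identify, for each maneuver, a "representative" scanline within the range of scanlines that defines the time interval over which the maneuver occurred (e.g. the middle value, or the left boundary).

Once I have a set of "representative" scanline indices where all the maneuvers occurred, I can use those indices to roughly estimate maneuver durations, or to automatically place labels on the plot.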

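So far, my solution has been to differentiate the signal with np.gradient, take the absolute value of the result and clip it, and then smooth the clipped curve with a Gaussian filter.

Here is my code: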

fig = plt.figure(figsize=(8,12))
gs = gridspec.GridSpec(5, 1) 

ax0 = plt.subplot(gs[0])
ax0.set_title('Solar elevation angle')
ax0.plot(solar_elevation_angle)

solar_elevation_angle_1stdev = np.gradient(solar_elevation_angle)
ax1 = plt.subplot(gs[1])
ax1.set_title('1st derivative')
ax1.plot(solar_elevation_angle_1stdev)

solar_elevation_angle_2nddev = np.gradient(solar_elevation_angle_1stdev)
ax2 = plt.subplot(gs[2])
ax2.set_title('2nd derivative')
ax2.plot(solar_elevation_angle_2nddev)

solar_elevation_angle_2nddev_clipped = np.clip(np.abs(np.gradient(solar_elevation_angle_2nddev)), 0.0001, 2)
ax3 = plt.subplot(gs[3])
ax3.set_title('absolute value + clipping')
ax3.plot(solar_elevation_angle_2nddev_clipped)

smoothed_signal = gaussian_filter1d(solar_elevation_angle_2nddev_clipped, 20)
ax4 = plt.subplot(gs[4])
ax4.set_title('Smoothing applied')
ax4.plot(smoothed_signal)

plt.tight_layout()
plt.show()

I can then easily identify the local maxima with scipy's argrelmax function:

max_idx = argrelmax(smoothed_signal)[0]
print(max_idx)
# [ 689 1019 2356 2685]
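As a side note on the call above, here is a minimal sketch of how argrelmax behaves on toy arrays (the arrays `toy` and `plateau` are made up for illustration and are not part of the original data). It shows why the `[0]` is needed, and that the comparison is strict, so a flat-topped maximum is not reported.

```python
import numpy as np
from scipy.signal import argrelmax

# Hypothetical toy input with two clear local maxima at indices 1 and 3.
toy = np.array([0.0, 1.0, 0.0, 2.0, 0.0])

# argrelmax returns a tuple with one index array per dimension,
# which is why the 1-D case needs [0].
result = argrelmax(toy)
peaks = result[0]
print(peaks)  # [1 3]

# The comparison is strict (np.greater), so a plateau yields no maximum.
plateau = np.array([0.0, 1.0, 1.0, 0.0])
plateau_peaks = argrelmax(plateau)[0]
print(plateau_peaks.size)  # 0
```

This strictness is one reason the signal is smoothed first: on noisy data every small wiggle is a strict local maximum.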

This correctly identifies the scanline indices I am looking for:

fig, ax = plt.subplots()
ax.set_title('Solar elevation angle')
ax.set_xlabel('Scanline')
ax.set_ylabel('Solar elevation angle [deg]')
ax.plot(solar_elevation_angle)
ax.scatter(max_idx, solar_elevation_angle[max_idx], marker='x', color='red')
plt.show()

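My question is: is there a better way to approach this problem? I find that having to manually specify the clipping thresholds (to get rid of the noise) and the sigma of the Gaussian filter weakens this approach considerably, preventing it from being applied to other similar cases.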


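Answer 1:

The first improvement is to use a Savitzky-Golay filter to find the derivative in a less noisy way. For example, it can fit a parabola (in the least-squares sense) to each data slice of a certain size, and then take the second derivative of that parabola. The result is much nicer than just taking second-order differences with gradient. Here it is with window size 101: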

savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)
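To make the comparison concrete without the original data file, here is a minimal sketch on a synthetic signal. Everything about the signal (a single slope change at index 400, the slopes, the noise level, the random seed) is an assumption chosen for illustration; only the savgol_filter call mirrors the one above.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
x = np.arange(1000)

# Hypothetical piecewise linear signal: slope 0.1 up to index 400, then -0.05.
signal = np.where(x < 400, 0.1 * x, 40.0 - 0.05 * (x - 400))
noisy = signal + rng.normal(scale=0.05, size=x.size)

# Second derivative by repeated finite differences (noisy).
der2_gradient = np.gradient(np.gradient(noisy))

# Second derivative of a local least-squares parabola (much smoother).
der2_savgol = savgol_filter(noisy, window_length=101, polyorder=2, deriv=2)

# With the Savitzky-Golay estimate, the largest |der2| sits near the slope change.
kink_estimate = int(np.argmax(np.abs(der2_savgol)))
print("estimated slope change near index", kink_estimate)

# Spread of each estimate on the flat part of the signal, away from the kink.
print("gradient noise:", np.std(der2_gradient[600:900]))
print("savgol noise:  ", np.std(der2_savgol[600:900]))
```

On a signal like this, the finite-difference estimate is dominated by noise, while the filtered one shows a single clear bump about one window wide around the slope change.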

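Second, instead of looking for maximum points with argrelmax, it is better to look for the places where the second derivative is large; for example, at least half of its maximal size. This will of course return many indices, but we can look at the gaps between those indices to work out where each peak begins and ends. The midpoint of each peak is then easy to find.

Here is the complete code. The only parameter is the window size, set to 101. The method is robust; sizes 21 or 201 give essentially the same result (it must be odd).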

from scipy.signal import savgol_filter
window = 101
der2 = savgol_filter(solar_elevation_angle, window_length=window, polyorder=2, deriv=2)
max_der2 = np.max(np.abs(der2))
large = np.where(np.abs(der2) > max_der2/2)[0]
gaps = np.diff(large) > window
begins = np.insert(large[1:][gaps], 0, large[0])
ends = np.append(large[:-1][gaps], large[-1])
changes = ((begins+ends)/2).astype(int)  # np.int was removed in NumPy 1.24; use the builtin int
plt.plot(solar_elevation_angle)
plt.plot(changes, solar_elevation_angle[changes], 'ro')
plt.show()

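The fuss with insert and append is needed because the first index with a large derivative should qualify as the "beginning of a peak", and the last such index should qualify as the "end of a peak", even though they have no suitable gap next to them (the gap is infinite).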
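The gap-based grouping can be traced by hand on a small example. The index array and the window of 5 below are made up for illustration; the three lines computing `gaps`, `begins` and `ends` are the same as in the code above.

```python
import numpy as np

# Hypothetical indices where |der2| is large: three runs, the last of length 1.
window = 5
large = np.array([10, 11, 12, 40, 41, 42, 43, 90])

# True wherever two consecutive indices are further apart than the window.
gaps = np.diff(large) > window
print(gaps)       # [False False  True False False False  True]

# An index right after a gap starts a run; the very first index also does.
begins = np.insert(large[1:][gaps], 0, large[0])
# An index right before a gap ends a run; the very last index also does.
ends = np.append(large[:-1][gaps], large[-1])
print(begins)     # [10 40 90]
print(ends)       # [12 43 90]

# Integer midpoint of each run.
midpoints = (begins + ends) // 2
print(midpoints)  # [11 41 90]
```

Without the insert and append, the first run would have no beginning and the last run would have no end.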

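Piecewise linear fit

Here is an alternative approach (not necessarily better) that does not use derivatives: fit a smoothing spline of degree 1 (i.e. a piecewise linear curve), and note where its knots are.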

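First, normalize the data (which I call y instead of solar_elevation_angle) to have standard deviation 1.

y = solar_elevation_angle / np.std(solar_elevation_angle)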

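The first step is to build a piecewise linear curve that deviates from the data by at most a given threshold, arbitrarily set to 0.1 (no units here, because y was normalized). This is done by calling UnivariateSpline repeatedly, starting with a large smoothing parameter and gradually reducing it until the curve fits. (Unfortunately, one cannot simply pass in the desired uniform error bound.)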

from scipy.interpolate import UnivariateSpline
threshold = 0.1

m = y.size
x = np.arange(m)
s = m
max_error = 1
while max_error > threshold: 
  spl = UnivariateSpline(x, y, k=1, s=s)
  interp_y = spl(x)
  max_error = np.max(np.abs(interp_y - y))
  s /= 2
knots = spl.get_knots()
values = spl(knots)

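So far we have found the knots and recorded the values of the spline at those knots. But not all of these knots are really important. To test the importance of each knot, I remove it and interpolate without it. If the new interpolant differs substantially from the old one (doubling the error), the knot is considered important and is added to the list of found slope changes.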

ts = knots.size
idx = np.arange(ts)
changes = []
for j in range(1, ts-1):
  spl = UnivariateSpline(knots[idx != j], values[idx != j], k=1, s=0)
  if np.max(np.abs(spl(x) - interp_y)) > 2*threshold:
    changes.append(knots[j])
plt.plot(y)
plt.plot(changes, y[np.array(changes, dtype=int)], 'ro')
plt.show()
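The knot-removal test can be seen on a toy example. This is only a sketch: the knots and values are invented, and np.interp is swapped in for UnivariateSpline with k=1, s=0, on the assumption that both simply connect the given points with straight lines.

```python
import numpy as np

# Hypothetical knots: (4, 4) lies on the line through its neighbours,
# while (8, 8) is a real slope change.
knots = np.array([0.0, 4.0, 8.0, 12.0])
values = np.array([0.0, 4.0, 8.0, 4.0])

x = np.linspace(0.0, 12.0, 121)
full = np.interp(x, knots, values)

def removal_error(j):
    """Largest change in the piecewise linear curve when knot j is dropped."""
    keep = np.arange(knots.size) != j
    reduced = np.interp(x, knots[keep], values[keep])
    return float(np.max(np.abs(reduced - full)))

# Only interior knots are tested, as in the loop above.
errors = {j: removal_error(j) for j in range(1, knots.size - 1)}
print(errors)
```

Dropping the collinear knot leaves the curve unchanged, so its error is essentially zero; dropping the knot at the slope change moves the curve by 4 units, which is the kind of difference the 2*threshold test is meant to catch.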

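Ideally, one would fit piecewise linear functions to the given data, increasing the number of knots until adding one more does not bring a "substantial" improvement. The above is a crude approximation of that with SciPy tools, and far from optimal. I am not aware of any off-the-shelf piecewise linear model selection tool in Python.

Comments:

Thanks, that is a nice solution! I was not very familiar with the Savitzky-Golay filter. It turns out to be very handy for smoothing things out. I like that you can also identify the start and stop of each pitch roll, which will be very useful. I need more time to fully grasp your alternative approach, which also looks interesting. A method I did not try but that should at least be mentioned is find_peaks_cwt applied to the 2nd derivative; unlike a simple argrelmax, it includes filtering.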
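For reference, the find_peaks_cwt idea mentioned in the comment might look roughly like the sketch below. It was not tried in the thread, so treat it as an untested suggestion: the synthetic signal (slope changes at indices 500 and 1000), the noise level, and the range of widths are all assumptions, and the widths would likely need tuning on real data.

```python
import numpy as np
from scipy.signal import find_peaks_cwt, savgol_filter

rng = np.random.default_rng(0)
x = np.arange(1500)

# Hypothetical signal: rises, stays flat, then falls (slope changes at 500 and 1000).
signal = np.interp(x, [0, 500, 1000, 1499], [0.0, 50.0, 50.0, 0.0])
noisy = signal + rng.normal(scale=0.05, size=x.size)

# Magnitude of the Savitzky-Golay second derivative, as in the answer.
der2 = np.abs(savgol_filter(noisy, window_length=101, polyorder=2, deriv=2))

# Wavelet-based peak search; widths should roughly bracket the expected peak width.
peaks = np.asarray(find_peaks_cwt(der2, widths=np.arange(20, 60)))
print(peaks)
```

Compared with argrelmax, the wavelet transform does its own smoothing across the given widths, so no separate Gaussian filter is needed, but the choice of widths becomes the parameter to set.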
