How to extract the slope of all peak-containing curves in a graph?

Posted 2023-03-12

[Title]: How to extract slope of all the peak containing curves in a graph?
[Posted]: 2021-01-24 07:36:23
[Question]:

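I have a dataset from which I have generated graphs. Using scipy, I can extract the peaks in these graphs that lie above a threshold. I am trying to build a dataframe of peak features such as the peak value, peak width, peak height, the slope of the curve containing the peak, the number of points in the curve containing the peak, and so on. I am struggling to find a way to extract the slope and the number of points of the curve that contains each peak.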

The c_dict["L-04"][3][0] data is available at the pastebin link.

Here is the code I tried for extracting some of the peak features.

import numpy as np
import pandas as pd
from scipy.signal import find_peaks, peak_widths

def extract_peak_features(c_dict,households):
    peak_list=[]
    width_list=[]
    half_width_list=[]
    smoke_list=[]
    house_list=[]
    for key,value in c_dict.items():
        if not key.startswith("L-01") and not key.startswith("H"):
            for k,v in value.items():
                if k==3:
                    if len(v) > 0:
                        if key in households:
                            smoking = 1
                        else:
                            smoking = 0
                        peaks, _ = find_peaks(v[0],prominence=50)
                        half_widths = peak_widths(v[0], peaks, rel_height=0.5)[0]
                        widths = peak_widths(v[0], peaks, rel_height=1)[0]
                        if len(peaks) > 0:
                            peak_list.extend(np.array(v[0])[peaks])
                            width_list.extend(widths)
                            half_width_list.extend(half_widths)
                            smoke_list.extend([smoking] * len(peaks))
                            house_list.extend([key] * len(peaks))
                        print(key,len(peaks),len(widths),len(half_widths))

    data = {"ID":house_list,"peaks":peak_list,"width":width_list,"half_width":half_width_list,"smoke":smoke_list}
    df_peak_stats = pd.DataFrame(data=data)
    return df_peak_stats
df_peak_stats = extract_peak_features(c_dict,households)

Code for plotting the c_dict["L-04"][3][0] data using scipy and matplotlib.

peaks, _ = find_peaks(c_dict["L-04"][3][0],prominence=50)
results_half = peak_widths(c_dict["L-04"][3][0], peaks, rel_height=0.5)
results_half[0]  # widths
results_full = peak_widths(c_dict["L-04"][3][0], peaks, rel_height=1)
plt.plot(c_dict["L-04"][3][0])
plt.plot(peaks, np.array(c_dict["L-04"][3][0])[peaks], "x")
#plt.hlines(*results_half[1:], color="C2")
plt.hlines(*results_full[1:], color="C3")
plt.show()

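In summary, I would like to know how to extract the slope and the number of points for each of the 4 peak-containing curves above.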

[Comments]:

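Isn't the slope a continuous function? (It is approximated by a discrete function, but it is generally not a fixed value.) At the top of a peak, its value will be zero.

For the example above, I am assuming the graph will be split into 4 subsections, each containing a peak, and that each of these curves will be approximated by a linear model. Then I assume the slopes of these 4 sections can be found? For each of the four small curves containing a peak, the slope would be a fixed value.

[Answer 1]: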

Because the peaks in your data are localized, I created 4 subplots, one for each of the four peaks.

import numpy as np
from scipy.signal import find_peaks, peak_prominences, peak_widths

test = np.array(test)
test_inds = np.arange(len(test))
peaks, _ = find_peaks(test,prominence=50)
prominences, left_bases, right_bases = peak_prominences(test,peaks)

offset = np.ones_like(prominences)
# Calculate widths at x[peaks] - offset * rel_height
widths, h_eval, left_ips, right_ips = peak_widths(
    test, peaks, 
    rel_height=1,
    prominence_data=(offset, left_bases, right_bases)
)

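where test is the array in your post. The code above basically locates the peaks in the array. To find the two associated points you want:

- the point to the left of the peak where the rising curve starts
- the point to the right of the peak whose value is close to that of the point on the left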
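
As a side note on the first snippet, here is a minimal sketch of what passing an array of ones as the prominences does. The signal `x` and the `prominence=5` threshold are made up for illustration and are not the asker's data. With `rel_height=1`, `peak_widths` evaluates the width at `x[peaks] - prominence * rel_height`, so substituting ones moves the evaluation line to one unit below each peak, and the left intersection point falls just before the peak sample.

```python
import numpy as np
from scipy.signal import find_peaks, peak_prominences, peak_widths

# Made-up signal with a single peak of height 10 at index 4 (illustrative only).
x = np.array([0, 1, 3, 6, 10, 6, 3, 1, 0], dtype=float)

peaks, _ = find_peaks(x, prominence=5)
prominences, left_bases, right_bases = peak_prominences(x, peaks)

# Replace the real prominences with ones, as in the snippet above.
offset = np.ones_like(prominences)
widths, h_eval, left_ips, right_ips = peak_widths(
    x, peaks, rel_height=1,
    prominence_data=(offset, left_bases, right_bases),
)

# Height at which the width was evaluated: one unit below the peak.
print(h_eval)

# Truncating the left intersection and adding 1 recovers the peak index here.
recovered = left_ips.astype(int) + 1
print(recovered, peaks)
```

This is presumably why the plotting code below uses `int(left_ip)+1` as the x position of the peak.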

Based on this post, you can use kneed.

import matplotlib.pyplot as plt
from kneed import KneeLocator

fig,ax = plt.subplots(nrows=2,ncols=2,figsize=(18,10))

for ind,item in enumerate(zip(left_ips,right_ips)):
    
    left_ip,right_ip = item
    row_idx,col_idx = ind // 2,ind % 2
    
    # This is where the peak is located (first sample to the right of the left intersection point)
    pc = np.array([int(left_ip)+1,test[int(left_ip)+1]])

    # find the point where the curve starts to increase
    # based on what your data look like, such a critical point can be found within the range 
    # test_inds[int(pc[0])-200: int(pc[0])], note that test_inds is an array of the inds of the points in your data
    kn_l = KneeLocator(test_inds[int(pc[0])-200:int(pc[0])],test[int(pc[0])-200:int(pc[0])],curve='convex',direction='increasing')
    kn_l = kn_l.knee
    pl = np.array([kn_l,test[kn_l]])
    # find the point to the right of the peak, the point is almost on the same level as the point on the left 
    # in this example, the threshold is set to 1
    mask_zero = np.abs(test - pl[1]*np.ones(len(test))) < 1
    mask_greater = test_inds > pc[0]
    pr_idx = np.argmax(np.logical_and(mask_zero,mask_greater))
    pr = np.array([pr_idx,test[pr_idx]])
    
    ax[row_idx][col_idx].set_xlim(int(pl[0])-20,int(pr[0])+20)
    ax[row_idx][col_idx].scatter(int(pl[0]),test[int(pl[0])],s=100,color='aquamarine',zorder=500)
    ax[row_idx][col_idx].scatter(int(pr[0]),test[int(pr[0])],s=100,color='aquamarine',zorder=500)
    
    get_angle = lambda v1, v2:\
        np.rad2deg(np.arccos(np.clip(np.dot(v1, v2) / np.linalg.norm(v1) / np.linalg.norm(v2),-1,1)))
    angle_l = get_angle(pr-pl,pc-pl)
    angle_r = get_angle(pl-pr,pc-pr)
    
    ax[row_idx][col_idx].annotate('%.2f deg' % angle_l,xy=pl+np.array([5,20]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    ax[row_idx][col_idx].annotate('%.2f deg' % angle_r,xy=pr+np.array([-1,20]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    ax[row_idx][col_idx].plot([pl[0],pc[0]],[pl[1],pc[1]],'-',lw=2,color='navy')
    ax[row_idx][col_idx].plot([pc[0],pr[0]],[pc[1],pr[1]],'-',lw=2,color='navy')
    
    ax[row_idx][col_idx].hlines(pl[1],pl[0],pc[0],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].hlines(pr[1],pc[0],pr[0],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].vlines(pc[0],pl[1],pc[1],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].vlines(pc[0],pr[1],pc[1],linestyle='--',lw=.8,color='k')
    
    rto_1 = (pc[1]-pl[1])/(pc[0]-pl[0])
    rto_2 = (pc[1]-pr[1])/(pc[0]-pr[0])
    ax[row_idx][col_idx].annotate('ratio1=%.3f' % rto_1,xy=pr+np.array([15,100]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    
    ax[row_idx][col_idx].annotate('ratio2=%.3f' % rto_2,xy=pr+np.array([15,60]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    
    # np.int was removed in NumPy 1.24; the built-in int does the same conversion
    pl_idx,pc_idx,pr_idx = int(pl[0]),int(pc[0]),int(pr[0])
    ax[row_idx][col_idx].plot(range(int(pl[0])-20,pl_idx+1),test[int(pl[0])-20:pl_idx+1],'ko-',lw=1,markersize=1.5)
    ax[row_idx][col_idx].plot(range(pl_idx,pr_idx+1),test[pl_idx:pr_idx+1],'ro-',lw=1,zorder=200,markersize=1.5)
    ax[row_idx][col_idx].plot(range(pr_idx,int(pr[0])+20),test[pr_idx:int(pr[0])+20],'ko-',lw=1,markersize=1.5)
    ax[row_idx][col_idx].scatter(peaks[ind],test[peaks[ind]],marker='x',s=30,c='red',zorder=100)
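
Once the three points of a peak are known, the two slopes and the point count asked for in the question reduce to simple arithmetic. The following is a minimal sketch, assuming integer sample indices `left`, `peak` and `right` (for example `pl_idx`, `pc_idx` and `pr_idx` from the loop above) and unit spacing on the x axis. The helper name `triangle_features` and the sample values are hypothetical.

```python
def triangle_features(y, left, peak, right):
    """Treat the curve containing a peak as a triangle (left, peak, right)."""
    # Rise over run on the ascending edge (same quantity as rto_1 above).
    rise_slope = (y[peak] - y[left]) / (peak - left)
    # Rise over run on the descending edge (same quantity as rto_2, negative).
    fall_slope = (y[right] - y[peak]) / (right - peak)
    # Number of samples from the left point to the right point, inclusive.
    n_points = right - left + 1
    return {"rise_slope": rise_slope, "fall_slope": fall_slope, "n_points": n_points}


# Made-up samples: the curve starts rising at index 1, peaks at 4, returns at 8.
y = [2, 2, 4, 8, 12, 9, 6, 3, 2, 2]
feats = triangle_features(y, left=1, peak=4, right=8)
print(feats)
```

Collecting one such dictionary per peak in a list and passing it to `pd.DataFrame` would add these columns to a feature table like `df_peak_stats`.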

[Comments]:

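I assume diff contains the slope of the curve at every point of the graph. I want to interpret each peak-containing curve as an approximate triangle and get the slopes of that triangle?

@Vivz I have updated my answer. Because every peak-containing curve involves only 3 points, I did not use linear regression to determine the edges of the triangle.

Very interesting update, but ideally I would like this to be relative to the baseline rather than to the left_index and right_index of the peak point. I would also like to get the ratio of the difference between the peak and the left base index value to the difference between their corresponding x-axis values. @meTchaikovsky

@Vivz Thanks. I have updated my answer, maybe this is what you are looking for :)

Not quite. I am concerned about my dataset, because finding the slope of a noisy dataset is not trivial. I also don't understand how to determine the nearest points on the baseline to the left and right of the peak.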
