Algorithm to interpolate data to reconcile readings from the two sides of a link?
I have transmission-rate readings, reported as timestamp/value pairs, from both sides (side A and side Z) of a network link; they are aggregated and pulled at 1-minute intervals. In the ideal case, ignoring propagation delay, the readings on the two sides should match (the output rate on side A == the input rate on side Z), and I want to use them to detect whether data is being lost in transit. The problem is that the readings are taken at different points in time, so the Z-side readings lag by N seconds. That makes the data nearly useless: even with no loss on the link, the Z-side reading is taken at a moment when the A-side rate has already changed.
Is there an interpolation algorithm that can help reconcile these signals in time?
I tried building a common index for the two dataframes and adding data points to each frame by linear interpolation. The plots line up much better, but during periods of rapid growth or decline the data points at the same instant are still far apart. For example:
Source data for the plots, shown as dictionaries:
df_a_side_out = {'output_bps': {Timestamp('2019-04-17 09:29:40-0700', tz='US/Pacific'): 35382522872.0, Timestamp('2019-04-17 09:30:41-0700', tz='US/Pacific'): 21079385419.6, Timestamp('2019-04-17 09:31:40-0700', tz='US/Pacific'): 31227610322.8, Timestamp('2019-04-17 09:32:40-0700', tz='US/Pacific'): 27822829221.333332, Timestamp('2019-04-17 09:33:40-0700', tz='US/Pacific'): 32904048834.8, Timestamp('2019-04-17 09:34:40-0700', tz='US/Pacific'): 25492801008.933334, Timestamp('2019-04-17 09:35:41-0700', tz='US/Pacific'): 35440406212.13333, Timestamp('2019-04-17 09:36:40-0700', tz='US/Pacific'): 25233478935.466667, Timestamp('2019-04-17 09:37:41-0700', tz='US/Pacific'): 40124788802.53333, Timestamp('2019-04-17 09:38:40-0700', tz='US/Pacific'): 22751043828.666668, Timestamp('2019-04-17 09:39:40-0700', tz='US/Pacific'): 34929660187.2, Timestamp('2019-04-17 09:40:41-0700', tz='US/Pacific'): 28188317863.733334, Timestamp('2019-04-17 09:41:41-0700', tz='US/Pacific'): 21337236735.866665, Timestamp('2019-04-17 09:42:40-0700', tz='US/Pacific'): 20949231319.333332, Timestamp('2019-04-17 09:43:41-0700', tz='US/Pacific'): 37289827508.933334, Timestamp('2019-04-17 09:44:40-0700', tz='US/Pacific'): 43531218338.53333, Timestamp('2019-04-17 09:45:41-0700', tz='US/Pacific'): 31844675965.333332, Timestamp('2019-04-17 09:46:40-0700', tz='US/Pacific'): 2393.3333333333335, Timestamp('2019-04-17 09:47:40-0700', tz='US/Pacific'): 6485669413.066667, Timestamp('2019-04-17 09:48:40-0700', tz='US/Pacific'): 27114641050.266666, Timestamp('2019-04-17 09:49:41-0700', tz='US/Pacific'): 30240896003.409836, Timestamp('2019-04-17 09:50:40-0700', tz='US/Pacific'): 47081233669.830505, Timestamp('2019-04-17 09:51:40-0700', tz='US/Pacific'): 45941505223.6, Timestamp('2019-04-17 09:52:40-0700', tz='US/Pacific'): 32794663316.133335, Timestamp('2019-04-17 09:53:41-0700', tz='US/Pacific'): 26202902204.666668, Timestamp('2019-04-17 09:54:40-0700', tz='US/Pacific'): 42744363073.46667, Timestamp('2019-04-17 09:55:40-0700', tz='US/Pacific'): 37591667043.6, Timestamp('2019-04-17 09:56:40-0700', tz='US/Pacific'): 11035404304.8, Timestamp('2019-04-17 09:57:40-0700', tz='US/Pacific'): 7707897097.466666, Timestamp('2019-04-17 09:58:40-0700', tz='US/Pacific'): 25327914733.066666, Timestamp('2019-04-17 09:59:40-0700', tz='US/Pacific'): 15763228742.8, Timestamp('2019-04-17 10:00:41-0700', tz='US/Pacific'): 30068024369.2, Timestamp('2019-04-17 10:01:40-0700', tz='US/Pacific'): 58940292672.26667, Timestamp('2019-04-17 10:02:41-0700', tz='US/Pacific'): 43484764068.26667, Timestamp('2019-04-17 10:03:41-0700', tz='US/Pacific'): 12948002074.266666, Timestamp('2019-04-17 10:04:41-0700', tz='US/Pacific'): 7776379160.655738, Timestamp('2019-04-17 10:05:40-0700', tz='US/Pacific'): 34174506576.81356, Timestamp('2019-04-17 10:06:40-0700', tz='US/Pacific'): 34642321006.933334, Timestamp('2019-04-17 10:07:40-0700', tz='US/Pacific'): 44025919118.13333, Timestamp('2019-04-17 10:08:41-0700', tz='US/Pacific'): 51441310396.8, Timestamp('2019-04-17 10:09:41-0700', tz='US/Pacific'): 49744733006.666664, Timestamp('2019-04-17 10:10:40-0700', tz='US/Pacific'): 39372041772.53333, Timestamp('2019-04-17 10:11:40-0700', tz='US/Pacific'): 37212362739.73333, Timestamp('2019-04-17 10:12:41-0700', tz='US/Pacific'): 29888187478.133335, Timestamp('2019-04-17 10:13:41-0700', tz='US/Pacific'): 23647225076.8, Timestamp('2019-04-17 10:14:41-0700', tz='US/Pacific'): 44232721589.333336, Timestamp('2019-04-17 10:15:40-0700', tz='US/Pacific'): 31619739302.8, Timestamp('2019-04-17 
10:16:41-0700', tz='US/Pacific'): 34270903419.866665, Timestamp('2019-04-17 10:17:41-0700', tz='US/Pacific'): 37255143804.26667, Timestamp('2019-04-17 10:18:40-0700', tz='US/Pacific'): 29626685689.333332, Timestamp('2019-04-17 10:19:41-0700', tz='US/Pacific'): 37738576156.8, Timestamp('2019-04-17 10:20:41-0700', tz='US/Pacific'): 32520425703.733334, Timestamp('2019-04-17 10:21:40-0700', tz='US/Pacific'): 50682096771.066666, Timestamp('2019-04-17 10:22:40-0700', tz='US/Pacific'): 53442027636.0, Timestamp('2019-04-17 10:23:40-0700', tz='US/Pacific'): 48346635537.066666, Timestamp('2019-04-17 10:24:41-0700', tz='US/Pacific'): 28192208534.0, Timestamp('2019-04-17 10:25:41-0700', tz='US/Pacific'): 30508158848.533333, Timestamp('2019-04-17 10:26:40-0700', tz='US/Pacific'): 38669708961.73333, Timestamp('2019-04-17 10:27:41-0700', tz='US/Pacific'): 41905851091.333336, Timestamp('2019-04-17 10:28:40-0700', tz='US/Pacific'): 37885503188.4}}
df_z_side_in = {'input_bps': {Timestamp('2019-04-17 09:29:21-0700', tz='US/Pacific'): 32479665734.933334, Timestamp('2019-04-17 09:30:21-0700', tz='US/Pacific'): 28762393063.213116, Timestamp('2019-04-17 09:31:21-0700', tz='US/Pacific'): 24012409059.66102, Timestamp('2019-04-17 09:32:20-0700', tz='US/Pacific'): 30912397690.8, Timestamp('2019-04-17 09:33:21-0700', tz='US/Pacific'): 30150484213.508198, Timestamp('2019-04-17 09:34:21-0700', tz='US/Pacific'): 26572558234.666668, Timestamp('2019-04-17 09:35:20-0700', tz='US/Pacific'): 38830624164.47458, Timestamp('2019-04-17 09:36:20-0700', tz='US/Pacific'): 26512584207.866665, Timestamp('2019-04-17 09:37:20-0700', tz='US/Pacific'): 32343571104.133335, Timestamp('2019-04-17 09:38:21-0700', tz='US/Pacific'): 28372191073.704918, Timestamp('2019-04-17 09:39:20-0700', tz='US/Pacific'): 30009804008.677967, Timestamp('2019-04-17 09:40:20-0700', tz='US/Pacific'): 30764259885.2, Timestamp('2019-04-17 09:41:20-0700', tz='US/Pacific'): 27229582440.533333, Timestamp('2019-04-17 09:42:21-0700', tz='US/Pacific'): 12670550319.868853, Timestamp('2019-04-17 09:43:21-0700', tz='US/Pacific'): 38891533755.333336, Timestamp('2019-04-17 09:44:21-0700', tz='US/Pacific'): 46374133014.644066, Timestamp('2019-04-17 09:45:20-0700', tz='US/Pacific'): 40275148155.46667, Timestamp('2019-04-17 09:46:21-0700', tz='US/Pacific'): 2374.032786885246, Timestamp('2019-04-17 09:47:20-0700', tz='US/Pacific'): 3260927513.220339, Timestamp('2019-04-17 09:48:21-0700', tz='US/Pacific'): 19319788768.666668, Timestamp('2019-04-17 09:49:21-0700', tz='US/Pacific'): 29479921822.133335, Timestamp('2019-04-17 09:50:21-0700', tz='US/Pacific'): 42536464523.27869, Timestamp('2019-04-17 09:51:21-0700', tz='US/Pacific'): 48253007455.32204, Timestamp('2019-04-17 09:52:20-0700', tz='US/Pacific'): 28098055972.266666, Timestamp('2019-04-17 09:53:20-0700', tz='US/Pacific'): 34696013048.8, Timestamp('2019-04-17 09:54:21-0700', tz='US/Pacific'): 41089541187.540985, Timestamp('2019-04-17 09:55:20-0700', tz='US/Pacific'): 35818326833.355934, Timestamp('2019-04-17 09:56:21-0700', tz='US/Pacific'): 24461996828.0, Timestamp('2019-04-17 09:57:21-0700', tz='US/Pacific'): 2534090684.266667, Timestamp('2019-04-17 09:58:21-0700', tz='US/Pacific'): 22127687010.229507, Timestamp('2019-04-17 09:59:21-0700', tz='US/Pacific'): 23025967406.915253, Timestamp('2019-04-17 10:00:20-0700', tz='US/Pacific'): 10059074966.266666, Timestamp('2019-04-17 10:01:21-0700', tz='US/Pacific'): 67497142954.0, Timestamp('2019-04-17 10:02:21-0700', tz='US/Pacific'): 46389235268.0, Timestamp('2019-04-17 10:03:20-0700', tz='US/Pacific'): 21655645611.2, Timestamp('2019-04-17 10:04:21-0700', tz='US/Pacific'): 966253748.4, Timestamp('2019-04-17 10:05:20-0700', tz='US/Pacific'): 27733135839.866665, Timestamp('2019-04-17 10:06:21-0700', tz='US/Pacific'): 38420361510.55738, Timestamp('2019-04-17 10:07:20-0700', tz='US/Pacific'): 38791963200.27119, Timestamp('2019-04-17 10:08:21-0700', tz='US/Pacific'): 49337311755.333336, Timestamp('2019-04-17 10:09:21-0700', tz='US/Pacific'): 49036736751.2, Timestamp('2019-04-17 10:10:21-0700', tz='US/Pacific'): 40189220408.0, Timestamp('2019-04-17 10:11:20-0700', tz='US/Pacific'): 47269187739.333336, Timestamp('2019-04-17 10:12:21-0700', tz='US/Pacific'): 22747569814.666668, Timestamp('2019-04-17 10:13:20-0700', tz='US/Pacific'): 29592627519.066666, Timestamp('2019-04-17 10:14:21-0700', tz='US/Pacific'): 39522624640.78689, Timestamp('2019-04-17 10:15:20-0700', tz='US/Pacific'): 33426815865.627117, 
Timestamp('2019-04-17 10:16:20-0700', tz='US/Pacific'): 36818438483.86667, Timestamp('2019-04-17 10:17:21-0700', tz='US/Pacific'): 36014942532.327866, Timestamp('2019-04-17 10:18:21-0700', tz='US/Pacific'): 32190457857.333332, Timestamp('2019-04-17 10:19:20-0700', tz='US/Pacific'): 33696489212.067795, Timestamp('2019-04-17 10:20:20-0700', tz='US/Pacific'): 33386886955.333332, Timestamp('2019-04-17 10:21:20-0700', tz='US/Pacific'): 47954604950.13333, Timestamp('2019-04-17 10:22:21-0700', tz='US/Pacific'): 54281759713.57377, Timestamp('2019-04-17 10:23:20-0700', tz='US/Pacific'): 43724407654.37288, Timestamp('2019-04-17 10:24:20-0700', tz='US/Pacific'): 36995567964.666664, Timestamp('2019-04-17 10:25:21-0700', tz='US/Pacific'): 25491555548.590164, Timestamp('2019-04-17 10:26:21-0700', tz='US/Pacific'): 38326723270.26667, Timestamp('2019-04-17 10:27:20-0700', tz='US/Pacific'): 43034165564.61017, Timestamp('2019-04-17 10:28:20-0700', tz='US/Pacific'): 37405127893.6}}
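For reference, the attempt described above might look roughly like the following sketch; the DataFrame names df_a and df_b (also used in the answer below) and the use of a union index are assumptions, not the exact code used:
import pandas as pd
from pandas import Timestamp  # needed so the dict literals above evaluate

# Wrap the dicts in DataFrames with a DatetimeIndex
df_a = pd.DataFrame(df_a_side_out)   # A-side output rate
df_b = pd.DataFrame(df_z_side_in)    # Z-side input rate

# Union of both timestamp sets, then time-based linear interpolation per series
common_index = df_a.index.union(df_b.index)
df_a_interp = df_a.reindex(common_index).interpolate(method='time')
df_b_interp = df_b.reindex(common_index).interpolate(method='time')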
Answer
Method 1: align the plots
We can do the following to make the data line up exactly, although I'm not sure what kind of data this is or whether it really makes sense to treat it this way. Maybe it helps anyway.
import pandas as pd

# Cross-fill each column's gaps with the other side's value at the same timestamp
df = pd.concat([df_a, df_b], axis=1)
df['output_bps'] = df['output_bps'].fillna(df['input_bps'])
df['input_bps'] = df['input_bps'].fillna(df['output_bps'])
Then we plot again and see that everything lines up exactly. As the legend shows, there really are two lines there, plotted on top of each other:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(16,10))
plt.plot(df['output_bps'], label='Side A out')
plt.plot(df['input_bps'], label='Side Z in')
plt.legend(loc='upper left')
plt.show()
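As a quick sanity check of what the cross-fill does (a sketch, not required for the method): since each timestamp in the data comes from only one side, filling each column from the other makes the two columns identical, so the perfect overlap is by construction and says nothing about loss.
# After the cross-fill every timestamp holds the same value in both columns,
# which is why the two plotted lines coincide exactly.
print((df['output_bps'] == df['input_bps']).all())   # expected: True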
Method 2 to find more accurate differences
If I understand correctly: because the two sides record their samples at different timestamps, it is hard to compute an accurate difference (loss) directly.
We can preprocess the data to get closer. Rather than interpolating only at the existing points, we resample everything onto a common 1 second
index and then interpolate, which gives much finer alignment. After that, taking the difference at identical timestamps gives the loss.
This is the closest I could get (note that df here should be the plain pd.concat([df_a, df_b], axis=1), without the cross-fill from Method 1, otherwise the two columns are already identical and the loss comes out as zero):
# Build a new dataframe with a 1-second DatetimeIndex spanning the data
df_z = pd.DataFrame(index=pd.date_range(start=df.index.min(),
                                        end=df.index.max(),
                                        freq='1S'),
                    columns=df.columns)

# Merge in the values of the original dataframe and drop the columns we don't need
df_z = df_z.merge(df,
                  left_index=True,
                  right_index=True,
                  how='left',
                  suffixes=['_1', '']).filter(regex='(^[^0-9]+$)')

# Fill the NaNs by linear interpolation
for col in df_z.columns:
    df_z[col] = df_z[col].interpolate(method='linear', limit_direction='both')

# Calculate the loss at each second
df_z['loss'] = df_z['output_bps'] - df_z['input_bps']
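As an aside, if you eventually want a single number rather than a curve, one possible way to summarize it (assuming only positive differences count as loss) is to sum them over the 1-second samples:
# A positive difference means side A sent more than side Z received in that
# second; with 1-second samples, summing bps values approximates total bits.
lost_bits = df_z['loss'].clip(lower=0).sum()
lost_fraction = lost_bits / df_z['output_bps'].sum()
print(f"estimated loss: {lost_bits:.3e} bits ({lost_fraction:.2%} of A-side output)")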
Now we can plot the data again, this time including the loss:
fig = plt.figure(figsize=(16,10))
plt.plot(df_z['output_bps'], label='Side A out')
plt.plot(df_z['input_bps'], label='Side Z in')
plt.plot(df_z['loss'], label='Loss')
plt.legend(loc='upper left')
plt.show()
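Since linear interpolation still produces spikes in the loss around fast rate changes (the original complaint), one optional refinement, just a sketch with an arbitrarily chosen window, is to smooth the loss with a short rolling mean before reading anything into it:
# A 60-second centred rolling mean damps the spikes that linear interpolation
# creates around rapid increases or decreases in rate.
df_z['loss_smooth'] = df_z['loss'].rolling(window=60, center=True, min_periods=1).mean()

fig = plt.figure(figsize=(16,10))
plt.plot(df_z['loss'], label='Loss (raw)')
plt.plot(df_z['loss_smooth'], label='Loss (60 s rolling mean)')
plt.legend(loc='upper left')
plt.show()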