CollectTop is returning more rows than I would expect in Azure Stream Analytics
【Posted】2020-08-05 14:12:44 【Question】I uploaded the following input (testing in the Azure portal):
[
  {"engineid":"engine001","eventtime":1,"tmp":19.3,"hum":0.22},
  {"engineid":"engine001","eventtime":2,"tmp":19.7,"hum":0.21},
  {"engineid":"engine002","eventtime":3,"tmp":20.4,"hum":0.25},
  {"engineid":"engine001","eventtime":4,"tmp":19.6,"hum":0.24}
]
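I then tried to group the records so that each engine gets its last 2 rows. As you can see in the sample, there are only 2 distinct engines, so I expect the output to contain two records, each holding that engine's ranked records. Instead, I get 4 output records.
Here is my query: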
-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
SELECT engineid, tmp, hum, eventtime
FROM [engine-telemetry]
WHERE engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
SELECT engineid,
CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
FROM
[RelevantTelemetry]
WHERE engineid IS NOT NULL
GROUP BY SlidingWindow(hour, 24), engineid
)
--Output timewindows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows
I tried using TIMESTAMP BY, changing the order of the GROUP BY clause, and so on, but I still get the following 4 records instead of the 2 I expect:
[
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
    ]},
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
      {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}},
      {"rank":3,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
    ]},
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine002","tmp":0.0017,"hum":0.0003,"eventtime":3}}
    ]},
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
      {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}}
    ]}
]
Any ideas?
【Comments】
Have you tried using GROUP BY TumblingWindow(hour, 24), engineid?
Thanks, Steve, that does seem to work.
【Answer 1】
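As @SteveZhao suggested, you need to use GROUP BY TumblingWindow(hour, 24), engineid instead of GROUP BY SlidingWindow(hour, 24), engineid.
Sliding windows can overlap: a sliding window emits output whenever its contents change (an event enters or exits it), so the same event can appear in more than one window and each engineid can produce several output rows. Tumbling windows are fixed-size, contiguous, and non-overlapping, so each engineid produces at most one row per window.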
For more information, see: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions
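A sketch of the question's query with only the windowing clause changed, as the answer suggests. It assumes the same input alias `[engine-telemetry]` and output alias `debug` from the question, and it adds `engineid` to the final SELECT purely so each output row shows which engine it belongs to (an illustrative addition, not part of the original). Assuming all four test events land in the same 24-hour window, this should yield one row per engine, i.e. two rows.

```sql
-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT engineid, tmp, hum, eventtime
    FROM [engine-telemetry]
    WHERE engineid IS NOT NULL
),
-- One row per engineid per non-overlapping 24-hour window,
-- holding the 2 events with the highest eventtime
TimeWindows AS
(
    SELECT engineid,
           CollectTop(2) OVER (ORDER BY eventtime DESC) AS TimeWindow
    FROM [RelevantTelemetry]
    GROUP BY TumblingWindow(hour, 24), engineid
)
-- Output the windows for verification purposes
-- (engineid added here only to label each row)
SELECT engineid, TimeWindow
INTO debug
FROM TimeWindows
```

The trade-off is the one raised in the comments below: because tumbling windows do not overlap, events just before and just after a window boundary end up in different windows, so a window may hold fewer than 2 events for an engine.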
【Comments】
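Thanks, Pranav and Steve. This does seem to give better results. I thought a sliding window would always keep the latest N records available, and I was worried that a TumblingWindow might not have N records if they cross a boundary and a new window has already started...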