CollectTop is returning more rows than I would expect in Azure Stream Analytics


【Title】: CollectTop is returning more rows than I would expect in Azure Stream Analytics 【Posted】: 2020-08-05 14:12:44 【Question】:

I uploaded the following input (testing in the Azure portal):

[
  {"engineid":"engine001","eventtime":1,"tmp":19.3,"hum":0.22},
  {"engineid":"engine001","eventtime":2,"tmp":19.7,"hum":0.21},
  {"engineid":"engine002","eventtime":3,"tmp":20.4,"hum":0.25},
  {"engineid":"engine001","eventtime":4,"tmp":19.6,"hum":0.24}
]

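I then try to group the records so that I get the last 2 rows for each engine. As you can see in the sample, there are only 2 distinct engines, so I would expect two output records, each containing the ranked rows for one engine, but I get 4 output records.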

This is my query:

-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT  engineid, tmp, hum, eventtime
    FROM    [engine-telemetry] 
    WHERE   engineid IS NOT NULL
),
-- Grouping by engineid in TimeWindows
TimeWindows AS
(
    SELECT engineid, 
        CollectTop(2) OVER (ORDER BY eventtime DESC) as TimeWindow
    FROM
        [RelevantTelemetry]
    WHERE engineid IS NOT NULL
    GROUP BY SlidingWindow(hour, 24), engineid
)
--Output timewindows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows

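I have experimented with the TIMESTAMP BY clause, changed the order of the GROUP BY, and so on, but I still get the following 4 records instead of the 2 I expected:

Any ideas?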

[
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
    ]
  },
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
      {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}},
      {"rank":3,"value":{"engineid":"engine001","tmp":0.0003,"hum":-0.0002,"eventtime":1}}
    ]
  },
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine002","tmp":0.0017,"hum":0.0003,"eventtime":3}}
    ]
  },
  {"TimeWindow":
    [
      {"rank":1,"value":{"engineid":"engine001","tmp":-0.0019,"hum":-0.0002,"eventtime":4}},
      {"rank":2,"value":{"engineid":"engine001","tmp":-0.0026,"hum":-0.0002,"eventtime":2}}
    ]
  }
]

【Comments】:

Have you tried GROUP BY TumblingWindow(hour, 24), engineid?

Thanks, Steve - that does seem to work.

【Answer 1】:

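As @SteveZhao suggested, you need to use GROUP BY TumblingWindow(hour, 24), engineid instead of GROUP BY SlidingWindow(hour, 24), engineid.

Sliding windows can overlap, so the same event may be included in more than one window. A sliding window produces output whenever an event enters or leaves the window, which is why each of the four input events can trigger its own output record. Tumbling windows do not overlap, so each event belongs to exactly one window.

For more information, see: https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-window-functions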
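
For reference, here is a minimal sketch of the query from the question with only the window function swapped, as the comment suggests. It assumes that all four test events fall into the same 24-hour tumbling window; under that assumption it should emit one record per engine. The redundant second WHERE filter is dropped because the first step already excludes NULL engine IDs; everything else is taken from the question as-is.

```sql
-- Taking relevant fields from the input stream
WITH RelevantTelemetry AS
(
    SELECT  engineid, tmp, hum, eventtime
    FROM    [engine-telemetry]
    WHERE   engineid IS NOT NULL
),
-- Grouping by engineid in non-overlapping 24-hour windows
TimeWindows AS
(
    SELECT engineid,
        CollectTop(2) OVER (ORDER BY eventtime DESC) AS TimeWindow
    FROM
        [RelevantTelemetry]
    -- Only change from the question: TumblingWindow instead of SlidingWindow
    GROUP BY TumblingWindow(hour, 24), engineid
)
-- Output the windows for verification purposes
SELECT TimeWindow
INTO debug
FROM TimeWindows
```

Events that land in different 24-hour windows are still reported separately, so an engine whose events straddle a window boundary will appear once per window.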

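【Discussion】:

Thanks, Pranav and Steve. That does seem to give better results. I had assumed that a sliding window would always keep the latest N records available, and I was worried that a TumblingWindow might not contain N records if they straddle a boundary and a new window has already started...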
