Azure Stream Analytics ISFIRST and LAST query
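【Title】: azure Stream analytics isfirst and last query
【Posted】: 2020-02-13 06:43:21
【Question】: I have a payload like the one below. I need to get the first distinct batch value every 1 minute. Please let me know how to achieve this in Stream Analytics using ISFIRST and LAG or LAST.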
Expected output:
BATCH=01,"2015-01-01T00:00:01.0000000Z"
BATCH=02,"2015-01-01T00:00:03.0000000Z"
BATCH=03,"2015-01-01T00:00:06.0000000Z"
BATCH=01,"2015-01-01T00:00:14.0000000Z"
BATCH=02,"2015-01-01T00:00:18.0000000Z"
BATCH=03,"2015-01-01T00:00:22.0000000Z"
BATCH=01,"2015-01-01T00:00:27.0000000Z"
BATCH=01,"2015-01-01T00:00:31.0000000Z"
Payload:
[
  {"Payload": {"Make": "BATCH1", "VAL": "01", "TS": "2015-01-01T00:00:01.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "01", "TS": "2015-01-01T00:00:02.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "02", "TS": "2015-01-01T00:00:03.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "02", "TS": "2015-01-01T00:00:04.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "02", "TS": "2015-01-01T00:00:05.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "03", "TS": "2015-01-01T00:00:06.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "03", "TS": "2015-01-01T00:00:07.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "03", "TS": "2015-01-01T00:00:10.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "03", "TS": "2015-01-01T00:00:11.0000000Z"}},
  {"Payload": {"Make": "BATCH1", "VAL": "03", "TS": "2015-01-01T00:00:12.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "01", "TS": "2015-01-01T00:00:13.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "01", "TS": "2015-01-01T00:00:14.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "01", "TS": "2015-01-01T00:00:15.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "01", "TS": "2015-01-01T00:00:16.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "01", "TS": "2015-01-01T00:00:17.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "02", "TS": "2015-01-01T00:00:18.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "02", "TS": "2015-01-01T00:00:20.0000000Z"}},
  {"Payload": {"Make": "BATCH2", "VAL": "02", "TS": "2015-01-01T00:00:21.0000000Z"}},
  {"Payload": {"Make": "BATCH3", "VAL": "02", "TS": "2015-01-01T00:00:22.0000000Z"}},
  {"Payload": {"Make": "BATCH3", "VAL": "02", "TS": "2015-01-01T00:00:23.0000000Z"}},
  {"Payload": {"Make": "BATCH3", "VAL": "02", "TS": "2015-01-01T00:00:24.0000000Z"}},
  {"Payload": {"Make": "BATCH3", "VAL": "02", "TS": "2015-01-01T00:00:25.0000000Z"}},
  {"Payload": {"Make": "BATCH3", "VAL": "02", "TS": "2015-01-01T00:00:26.0000000Z"}},
  {"Payload": {"Make": "BATCH4", "VAL": "01", "TS": "2015-01-01T00:00:27.0000000Z"}},
  {"Payload": {"Make": "BATCH4", "VAL": "01", "TS": "2015-01-01T00:00:28.0000000Z"}},
  {"Payload": {"Make": "BATCH4", "VAL": "01", "TS": "2015-01-01T00:00:29.0000000Z"}},
  {"Payload": {"Make": "BATCH4", "VAL": "01", "TS": "2015-01-01T00:00:30.0000000Z"}},
  {"Payload": {"Make": "BATCH5", "VAL": "01", "TS": "2015-01-01T00:00:31.0000000Z"}}
]
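【Comments】:
Hi. Actually, I don't quite follow your question. Do you want to implement something like TOP within a 1-minute window?
Hi Jay, thanks for your reply, and sorry that I wasn't clear. Within one minute, each batch ID can have multiple VAL changes. For example, within a few minutes I can get: Make:batch1,Val:01, Make:batch1,val:01, Make:batch1,val:02, Make:batch1,val:02 ×××××××××××× Make:batch2,val:01, Make:batch2,val:01, Xxxxxxxxxx. From this I need to keep only the VAL changes for each batch, with no duplicates. I need the output to be Make:batch1,val:01 Make:batch1,val:02 Make:batch2,val:01, with the original timestamps of course, as I mentioned in my original post.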
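【Answer 1】:
Let me summarize your requirement as follows.
Sample input: within a one-minute window, each batch ID can have multiple VAL changes:
Make:batch1,Val:01, Make:batch1,val:01, Make:batch1,val:02, Make:batch1,val:02 ××××××××××××× Make:batch2,val:01, Make:batch2,val:01, Xxxxxxxxxx
Desired output: only the VAL changes for each batch, with no duplicates:
Make:batch1,val:01 Make:batch1,val:02 Make:batch2,val:01
The answer has two parts:
1. To collect data over a fixed period, you can use the built-in Tumbling Window function, as in the query below.
2. ASA has no built-in feature like DISTINCT for filtering duplicate rows. I suggest using GROUP BY, MAX, and an ASA UDF to get close to the result you want.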
SQL:
SELECT g.Payload.Make,g.Payload.VAL,max(udf.convertdate(g.Payload.TS)) as TS
FROM geoinput g TIMESTAMP BY g.Payload.TS
GROUP BY g.Payload.Make,g.Payload.VAL, TumblingWindow(Duration(minute, 1))
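By the way, the UDF (convertdate in the query above) contains only the code below, shown here wrapped in the main function form that ASA JavaScript UDFs use:
function main(datetime) {
    var date = new Date(datetime);  // parse the timestamp string
    return date.getTime();          // epoch milliseconds, so MAX can compare values numerically
}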
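As another workaround, you can collect all the data within the 1-minute window and then use an Azure Function as the output. Inside the Azure Function you can process the data however you need, for example by storing it in a JSON object: a key-value structure filters out duplicate rows.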
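The key-value idea could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `firstPerBatchValue` is hypothetical, the input is assumed to be an array of events shaped like the payload in the question, and all TS strings are assumed to share the same fixed-width UTC format so that they sort correctly as plain strings. The Azure Function wiring itself is left out.

```javascript
// Minimal sketch (hypothetical helper, not part of any Azure SDK):
// keep the earliest event for each (Make, VAL) pair.
function firstPerBatchValue(events) {
    // Sort a copy by timestamp. Assumes every TS uses the same fixed-width
    // UTC format, so string comparison matches chronological order.
    const ordered = events.slice().sort((a, b) =>
        a.Payload.TS < b.Payload.TS ? -1 : a.Payload.TS > b.Payload.TS ? 1 : 0);

    // Map from "Make|VAL" to the first payload seen with that key.
    const firstSeen = new Map();
    for (const event of ordered) {
        const key = event.Payload.Make + "|" + event.Payload.VAL;
        if (!firstSeen.has(key)) {
            firstSeen.set(key, event.Payload);
        }
    }
    // A Map iterates in insertion order, so results stay chronological.
    return Array.from(firstSeen.values());
}

// Usage with a few records from the question's payload.
const sample = [
    { Payload: { Make: "BATCH1", VAL: "01", TS: "2015-01-01T00:00:01.0000000Z" } },
    { Payload: { Make: "BATCH1", VAL: "01", TS: "2015-01-01T00:00:02.0000000Z" } },
    { Payload: { Make: "BATCH1", VAL: "02", TS: "2015-01-01T00:00:03.0000000Z" } },
    { Payload: { Make: "BATCH2", VAL: "01", TS: "2015-01-01T00:00:13.0000000Z" } }
];
console.log(firstPerBatchValue(sample));
```

Because the key combines Make and VAL, a value that reappears later within the same batch would be dropped. If repeated values should be reported again after a change, the previous VAL per Make would need to be compared instead.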
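【Discussion】:
Thanks for the reply, Jay. Your logic works, but with the first approach I need TS in datetime format. Yes, we could also consider the second option, but I have to do this in Stream Analytics alone. Any input on the first approach with a proper timestamp?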
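One possible direction for the timestamp format, offered as an untested assumption rather than a confirmed ASA solution, is a second JavaScript UDF that turns the epoch-millisecond number back into an ISO 8601 string. The name `toIsoString` is hypothetical, and whether it can be wrapped around the aggregate in the query above would need to be checked in an actual job.

```javascript
// Hypothetical companion to the convertdate UDF:
// epoch milliseconds in, ISO 8601 UTC string out.
function toIsoString(epochMs) {
    if (epochMs === null || epochMs === undefined) {
        return null;                      // pass missing values through
    }
    var date = new Date(Number(epochMs));
    if (isNaN(date.getTime())) {
        return null;                      // not a usable timestamp
    }
    return date.toISOString();            // millisecond precision, UTC
}

// Usage: 3 seconds past midnight UTC on 2015-01-01.
console.log(toIsoString(Date.UTC(2015, 0, 1, 0, 0, 3)));
```

Note that `toISOString` emits three fractional digits, not the seven shown in the original payload.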