Parse an object as string in output in Stream Azure Analytics
【Title】: Parse an object as string in output in Stream Azure Analytics 【Posted】: 2016-02-26 14:45:36 【Question】: This question is about Stream Analytics. I want to export a blob into SQL. I know the process; my question is about the query I have to use.
{"performanceCounter":[{"available_bytes":{"value":994164736.0},"categoryName":"Memory","instanceName":""}],"internal":{"data":{"id":"459bf840-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2667669Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
{"performanceCounter":[{"percentage_processor_total":{"value":0.0123466420918703},"categoryName":"Processor","instanceName":"_Total"}],"internal":{"data":{"id":"459bf841-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2668221Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
{"performanceCounter":[{"percentage_processor_time":{"value":0.0},"categoryName":"Process","instanceName":"w3wp"}],"internal":{"data":{"id":"459bf842-d259-11e5-a640-1df0b6342362","documentVersion":"1.61"}},"context":{"device":{"type":"PC","network":"Ethernet","screenResolution":{},"locale":"en-US","id":"RD0003FF73B748","roleName":"Sdm.MyGovId.Static.Web","roleInstance":"Sdm.MyGovId.Static.Web_IN_1","oemName":"Microsoft Corporation","deviceName":"Virtual Machine","deviceModel":"Virtual Machine"},"application":{"version":"R2.0_20160205.5"},"location":{"continent":"North America","country":"United States","clientip":"104.41.209.0","province":"Washington","city":"Redmond"},"data":{"isSynthetic":false,"samplingRate":100.0,"eventTime":"2016-02-13T13:53:44.2668342Z"},"user":{"isAuthenticated":false,"anonAcquisitionDate":"0001-01-01T00:00:00Z","authAcquisitionDate":"0001-01-01T00:00:00Z","accountAcquisitionDate":"0001-01-01T00:00:00Z"},"operation":{},"cloud":{},"serverDevice":{},"custom":{"dimensions":[],"metrics":[]},"session":{}}}
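As you can see, there are 3 JSON objects, and the object inside the performanceCounter array has a different field in each of them, namely its first field: available_bytes in the first, percentage_processor_total in the second, and percentage_processor_time in the third.
Because I am going to export this to a SQL table called performaceCounter, I would need a different column for every different counter, so I would rather save the object as a string and parse it later in my application.
As a starting point, I have this query, which reads the input (blob) and writes to the output (SQL):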
Select GetArrayElement(A.performanceCounter,0) as a
INTO
PerformanceCounterOutput
FROM PerformanceCounterInput A
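This GetArrayElement takes index 0 of the performanceCounter array and then writes a separate column for every distinct field found in each object. So I would have to know all the different counters and create a column for each one, whereas my idea is more like a single column called performanceCounterData that holds a string such as
'{"available_bytes":{"value":994164736.0},"categoryName":"Memory","instanceName":""}'
or this one
'{"percentage_processor_total":{"value":0.0123466420918703},"categoryName":"Processor","instanceName":"_Total"}'
or
'{"percentage_processor_time":{"value":0.0},"categoryName":"Process","instanceName":"w3wp"}'
How can I convert an array element like this into a string? I tried CAST(GetArrayElement(A.performanceCounter,0) as nvarchar(max)), but it did not work.
Any good help will be rewarded.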
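For context, the application-side parsing that the question has in mind might look like the following Python sketch. It assumes the performanceCounterData column would hold the first performanceCounter element serialized as valid JSON (braces included); the function name parse_counter is hypothetical and not part of any Azure API.

```python
import json

# Assumed content of the performanceCounterData column for one row.
stored = ('{"available_bytes":{"value":994164736.0},'
          '"categoryName":"Memory","instanceName":""}')

def parse_counter(text):
    """Split a stored counter string into (name, value, category, instance)."""
    record = json.loads(text)
    # The two fixed fields are present for every counter in the samples.
    category = record.pop("categoryName", None)
    instance = record.pop("instanceName", None)
    # Whatever key remains is the counter name; its object carries the value.
    name, payload = next(iter(record.items()))
    return name, payload["value"], category, instance

print(parse_counter(stored))
```

Because the counter name is discovered at parse time, this approach needs no per-counter column in the SQL table.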
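【Comments】:
【Answer 1】: With the following solution I get two columns, one with the name of the property and another with the value of the property, which was my original intention: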
With pc as
(
Select
GetArrayElement(A.[performanceCounter],0) as counter
,A.context.data.eventTime as eventTime
,A.context.location.clientip as clientIp
,A.context.location.continent as continent
,A.context.location.country as country
,A.context.location.province as province
,A.context.location.city as city
FROM PerformanceCounterInput A
)
select
props.propertyName,
props.propertyValue,
pc.counter.categoryName,
pc.counter.instanceName,
pc.eventTime,
pc.clientIp,
pc.continent,
pc.country,
pc.province,
pc.city
from pc
cross apply GetRecordProperties(pc.counter) as props
where props.propertyname<>'categoryname' and props.propertyname<>'instancename'
Anyway, if anyone finds out how to write an object as plain text in Stream Analytics, that will still be rewarded and appreciated.
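To make the shape of the result easier to picture, here is a small Python sketch that imitates what the query above does to one counter record: one output row per property, skipping the two fixed fields. It is only an illustration of the row layout, not Stream Analytics code; the function name flatten_counter is hypothetical, and lower-casing the names before comparing is a choice made in this sketch to mirror the lowercase literals in the WHERE clause.

```python
def flatten_counter(counter):
    """Imitate CROSS APPLY GetRecordProperties: one row per remaining property."""
    skipped = ("categoryname", "instancename")
    return [
        {
            "propertyName": name,
            "propertyValue": value,
            "categoryName": counter["categoryName"],
            "instanceName": counter["instanceName"],
        }
        for name, value in counter.items()
        # Lower-casing here is this sketch's assumption, see the note above.
        if name.lower() not in skipped
    ]

counter = {
    "available_bytes": {"value": 994164736.0},
    "categoryName": "Memory",
    "instanceName": "",
}

for row in flatten_counter(counter):
    print(row)
```

Note that propertyValue in this sketch is the nested record that contains value, since that is what the sample data stores under each counter name.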
【Comments】:
【Answer 2】: You can do something like the following, which returns the counters as (propertyName, propertyValue) pairs.
with T1 as
(
select
GetArrayElement(iotInput.performanceCounter, 0) Counter,
System.Timestamp [EventTime]
from
iotInput timestamp by context.data.eventTime
)
select
[EventTime],
Counter.categoryName,
Counter.available_bytes [Value]
from
T1
where
Counter.categoryName = 'Memory'
union all
select
[EventTime],
Counter.categoryName,
Counter.percentage_processor_time [Value]
from
T1
where
Counter.categoryName = 'Process'
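It is also possible to write a query that gives one column per counter type; for that you would have to use a join, or a group by with a "case" statement for each counter.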
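The one-column-per-counter layout can be pictured with the Python sketch below, which plays the role of one "case" expression per counter. It is an illustration under assumptions rather than a Stream Analytics query: KNOWN_COUNTERS and to_wide_row are hypothetical names, and the list of counters is taken from the three sample records in the question.

```python
# Counters that get their own column; taken from the question's samples.
KNOWN_COUNTERS = (
    "available_bytes",
    "percentage_processor_total",
    "percentage_processor_time",
)

def to_wide_row(counter):
    """Build one column per known counter, None where the counter is absent."""
    return {
        name: (counter[name]["value"] if name in counter else None)
        for name in KNOWN_COUNTERS
    }

sample = {
    "percentage_processor_time": {"value": 0.0},
    "categoryName": "Process",
    "instanceName": "w3wp",
}

print(to_wide_row(sample))
```

The fixed KNOWN_COUNTERS tuple is the design cost of this layout: a counter that is not listed there gets no column at all.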
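【Comments】:
In that case I would need to know in advance all the different counters that will be exported to the blob (available_bytes, percentage_processor_time, ...). Because it is JSON, we have no fixed structure, so at any time I could receive a different counter. Have a look at the solution I posted in my own answer to see what I really mean. Anyway, thank you very much for your time and for the different approach.
The GetRecordProperties() function is a good fit here.
Sorry, I did not notice that you were already using it. The query you wrote is currently the best way to do this.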