How can I improve a KQL query over a large dataset for a heatmap

Posted 2023-02-19

【Title】: How can I improve KQL query for large dataset for heatmap 【Posted】: 2021-09-14 22:03:37 【Question】:

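I have a KQL query below that produces a very nice heatmap plotting the top hits on Azure WAF by country.

The challenge is that this query cannot cover more than 24 hours, because the number of records I have is too large. How can I improve it to show weekly and monthly statistics?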

// source: https://datahub.io/core/geoip2-ipv4
set notruncation;
let CountryDB=externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
[@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
| extend Dummy=1;
let AppGWAccess = AzureDiagnostics
| where ResourceType == "APPLICATIONGATEWAYS"
| where Category == "ApplicationGatewayAccessLog"
| where userAgent_s !in ("bot")
| project TimeGenerated, clientIP_s;
AppGWAccess
| extend Dummy=1
| summarize count() by Hour=bin(TimeGenerated,6h), clientIP_s,Dummy
| partition by Hour(
                  lookup (CountryDB|extend Dummy=1) on Dummy
                | where ipv4_is_match(clientIP_s, Network)
                )
| summarize sum(count_) by country_name

【Comments on the question】:

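【Answer 1】:

What you are doing is creating an hourly aggregation over all of the data on every run. Instead, you should create a Materialized View, which will perform the aggregation for you in the background.

Quoting the documentation:

Materialized views expose an aggregation query over a source table. Materialized views always return an up-to-date result of the aggregation query (always fresh). Querying a materialized view is more performant than running the aggregation directly over the source table, which is performed on each query.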
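As a rough sketch of what this could look like, the management command below pre-aggregates hits per client IP and hour. It assumes the access logs are available in an Azure Data Explorer table named AzureDiagnostics with the same columns as in the question (materialized views are an Azure Data Explorer feature, so this may not apply as-is to a Log Analytics workspace). The view name AppGWHourlyHits and the column names Hits and Hour are hypothetical.

```kusto
// Assumption: run once against an Azure Data Explorer database that holds the logs.
// AppGWHourlyHits, Hits and Hour are illustrative names, not from the original post.
.create materialized-view with (backfill=true) AppGWHourlyHits on table AzureDiagnostics
{
    AzureDiagnostics
    | where ResourceType == "APPLICATIONGATEWAYS"
    | where Category == "ApplicationGatewayAccessLog"
    | where userAgent_s !in ("bot")
    // A materialized view query must end in a single summarize.
    | summarize Hits = count() by clientIP_s, Hour = bin(TimeGenerated, 1h)
}
```

The weekly or monthly heatmap query could then read the much smaller pre-aggregated view and resolve countries only once per distinct IP. This sketch swaps the question's Dummy-key lookup for the ipv4_lookup plugin, which is a different technique from the one in the original query; the 30-day window is only an example.

```kusto
// GeoIP table as in the question; ignoreFirstRecord skips the CSV header row.
let CountryDB = externaldata(Network:string, geoname_id:string, continent_code:string, continent_name:string, country_iso_code:string, country_name:string)
    [@"https://datahub.io/core/geoip2-ipv4/r/geoip2-ipv4.csv"]
    with (format="csv", ignoreFirstRecord=true);
AppGWHourlyHits
| where Hour > ago(30d)
// Collapse to one row per IP before the comparatively expensive IP range match.
| summarize Hits = sum(Hits) by clientIP_s
| evaluate ipv4_lookup(CountryDB, clientIP_s, Network)
| summarize Hits = sum(Hits) by country_name
```

Reducing to one row per client IP before the lookup keeps the range match proportional to the number of distinct IPs rather than the number of requests.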

【Comments】:
