Group similar URLs
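【Title】: Group similar URLs 【Posted】: 2019-06-21 04:43:50 【Question】: I want to fetch all requests to xmlrpc.php and wp-login.php, using wildcards in the statement.
This creates a problem, though: instead of reporting xmlrpc and wp-login in just two rows, the result also lists every URL with a query string appended as its own row. I want every requested URL to be counted, but rolled up so that it shows as either xmlrpc.php or wp-login.php.
I'm a MySQL n00b and have been playing with SUBSTR, REPLACE, and GROUP_CONCAT, but I can't get it to work.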
WITH
  subq AS (
    SELECT url, COUNT(url) AS count
    FROM `flywheel-production.fastly_logs.ingress_logs`
    WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
      AND (url LIKE "/wp-login.php%" OR url LIKE "/xmlrpc.php%")
      AND site_hash = "btmpuroizf"
    GROUP BY url
  )
SELECT
  url,
  count,
  ROUND(count / (SELECT SUM(count) FROM subq) * 100, 2) AS percent
FROM subq
ORDER BY count DESC
Any help would be greatly appreciated. Thanks!
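【Answer 1】: For BigQuery Standard SQL
The adjusted query below should do the trick. REGEXP_EXTRACT keeps only the part of the URL before the first "?", so requests that differ only in their query string fall into the same group.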
#standardSQL
WITH subq AS (
  SELECT REGEXP_EXTRACT(url, r'(.*?)(?:\?|$)') url, COUNT(url) AS COUNT
  FROM `flywheel-production.fastly_logs.ingress_logs`
  WHERE timestamp > TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL -1 DAY)
    AND (url LIKE "/wp-login.php%" OR url LIKE "/xmlrpc.php%")
    AND site_hash = "btmpuroizf"
  GROUP BY url
)
SELECT
  url,
  COUNT,
  ROUND(COUNT / (SELECT SUM(COUNT) FROM subq) * 100, 2) AS percent
FROM subq
ORDER BY COUNT DESC
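To see what the regular expression does without running anything against BigQuery, here is a minimal Python sketch that applies the same pattern to a handful of URLs and then counts them per stripped path. The sample URLs and the helper name strip_query are made up for illustration and are not taken from the real logs; Python's re module is used here as a stand-in for BigQuery's RE2 engine, which should behave the same for this simple pattern.

```python
import re
from collections import Counter

# Same pattern as in the query: lazily capture everything up to the
# first "?" or, if there is none, up to the end of the string.
PATH_RE = re.compile(r"(.*?)(?:\?|$)")


def strip_query(url: str) -> str:
    """Return the URL without its query string (hypothetical helper)."""
    return PATH_RE.match(url).group(1)


# Hypothetical sample requests, invented for this sketch.
sample_urls = [
    "/wp-login.php",
    "/wp-login.php?action=register",
    "/wp-login.php?redirect_to=%2Fwp-admin%2F",
    "/xmlrpc.php",
    "/xmlrpc.php?rsd",
]

# Equivalent of GROUP BY on the extracted path plus COUNT.
counts = Counter(strip_query(u) for u in sample_urls)
total = sum(counts.values())

# Equivalent of the outer SELECT: path, count, share of all requests.
for path, count in counts.most_common():
    print(path, count, round(count / total * 100, 2))
```

With these five sample URLs the loop prints two rows, one for /wp-login.php (3 requests, 60.0 percent) and one for /xmlrpc.php (2 requests, 40.0 percent), which is the two-row shape the question asks for.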
【Comments】:
You rock! Works perfectly. Thanks!