SQL to determine distinct periods of sequential days of access?
【Title】: SQL to determine distinct periods of sequential days of access? 【Posted】: 2009-07-24 10:27:56 【Question】: Jeff recently asked this question and got some great answers.
Jeff's question was about finding users who have logged in to the system on (n) consecutive days, using a table structured as follows:
Id      UserId  CreationDate
------  ------  -----------------------
750997  12      2009-07-07 18:42:20.723
750998  15      2009-07-07 18:42:20.927
751000  19      2009-07-07 18:42:22.283
Please read the original question first, and then...
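I'm interested in the problem of determining how many distinct (n)-day periods a user has.
Is it possible to write a fast SQL query that returns a list of users together with the number of distinct (n)-day periods each of them has?
Edit: per the comment below, if someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days, that would be 3 "distinct 4-day periods". The 8-day run should count as two consecutive 4-day periods.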
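【Comments】:
Can you elaborate on your definition of a "distinct (n)-day period"? If someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days, is that 2 "distinct 4-day periods" or 3? (Counting the 8-day run as two consecutive 4-day periods?)
I'll edit the question, but using your example it would be 3 distinct 4-day periods.
【Answer 1】: My answer doesn't seem to have shown up...
I'll try again...
Rob Farley's answer to the original question has the handy benefit of including the number of consecutive days.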
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
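With integer division, simply dividing the number of consecutive days by n gives the number of "distinct (n)-day periods" contained in each consecutive run:
- 2 / 4 = 0
- 4 / 4 = 1
- 8 / 4 = 2
- 9 / 4 = 2
- and so on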
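The arithmetic above can be checked directly in T-SQL, where dividing one int by another truncates the result. This is only a small sketch: the inline DECLARE initialisation and the VALUES table constructor assume SQL Server 2008 or later, and the value 4 for @period_length is just the example figure from the question.

```sql
-- Integer division in T-SQL truncates when both operands are integers.
DECLARE @period_length int = 4;  -- example value taken from the question's edit

SELECT v.ConsecutiveDays,
       v.ConsecutiveDays / @period_length AS distinct_n_day_periods
FROM (VALUES (2), (4), (8), (9)) AS v(ConsecutiveDays);
-- Returns 0, 1, 2 and 2 for the four rows, matching the list above.
```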
So here is my take on adapting Rob's answer to your needs... (I really like Rob's answer; go and read the explanation, it is very enlightening!)
with
numberedrows (
UserID,
TheOffset
)
as
(
select
UserID,
row_number() over (partition by UserID order by CreationDate)
- DATEDIFF(DAY, 0, CreationDate) as TheOffset
from
tablename
),
ConsecutiveCounts(
UserID,
ConsecutiveDays
)
as
(
select
UserID,
count(*) as ConsecutiveDays
from
numberedrows
group by
UserID,
TheOffset
)
select
UserID,
SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
from
ConsecutiveCounts
group by
UserID
The only real difference is that I take Rob's result and then run it through another GROUP BY...
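To try the query against the scenario from the question's edit (runs of 2, 4 and 8 days separated by gaps), something like the following self-contained script may help. It is a sketch under stated assumptions: the table variable @tablename, the sample rows and the start date are made up for illustration, there is exactly one row per user per day, and the syntax assumes SQL Server 2008 or later.

```sql
DECLARE @period_length int = 4;

-- Hypothetical stand-in for the real table, with one row per user per day.
DECLARE @tablename TABLE (UserID int, CreationDate datetime);

-- User 12: runs of 2, 4 and 8 consecutive days (gaps at day offsets 2 and 7).
INSERT INTO @tablename (UserID, CreationDate)
SELECT 12, DATEADD(DAY, n.d, '20090701')
FROM (VALUES (0),(1), (3),(4),(5),(6),
             (8),(9),(10),(11),(12),(13),(14),(15)) AS n(d);

-- User 15: a single 3-day run, too short to contain a 4-day period.
INSERT INTO @tablename (UserID, CreationDate)
SELECT 15, DATEADD(DAY, n.d, '20090701')
FROM (VALUES (0),(1),(2)) AS n(d);

WITH numberedrows AS (
    -- Rows in the same consecutive run share the same offset.
    SELECT UserID,
           ROW_NUMBER() OVER (PARTITION BY UserID ORDER BY CreationDate)
             - DATEDIFF(DAY, 0, CreationDate) AS TheOffset
    FROM @tablename
),
ConsecutiveCounts AS (
    -- One row per run, with its length in days.
    SELECT UserID, COUNT(*) AS ConsecutiveDays
    FROM numberedrows
    GROUP BY UserID, TheOffset
)
SELECT UserID,
       SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
FROM ConsecutiveCounts
GROUP BY UserID
ORDER BY UserID;
-- User 12: 0 + 1 + 2 = 3 periods; user 15: 0 periods.
```

User 12 comes out at 3, which agrees with the count given in the question's edit.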
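【Discussion】:
【Answer 2】: So, I'll start with the query from the previous question, which lists every run of consecutive days. Then I'll group it by UserID and NumConsecutiveDays to count how many runs of each length each user has.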
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
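runsOfDays as
(
-- every column of a CTE needs a name, so the min/max columns get aliases
select min(CreationDate) as FirstDate, max(CreationDate) as LastDate, count(*) as NumConsecutiveDays, UserID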
from numberedrows
group by UserID, TheOffset
)
select UserID, NumConsecutiveDays, count(*) as NumOfRuns
from runsOfDays
group by UserID, NumConsecutiveDays
;
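Of course, if you want to filter this to consider only runs of at least a particular length, add "where NumConsecutiveDays >= @days" to the final query.
Now, if you want a 16-day run to count as 3 runs of 5 days, then each run contributes NumConsecutiveDays / @runlength of them (integer division rounds each result down). So instead of just counting how many runs there are of each length, use SUM. You could take the query above and use SUM(NumOfRuns * (NumConsecutiveDays / @runlength)), dividing before multiplying so that each run is rounded down on its own, but if you follow the logic, the query below is a bit simpler.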
with numberedrows as
(
select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
from tablename
)
,
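runsOfDays as
(
-- every column of a CTE needs a name, so the min/max columns get aliases
select min(CreationDate) as FirstDate, max(CreationDate) as LastDate, count(*) as NumConsecutiveDays, UserID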
from numberedrows
group by UserID, TheOffset
)
select UserID, sum(NumConsecutiveDays / @runlength) as NumOfRuns
from runsOfDays
where NumConsecutiveDays >= @runlength
group by UserID
;
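One detail worth checking if you go the other route and sum over the grouped query: the division has to happen per run, before multiplying by the number of runs, because T-SQL evaluates * and / from left to right. The sketch below uses made-up grouped rows (3-day runs occurring twice, 6-day runs occurring three times) and assumes SQL Server 2008 or later for the syntax.

```sql
DECLARE @runlength int = 4;

-- Hypothetical output rows of the grouped query: (NumConsecutiveDays, NumOfRuns).
SELECT g.NumConsecutiveDays,
       g.NumOfRuns,
       g.NumOfRuns * g.NumConsecutiveDays / @runlength   AS multiply_first,  -- over-counts
       g.NumOfRuns * (g.NumConsecutiveDays / @runlength) AS divide_first     -- rounds each run down
FROM (VALUES (3, 2), (6, 3)) AS g(NumConsecutiveDays, NumOfRuns);
-- Two 3-day runs:   multiply_first = 1, divide_first = 0 (no 4-day period exists).
-- Three 6-day runs: multiply_first = 4, divide_first = 3 (one 4-day period per run).
```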
Hope this helps,
Rob
【Discussion】:
【Answer 3】: This works perfectly on the test data I have.
DECLARE @days int
SET @days = 30
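-- The run's start date is kept in the DISTINCT list so that two runs of equal length are not collapsed into one row
SELECT DISTINCT l.UserId, l.CreationDate AS RunStart, (datediff(d,l.CreationDate, -- l.CreationDate is the first date in a contiguous range; the subquery finds the last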
(
SELECT min(a.CreationDate ) as CreationDate
FROM UserHistory a
LEFT OUTER JOIN UserHistory b
ON a.CreationDate = dateadd(day, -1, b.CreationDate ) AND
a.UserId = b.UserId
WHERE b.CreationDate IS NULL AND
a.CreationDate >= l.CreationDate AND
a.UserId = l.UserId
) )+1)/@days as cnt
INTO #cnttmp
FROM UserHistory l
LEFT OUTER JOIN UserHistory r
ON r.CreationDate = dateadd(day, -1, l.CreationDate ) AND
r.UserId = l.UserId
WHERE r.CreationDate IS NULL
ORDER BY l.UserId
SELECT UserId, sum(cnt)
FROM #cnttmp
GROUP BY UserId
HAVING sum(cnt) > 0
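Note that the self-joins above compare CreationDate values for exact equality after shifting by one day, so they only match when the column holds one date-only value per user per day. The sample rows in the question carry a time of day, so a normalisation step along these lines is probably needed first. This is a sketch, not part of the original answer: the table variable @UserHistory and the second and third rows are invented for illustration, and the multi-row INSERT assumes SQL Server 2008 or later.

```sql
-- Hypothetical stand-in for UserHistory, with times of day as in the question.
DECLARE @UserHistory TABLE (Id int, UserId int, CreationDate datetime);

INSERT INTO @UserHistory (Id, UserId, CreationDate) VALUES
    (750997, 12, '2009-07-07T18:42:20.723'),  -- row from the question
    (751050, 12, '2009-07-07T21:03:10.000'),  -- invented: second login on the same day
    (751200, 12, '2009-07-08T09:15:00.000');  -- invented: login on the next day

-- Truncate each timestamp to midnight and keep one row per user per day.
SELECT DISTINCT
       UserId,
       DATEADD(DAY, DATEDIFF(DAY, 0, CreationDate), 0) AS CreationDate
FROM @UserHistory;
-- Two rows remain for user 12: 2009-07-07 and 2009-07-08, both at 00:00:00.
```

The same one-row-per-day shape is what the ROW_NUMBER-based queries in the other answers rely on, since a second login on the same day would shift the offset of every later row.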
【Discussion】: