SQL to determine distinct periods of sequential days of access?


【Posted】: 2009-07-24 10:27:56 【Question】:

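Jeff recently asked this question and got some excellent answers.

Jeff's problem revolved around finding the users who have logged into a system on (n) consecutive days, using a database table structured as follows: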

Id      UserId  CreationDate
------  ------  -----------------------
750997  12      2009-07-07 18:42:20.723
750998  15      2009-07-07 18:42:20.927
751000  19      2009-07-07 18:42:22.283

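Please read the original question first, and then...

I was intrigued by the problem of determining how many distinct (n)-day periods a user has.

Is it possible to craft a fast SQL query that returns a list of users together with the number of distinct (n)-day periods each of them has?

EDIT: as per the comment below, if someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days, that would be 3 "distinct 4-day periods". The 8-day run counts as two back-to-back 4-day periods.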

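【Question comments】:

Can you elaborate on your definition of "distinct (n)-day periods"? If someone has 2 consecutive days, then a gap, then 4 consecutive days, then a gap, then 8 consecutive days, is that 2 "distinct 4-day periods" or 3? (Counting the 8-day run as two back-to-back 4-day periods?)

I will edit the question, but using your example that would be 3 distinct 4-day periods.

【Answer 1】: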

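My answer does not seem to have appeared...

I'll try again...

Rob Farley's answer to the original question has the handy benefit of including the number of consecutive days in each run.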

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
select min(CreationDate), max(CreationDate), count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset

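Using integer division, dividing the number of consecutive days by the period length gives the number of "distinct (n)-day periods" covered by each whole consecutive run. For a 4-day period:

- 2 / 4 = 0
- 4 / 4 = 1
- 8 / 4 = 2
- 9 / 4 = 2
- and so on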
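To make the arithmetic concrete, here is a minimal Python sketch of the same floor division applied to the example from the question (runs of 2, 4 and 8 days, 4-day period). The function name `periods_in_run` is made up for illustration and is not part of any posted query.

```python
def periods_in_run(run_length: int, period_length: int) -> int:
    """Whole, non-overlapping periods that fit inside one consecutive run."""
    return run_length // period_length  # floor division, like integer "/" in T-SQL


# Runs of 2, 4 and 8 consecutive days, counted in 4-day periods.
runs = [2, 4, 8]
total = sum(periods_in_run(r, 4) for r in runs)
print(total)  # 0 + 1 + 2 = 3
```

The leftover days of a run (for example the 9th day of a 9-day run) are simply dropped by the division.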

So here is my take on Rob's answer, adapted to your needs... (I really like Rob's answer; go and read the explanation, it is inspired thinking!)

with
    numberedrows (
        UserID,
        TheOffset
    )
as
(
    select
        UserID,
        row_number() over (partition by UserID order by CreationDate)
            - DATEDIFF(DAY, 0, CreationDate) as TheOffset
    from
        tablename
),
    ConsecutiveCounts(
        UserID,
        ConsecutiveDays
    )
as
(
    select
        UserID,
        count(*) as ConsecutiveDays
    from
        numberedrows
    group by
        UserID,
        TheOffset
)
select
    UserID,
    SUM(ConsecutiveDays / @period_length) AS distinct_n_day_periods
from
    ConsecutiveCounts
group by
    UserID

The only real difference is that I take Rob's results and then run them through one more GROUP BY...

【Comments】:
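For readers who want to trace the "row number minus day number" trick outside the database, the following is a rough Python sketch of the same idea. The function name `count_periods` and the sample data are hypothetical. One assumption is mine rather than the thread's: the sketch collapses logins to one entry per user per day before numbering them, because several rows on the same day would otherwise get different offsets.

```python
from collections import Counter
from datetime import date


def count_periods(logins, period_length):
    """logins: iterable of (user_id, date) pairs. Returns {user_id: periods}."""
    # Assumption: keep one entry per user per day before numbering the rows.
    days_by_user = {}
    for user_id, day in set(logins):
        days_by_user.setdefault(user_id, []).append(day)

    result = {}
    for user_id, days in days_by_user.items():
        days.sort()
        # Row number minus day number is constant inside a run of consecutive
        # days, so counting rows per offset yields the length of each run.
        run_lengths = Counter(
            row_number - day.toordinal()
            for row_number, day in enumerate(days, start=1)
        )
        result[user_id] = sum(n // period_length for n in run_lengths.values())
    return result


def july(start_day, count):
    """Hypothetical helper: `count` consecutive days in July 2009."""
    return [date(2009, 7, start_day + i) for i in range(count)]


# User 12: runs of 2, 4 and 8 days separated by gaps. User 15: one 3-day run.
sample = [(12, d) for d in july(1, 2) + july(5, 4) + july(11, 8)]
sample += [(15, d) for d in july(1, 3)]
print(sorted(count_periods(sample, 4).items()))  # [(12, 3), (15, 0)]
```

The inner `Counter` plays the role of the GROUP BY on UserID and TheOffset, and the final `sum` plays the role of the outer GROUP BY on UserID.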

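【Answer 2】:

So - I'm going to start with my query from the last question, which lists each run of consecutive days. Then I'll group that by UserID and NumConsecutiveDays, to count how many runs of each length each user has.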

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
,
runsOfDays as
(
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, NumConsecutiveDays, count(*) as NumOfRuns
from runsOfDays
group by UserID, NumConsecutiveDays
;

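Of course, if you want to filter this to consider only runs of at least a certain length, add "where NumConsecutiveDays >= @days" to the last query.

Now, if you want to count a run of 16 days as three 5-day runs, then each run counts as NumConsecutiveDays / @runlength of them (integer division rounds down). So instead of just counting how many runs there are of each length, use SUM. You could take the query above and use SUM(NumOfRuns * (NumConsecutiveDays / @runlength)), where the parentheses make the division round down per run before multiplying, but if you understand the logic, the query below is a bit simpler.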

with numberedrows as
(
        select row_number() over (partition by UserID order by CreationDate) - cast(CreationDate-0.5 as int) as TheOffset, CreationDate, UserID
        from tablename
)
,
runsOfDays as
(
select min(CreationDate) as FirstDay, max(CreationDate) as LastDay, count(*) as NumConsecutiveDays, UserID
from numberedrows
group by UserID, TheOffset
)
select UserID, sum(NumConsecutiveDays / @runlength) as NumOfRuns
from runsOfDays
where NumConsecutiveDays >= @runlength
group by UserID
;
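When the total is built from rows that are already grouped by run length (one row per NumConsecutiveDays with a NumOfRuns count), the order of the integer operations matters. The short Python sketch below uses hypothetical run lengths to show that dividing per run and then multiplying matches the per-run sum, while multiplying first can overcount.

```python
from collections import Counter

run_lengths = [3, 3, 8]  # hypothetical runs for one user
runlength = 4

# Reference: divide each run on its own, as the query above does.
per_run = sum(n // runlength for n in run_lengths)

# Grouped form: run length -> number of runs of that length.
grouped = Counter(run_lengths)

# Divide first, then multiply by the number of runs of that length.
divide_first = sum(count * (n // runlength) for n, count in grouped.items())

# Multiply first: two 3-day runs are treated like one 6-day run.
multiply_first = sum(count * n // runlength for n, count in grouped.items())

print(per_run, divide_first, multiply_first)  # 2 2 3
```

Two 3-day runs contain no 4-day period, so the multiply-first total of 3 is one too high.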

Hope this helps,

Rob

【Comments】:

【Answer 3】:

This works nicely with the test data I have.

DECLARE @days int
SET @days = 30

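-- One row per run: l is the first day of a contiguous range, the subquery finds its last day.
-- RunStart keeps DISTINCT from merging two equal-length runs of the same user.
SELECT DISTINCT l.UserId, l.CreationDate AS RunStart, (datediff(d,l.CreationDate,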
(
    SELECT min(a.CreationDate ) as CreationDate
    FROM UserHistory a
        LEFT OUTER JOIN UserHistory b 
            ON a.CreationDate = dateadd(day, -1, b.CreationDate ) AND
            a.UserId = b.UserId
    WHERE b.CreationDate IS NULL AND
        a.CreationDate >= l.CreationDate AND
        a.UserId = l.UserId
) )+1)/@days as cnt
INTO #cnttmp
FROM UserHistory l
    LEFT OUTER JOIN UserHistory r 
        ON r.CreationDate = dateadd(day, -1, l.CreationDate ) AND
        r.UserId = l.UserId
WHERE r.CreationDate IS NULL
ORDER BY l.UserId

SELECT UserId, sum(cnt)
FROM #cnttmp
GROUP BY UserId
HAVING sum(cnt) > 0
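The self-join approach can be read as: find every day that starts a run (no login on the previous day), find where that run ends (the earliest later day with no login on the next day), and divide the run's length by the period. Below is a rough Python sketch of that reading. The function name and sample data are hypothetical, and it assumes date-only values, much as the SQL join on exact CreationDate equality does.

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def count_periods_by_range_ends(logins, period_length):
    """logins: iterable of (user_id, date) pairs. Returns {user_id: periods}."""
    days_by_user = {}
    for user_id, day in logins:
        days_by_user.setdefault(user_id, set()).add(day)

    totals = {}
    for user_id, days in days_by_user.items():
        total = 0
        # A run starts on a day whose previous day is absent.
        for start in (d for d in days if d - ONE_DAY not in days):
            # It ends on the earliest day >= start whose next day is absent.
            end = min(d for d in days if d >= start and d + ONE_DAY not in days)
            total += ((end - start).days + 1) // period_length
        if total > 0:  # mirrors HAVING sum(cnt) > 0
            totals[user_id] = total
    return totals


def july(start_day, count):
    """Hypothetical helper: `count` consecutive days in July 2009."""
    return [date(2009, 7, start_day + i) for i in range(count)]


# User 12: runs of 2, 4 and 8 days. User 15: one 3-day run (filtered out).
sample = [(12, d) for d in july(1, 2) + july(5, 4) + july(11, 8)]
sample += [(15, d) for d in july(1, 3)]
print(count_periods_by_range_ends(sample, 4))  # {12: 3}
```

Unlike the row-number approach, this one needs no window functions, at the cost of a lookup per run start.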

【Comments】:
