Find the longest streak of perfect scores per player
【Posted】2019-10-28 05:09:30
【Question】
I get the following result from a SELECT query with ORDER BY player_id ASC, time ASC in a PostgreSQL database:
player_id points time
395 0 2018-06-01 17:55:23.982413-04
395 100 2018-06-30 11:05:21.8679-04
395 0 2018-07-15 21:56:25.420837-04
395 100 2018-07-28 19:47:13.84652-04
395 0 2018-11-27 17:09:59.384-05
395 100 2018-12-02 08:56:06.83033-05
399 0 2018-05-15 15:28:22.782945-04
399 100 2018-06-10 12:11:18.041521-04
454 0 2018-07-10 18:53:24.236363-04
675 0 2018-08-07 20:59:15.510936-04
696 0 2018-08-07 19:09:07.126876-04
756 100 2018-08-15 08:21:11.300871-04
756 100 2018-08-15 16:43:08.698862-04
756 0 2018-08-15 17:22:49.755721-04
756 100 2018-10-07 15:30:49.27374-04
756 0 2018-10-07 15:35:00.975252-04
756 0 2018-11-27 19:04:06.456982-05
756 100 2018-12-02 19:24:20.880022-05
756 100 2018-12-04 19:57:48.961111-05
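I am trying to find each player's longest streak of rows where points = 100, with the tiebreaker being whichever streak began most recently. I also need to determine the time at which that player's longest streak began. The expected result would be: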
player_id longest_streak time_began
395 1 2018-12-02 08:56:06.83033-05
399 1 2018-06-10 12:11:18.041521-04
756 2 2018-12-02 19:24:20.880022-05
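【Comments】
You should find a solution here, using window functions: postgresql.org/docs/9.1/tutorial-window.html
Is a streak interrupted by rows of other players? Also: your version of Postgres?
【Answer 1】
This is a gaps-and-islands problem. You can use the SUM conditional aggregate function together with a window function to compute a gap number for each row, then apply the MAX and COUNT window functions over that number.
Query 1: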
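WITH CTE AS (
   SELECT *,
          -- running count of this player's 100-point rows ...
          SUM(CASE WHEN points = 100 THEN 1 END) OVER(PARTITION BY player_id ORDER BY time) -
          -- ... minus a running count of all rows; there is no PARTITION BY here,
          -- so another player's row between two 100s also starts a new group
          SUM(1) OVER(ORDER BY time) RN
   FROM T
)
SELECT player_id,
       MAX(longest_streak) longest_streak,  -- latest timestamp among the player's rows with points > 0
       MAX(cnt) longest_streak              -- row count of the player's longest streak
FROM (
   SELECT player_id,
          MAX(time) OVER(PARTITION BY rn,player_id) longest_streak,  -- last row of each streak
          COUNT(*) OVER(PARTITION BY rn,player_id) cnt
   FROM CTE
   WHERE points > 0
) t1
GROUP BY player_id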
Results:
| player_id | longest_streak | longest_streak |
|-----------|-----------------------------|----------------|
| 756 | 2018-12-04T19:57:48.961111Z | 2 |
| 399 | 2018-06-10T12:11:18.041521Z | 1 |
| 395 | 2018-12-02T08:56:06.83033Z | 1 |
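【Discussion】
【Answer 2】
One way to do this is to look at the number of rows between the previous and next non-100 results. To get the lengths of the streaks: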
with s as (
      select s.*,
             row_number() over (partition by player_id order by time) as seqnum,
             count(*) over (partition by player_id) as cnt
      from scores s
     )
select s.*,
       coalesce(next_seqnum, cnt + 1) - coalesce(prev_seqnum, 0) - 1 as length
from (select s.*,
             -- seqnum of the nearest non-100 row before this row
             max(seqnum) filter (where score <> 100) over (partition by player_id order by time) as prev_seqnum,
             -- seqnum of the nearest non-100 row after this row (scanning in reverse)
             min(seqnum) filter (where score <> 100) over (partition by player_id order by time desc) as next_seqnum
      from s
     ) s
where score = 100;
You can then incorporate the other conditions:
with s as (
      select s.*,
             row_number() over (partition by player_id order by time) as seqnum,
             count(*) over (partition by player_id) as cnt
      from scores s
     ),
     streaks as (
      select s.*,
             next_seqnum - prev_seqnum - 1 as length
      from (select s.*,
                   coalesce(max(seqnum) filter (where score <> 100) over (partition by player_id order by time), 0) as prev_seqnum,
                   coalesce(min(seqnum) filter (where score <> 100) over (partition by player_id order by time desc), cnt + 1) as next_seqnum
            from s
           ) s
      where score = 100
     )
-- one row per player: longest streak first, then the most recently started
-- streak (largest prev_seqnum), then the first row of that streak
select distinct on (player_id)
       player_id, length as longest_streak, time as time_began
from streaks
order by player_id, length desc, prev_seqnum desc, time;
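【Discussion】
【Answer 3】
This is indeed a gaps-and-islands problem.
Assuming:
"Streaks" are not interrupted by rows of other players.
All columns are defined NOT NULL. (Otherwise you have to do more.)
This should be simplest and fastest, as it only needs two fast row_number() window functions: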
SELECT DISTINCT ON (player_id)
       player_id, count(*) AS seq_len, min(ts) AS time_began
FROM  (
   SELECT player_id, points, ts
        , row_number() OVER (PARTITION BY player_id ORDER BY ts)
        - row_number() OVER (PARTITION BY player_id, points ORDER BY ts) AS grp
   FROM   tbl
   ) sub
WHERE  points = 100
GROUP  BY player_id, grp  -- omit "points" after WHERE points = 100
ORDER  BY player_id, seq_len DESC, time_began DESC;
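db<>fiddle here
Using the column name ts instead of time, which is a reserved word in standard SQL. It is allowed in Postgres, with some restrictions, but it is still a bad idea to use it as an identifier.
The "trick" is to subtract row numbers so that consecutive rows with the same (player_id, points) fall into the same group (grp). Then filter the rows with 100 points, aggregate per group, and return only the longest, most recent streak per player.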
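To make the row-number subtraction concrete, the following sketch lists both row numbers and their difference for player 756 only. The rows are copied from the question; the CTE name sample and the column aliases rn_all and rn_same are made up for this illustration and do not appear in the answer.

```sql
-- Illustration only: player 756's rows from the question, inlined as VALUES.
-- "sample", "rn_all" and "rn_same" are hypothetical names for this sketch.
WITH sample (player_id, points, ts) AS (
   VALUES
     (756, 100, timestamptz '2018-08-15 08:21:11.300871-04')
   , (756, 100, '2018-08-15 16:43:08.698862-04')
   , (756,   0, '2018-08-15 17:22:49.755721-04')
   , (756, 100, '2018-10-07 15:30:49.27374-04')
   , (756,   0, '2018-10-07 15:35:00.975252-04')
   , (756,   0, '2018-11-27 19:04:06.456982-05')
   , (756, 100, '2018-12-02 19:24:20.880022-05')
   , (756, 100, '2018-12-04 19:57:48.961111-05')
   )
SELECT player_id, points, ts
     , row_number() OVER (PARTITION BY player_id ORDER BY ts)         AS rn_all   -- position among all of the player's rows
     , row_number() OVER (PARTITION BY player_id, points ORDER BY ts) AS rn_same  -- position among rows with the same points
     , row_number() OVER (PARTITION BY player_id ORDER BY ts)
     - row_number() OVER (PARTITION BY player_id, points ORDER BY ts) AS grp      -- constant within a run of equal points
FROM   sample
ORDER  BY ts;
```

In time order, grp comes out as 0, 0, 2, 1, 3, 3, 3, 3. The two trailing 0-point rows and the two trailing 100-point rows share grp = 3, which is why the grouping has to include points, or, as in the answer, filter on points = 100 before grouping.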
We can use GROUP BY and DISTINCT ON in the same SELECT, because GROUP BY is applied before DISTINCT ON in the sequence of events in a SELECT query.
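As a sketch of what DISTINCT ON sees once GROUP BY has run, the per-streak rows for players 395 and 756 are written out by hand below. They are derived from the question's data; the CTE name streaks is made up for this illustration. DISTINCT ON then keeps the first row per player according to the ORDER BY.

```sql
-- Illustration only: one row per streak, as the GROUP BY step would produce.
-- "streaks" is a hypothetical name for this sketch.
WITH streaks (player_id, seq_len, time_began) AS (
   VALUES
     (395, 1, timestamptz '2018-06-30 11:05:21.8679-04')
   , (395, 1, '2018-07-28 19:47:13.84652-04')
   , (395, 1, '2018-12-02 08:56:06.83033-05')
   , (756, 2, '2018-08-15 08:21:11.300871-04')
   , (756, 1, '2018-10-07 15:30:49.27374-04')
   , (756, 2, '2018-12-02 19:24:20.880022-05')
   )
SELECT DISTINCT ON (player_id)
       player_id, seq_len, time_began
FROM   streaks
ORDER  BY player_id, seq_len DESC, time_began DESC;  -- longest first, most recent start breaks ties
```

This keeps the streak starting 2018-12-02 08:56:06.83033-05 for player 395 and the streak of length 2 starting 2018-12-02 19:24:20.880022-05 for player 756, in line with the expected result in the question.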
【Discussion】
【Answer 4】
Here is my answer:
select
    user_id,
    non_streak,
    streak,
    coalesce(non_streak, streak) as strk,
    max(time) as time
from (
    select
        user_id, time,
        points,
        lag(points) over (partition by user_id order by time) as prev_point,
        -- with points limited to 0 and 100: exactly one of this row and the previous row scored 100
        case when points + lag(points) over (partition by user_id order by time) = 100 then 1 end as non_streak,
        -- both this row and the previous row scored 100
        case when points + lag(points) over (partition by user_id order by time) > 100 then 1 end as streak
    from players
) t
where coalesce(non_streak, streak) is not null
group by 1,2,3
order by 1,2
【Discussion】
Please consider adding a brief explanation of how and why this solves the problem. This will help readers better understand your solution.