Calculate unique items seen by users via sql
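【Title】: Calculate unique items seen by users via sql
【Posted】: 2020-08-15 10:58:20
【Question】:
I need help solving the following case.
The data that users want to see is accessed through paginated requests, and those requests are stored in the database in the following form: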
+----+---------+-------+--------+
| id | user id | first | amount |
+----+---------+-------+--------+
|  1 |       1 |     0 |      5 |
|  2 |       1 |    10 |     10 |
|  3 |       1 |    10 |      5 |
|  4 |       1 |    15 |     10 |
|  5 |       2 |     0 |     10 |
|  6 |       2 |     0 |      5 |
|  7 |       2 |    10 |      5 |
+----+---------+-------+--------+
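The table is sorted by user id asc, first asc, amount desc.
The task is to write an SQL statement that calculates the total number of unique items seen by each user.
For the first user the total must be 20: the request with id=1 returned the first 5 items, and the request with id=2 returned another 10 items. The request with id=3 returned data already "seen" by the request with id=2. The request with id=4 overlaps the request with id=2 but still returns 5 "unseen" items.
For the second user the total must be 15.
As the result of the SQL statement I should get the following output: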
+---------+-------+
| user id | total |
+---------+-------+
|       1 |    20 |
+---------+-------+
|       2 |    15 |
+---------+-------+
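I'm using MySQL 5.7, so I can't use window functions. I've been stuck on this task for a day and still can't get the desired output. If it isn't possible with this setup, I'll end up computing the result in application code. I'd appreciate any suggestions or help with this task, thanks!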
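For reference, the application-code fallback mentioned in the question could look roughly like the sketch below. It is only an illustration, not part of the original post: the function name `unique_items_seen` and the tuple layout `(id, user_id, first, amount)` are assumptions, and each request is assumed to cover the half-open range from `first` to `first + amount`.

```python
from collections import defaultdict

# Sample rows from the question: (id, user_id, first, amount).
REQUESTS = [
    (1, 1, 0, 5), (2, 1, 10, 10), (3, 1, 10, 5), (4, 1, 15, 10),
    (5, 2, 0, 10), (6, 2, 0, 5), (7, 2, 10, 5),
]


def unique_items_seen(requests):
    """Return {user_id: number of distinct item offsets covered by the requests}."""
    by_user = defaultdict(list)
    for _id, user_id, first, amount in requests:
        # Assumption: a request covers the half-open range [first, first + amount).
        by_user[user_id].append((first, first + amount))

    totals = {}
    for user_id, ranges in by_user.items():
        total = 0
        covered_until = None  # right edge of everything counted so far
        for start, end in sorted(ranges):
            if covered_until is None or start > covered_until:
                total += end - start          # disjoint range: count all of it
                covered_until = end
            elif end > covered_until:
                total += end - covered_until  # overlap: count only the new tail
                covered_until = end
        totals[user_id] = total
    return totals


print(unique_items_seen(REQUESTS))  # prints {1: 20, 2: 15}
```

Sorting by start lets a single running right edge decide how much of each request is new, which is the same idea the SQL answers express with a cumulative maximum.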
【Comments】:
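【Answer 1】: This is a type of gaps-and-islands problem. In this case, use a cumulative max to determine whether a request overlaps any earlier one. If it does not, it is the start of an "island" of adjacent requests. A cumulative sum of the starts assigns an island number to each row, and then an aggregation computes each island.
So, the islands look like this (the query uses window functions, which MySQL supports from 8.0 on):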
select userid, min(first), max(first + amount) as last
from (select t.*,
             -- running count of island starts = island number
             sum(case when prev_last >= first then 0 else 1 end) over
                 (partition by userid order by first) as grp
      from (select t.*,
                   -- largest end among requests that start strictly earlier
                   max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
            from t
           ) t
     ) t
group by userid, grp;
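To make the three layers of that query easier to follow, here is a small Python trace of the same steps on the question's sample data. It is a sketch under assumptions, not the answer's code: the helper names `islands` and `starts_island` are made up for illustration, and the loops are written for readability rather than speed.

```python
from collections import defaultdict

# Sample rows from the question: (user_id, first, amount).
ROWS = [(1, 0, 5), (1, 10, 10), (1, 10, 5), (1, 15, 10),
        (2, 0, 10), (2, 0, 5), (2, 10, 5)]


def islands(rows):
    """Return {user_id: [(island_first, island_last), ...]}, mirroring the SQL layers."""
    by_user = defaultdict(list)
    for user_id, first, amount in rows:
        by_user[user_id].append((first, first + amount))

    result = {}
    for user_id, ranges in by_user.items():
        def starts_island(first):
            # prev_last: cumulative max of the ends of rows that start strictly earlier.
            prev_last = max((e for f, e in ranges if f < first), default=None)
            # No earlier row, or a gap before this one: a new island starts here.
            return prev_last is None or prev_last < first

        groups = {}
        for first, last in ranges:
            # grp: cumulative sum of the start flags over rows with f <= first
            # (rows with the same `first` are peers and get the same value).
            grp = sum(starts_island(f) for f, _ in ranges if f <= first)
            lo, hi = groups.get(grp, (first, last))
            groups[grp] = (min(lo, first), max(hi, last))
        result[user_id] = sorted(groups.values())
    return result


print(islands(ROWS))  # prints {1: [(0, 5), (10, 25)], 2: [(0, 15)]}
```

The island numbers need not be consecutive (user 1 gets groups 1 and 3 because two rows share `first = 10`); they only have to differ between islands. Summing `last - first` per island gives 5 + 15 = 20 for user 1 and 15 for user 2.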
Then you want to sum this up per user id, which takes one more level of aggregation:
with islands as (
      select userid, min(first) as first, max(first + amount) as last
      from (select t.*,
                   sum(case when prev_last >= first then 0 else 1 end) over
                       (partition by userid order by first) as grp
            from (select t.*,
                         max(first + amount) over (partition by userid order by first range between unbounded preceding and 1 preceding) as prev_last
                  from t
                 ) t
           ) t
      group by userid, grp
     )
select userid, sum(last - first) as total
from islands
group by userid;
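Here is a dbfiddle.
【Comments】:
Thanks for the detailed explanation, I'll consider upgrading the environment to use MySQL 8.0.
【Answer 2】: This logic is similar to Gordon's, but it also works on older versions of MySQL.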
select userid
    -- overall span plus the gap lengths (gaplen is zero or negative)
   ,max(maxlast) - min(minfirst) + sum(gaplen) as total
from
 (
   select userid
      ,prevlast
      ,min(first) as minfirst -- first of group
      ,max(last) as maxlast   -- last of group
       -- if there was a gap, calculate length of gap (as a negative number)
      ,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
   from
    (
      select t.*
         ,first + amount as last -- last value in range
         ,( -- maximum end of all previous rows
            select max(first + amount)
            from t as t2
            where t2.userid = t.userid
              and t2.first < t.first
          ) as prevlast
      from t
    ) as dt
   group by userid, prevlast
 ) as dt
group by userid
order by userid
See the fiddle.
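As a quick local check of this query, the sketch below runs it against the question's sample rows using Python's built-in `sqlite3` module. SQLite is only a stand-in here, chosen because it needs no server; the answer targets MySQL 5.7, and the table definition and the helper name `totals_per_user` are assumptions made for the example.

```python
import sqlite3

# The query from the answer, unchanged apart from whitespace.
QUERY = """
select userid
   ,max(maxlast) - min(minfirst) + sum(gaplen) as total
from
 (
   select userid
      ,prevlast
      ,min(first) as minfirst
      ,max(last) as maxlast
      ,min(case when prevlast < first then prevlast - first else 0 end) as gaplen
   from
    (
      select t.*
         ,first + amount as last
         ,( select max(first + amount)
            from t as t2
            where t2.userid = t.userid
              and t2.first < t.first
          ) as prevlast
      from t
    ) as dt
   group by userid, prevlast
 ) as dt
group by userid
order by userid
"""

# Sample rows from the question: (id, userid, first, amount).
SAMPLE = [(1, 1, 0, 5), (2, 1, 10, 10), (3, 1, 10, 5), (4, 1, 15, 10),
          (5, 2, 0, 10), (6, 2, 0, 5), (7, 2, 10, 5)]


def totals_per_user(rows):
    """Load rows into an in-memory SQLite table `t` and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: plain integer columns named as in the answer's query.
        conn.execute(
            "create table t (id integer, userid integer, first integer, amount integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(totals_per_user(SAMPLE))  # prints [(1, 20), (2, 15)]
```

For user 1 the inner grouping yields a single gap of 5 - 10 = -5 (between the ranges ending at 5 and starting at 10), so the total is 25 - 0 + (-5) = 20; user 2 has no gaps, giving 15 - 0 + 0 = 15.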
【Comments】:
Thanks, this is what I needed for MySQL 5.7.