Hacker Rank SQL Problem: 15 Days of SQL, how does a SELECT query work with AND

【Posted】: 2021-02-15 18:48:51 【Question】:

Consider the following:

create table submissions (
  submission_date date,
  submission_id int,
  hacker_id int,
  score int
  
);

create table hackers (
  hacker_id int,
  name varchar(20)
);

insert into submissions values
("2016-03-01", 8494, 20703, 0),("2016-03-01", 22403, 53473,15),
("2016-03-01",23965,79722,60),("2016-03-01",30173,36396,70),
("2016-03-02",34928,20703,0),("2016-03-02",38740,15758,60),
("2016-03-02",42769,79722,25),("2016-03-02",44364,79722,60),
("2016-03-03",45440,20703,0),("2016-03-03",49050,36396,70),
("2016-03-03",50273,79722,5),("2016-03-04",50344,20703,0),
("2016-03-04",51360,44065,90),("2016-03-04",54404,53473,65),
("2016-03-04",61533,79722,45),("2016-03-05",72852,20703,0),
("2016-03-05",74546,38289,0),("2016-03-05",76487,62529,0),
("2016-03-05",82439,36396,10),("2016-03-05",90006,36396,40),
("2016-03-06",90404,20703,0);

create table colleges (
  college_id int,
  contest_id int
);

insert into hackers values 
(15758, 'Rose'),(20703, 'Angela'),
(36396,'Frank'),(38289, 'Patrick'),
(44065, 'Lisa'),(53473,'Kimberly'),
(62529, 'Bonnie'),(79722, 'Michael');

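For this HackerRank quiz:

Julia ran a "15 Days of Learning SQL" contest. The contest started on March 1, 2016 and ended on March 15, 2016.

Write a query to print the total number of unique hackers who made at least 1 submission each day (starting on the first day of the contest), and find the hacker_id and name of the hacker who made the maximum number of submissions each day. If more than one such hacker has the maximum number of submissions, print the lowest hacker_id. The query should print this information for each day of the contest, sorted by date.

Here is the solution I am trying to understand: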

SELECT submission_date, 
        (
    SELECT COUNT(DISTINCT hacker_id)
    FROM Submissions AS SUB2
    WHERE SUB2.submission_date = SUB1.submission_date AND 
    (SELECT COUNT(DISTINCT submission_date)
    FROM Submissions AS SUB3
    WHERE (SUB3.hacker_id = SUB2.hacker_id) AND 
    (SUB3.submission_date < SUB1.submission_date)) 
    = DATEDIFF(SUB1.submission_date, '2016-03-01' )
        ),
        
        (SELECT hacker_id FROM Submissions SUB4
        WHERE SUB4.submission_date = SUB1.submission_date
        GROUP BY hacker_id
        ORDER BY COUNT(submission_id) DESC, hacker_id LIMIT 1) AS HID,
        (SELECT name FROM Hackers
        WHERE hacker_id = HID)
FROM 
(SELECT DISTINCT(submission_date)
FROM Submissions) AS SUB1

There are 2 parts I cannot understand:

Part 1

SELECT COUNT(DISTINCT hacker_id)
        FROM Submissions AS SUB2
        WHERE SUB2.submission_date = SUB1.submission_date AND 
        (SELECT COUNT(DISTINCT submission_date)
        FROM Submissions AS SUB3
        WHERE (SUB3.hacker_id = SUB2.hacker_id) AND 
        (SUB3.submission_date < SUB1.submission_date)) 
        = DATEDIFF(SUB1.submission_date, '2016-03-01' )

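Question about the code above: how does this part

SELECT COUNT(DISTINCT hacker_id) FROM Submissions AS SUB2 WHERE
SUB2.submission_date = SUB1.submission_date 

work together with

(SELECT 
COUNT(DISTINCT submission_date) FROM Submissions AS SUB3 WHERE 
(SUB3.hacker_id = SUB2.hacker_id) AND  (SUB3.submission_date < 
SUB1.submission_date))  = DATEDIFF(SUB1.submission_date, 
'2016-03-01' )

The first part returns all unique hacker_ids for a given submission date, and the second part checks whether a hacker_id has submitted on every day before that date. But how does SQL ensure that the second part (after the AND) only checks the hacker_ids that are present in the first part (before the AND)?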

Could you illustrate with an example how these two queries work together?
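One way to picture this: the condition after the AND is a correlated subquery. Because it refers to SUB2.hacker_id, it is logically evaluated once for each candidate SUB2 row, so only hackers who submitted on the current date are ever tested. The Python sketch below emulates that row-by-row evaluation on the question's sample data. It is an illustration of the logical semantics only, not of how MySQL executes the query; the names ROWS, CONTEST_START and streak_hackers are made up for this example.

```python
from datetime import date

# (submission_date, hacker_id) pairs copied from the question's INSERT statement
ROWS = [
    ("2016-03-01", 20703), ("2016-03-01", 53473), ("2016-03-01", 79722),
    ("2016-03-01", 36396), ("2016-03-02", 20703), ("2016-03-02", 15758),
    ("2016-03-02", 79722), ("2016-03-02", 79722), ("2016-03-03", 20703),
    ("2016-03-03", 36396), ("2016-03-03", 79722), ("2016-03-04", 20703),
    ("2016-03-04", 44065), ("2016-03-04", 53473), ("2016-03-04", 79722),
    ("2016-03-05", 20703), ("2016-03-05", 38289), ("2016-03-05", 62529),
    ("2016-03-05", 36396), ("2016-03-05", 36396), ("2016-03-06", 20703),
]
CONTEST_START = date(2016, 3, 1)  # hypothetical constant for '2016-03-01'


def streak_hackers(sub1_date):
    """Emulate the first scalar subquery for one SUB1 row (one distinct date)."""
    # DATEDIFF(SUB1.submission_date, '2016-03-01'): whole days since the start
    days_elapsed = (date.fromisoformat(sub1_date) - CONTEST_START).days
    qualifying = set()
    for sub2_date, sub2_hacker in ROWS:      # FROM Submissions AS SUB2
        if sub2_date != sub1_date:           # condition before the AND
            continue
        # Condition after the AND: the inner SELECT is re-evaluated with
        # this row's SUB2.hacker_id plugged in (ISO date strings sort by date).
        earlier_days = {d for d, h in ROWS if h == sub2_hacker and d < sub1_date}
        if len(earlier_days) == days_elapsed:
            qualifying.add(sub2_hacker)      # rows kept for COUNT(DISTINCT hacker_id)
    return qualifying


for day in sorted({d for d, _ in ROWS}):
    print(day, len(streak_hackers(day)), sorted(streak_hackers(day)))
```

On this data the counts per day come out as 4, 2, 2, 2, 1, 1. For example, on 2016-03-02 hacker 15758 passes the date test but has 0 earlier days while DATEDIFF is 1, so that row is dropped.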

Part 2: for this part

 (SELECT hacker_id FROM Submissions SUB4 WHERE
 SUB4.submission_date = SUB1.submission_date GROUP BY hacker_id 
 ORDER BY COUNT(submission_id) DESC, hacker_id LIMIT 1) AS HID

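How does this check only the hacker_ids that have submitted on every date up to the current one, then group those hacker_ids' submissions, and then pick the lowest hacker_id with the most submissions?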
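As written, this subquery has no streak condition at all: it is correlated with SUB1 only through the date, so it ranks every hacker who submitted on that day. The Python sketch below mirrors its clauses on the question's sample data to make that visible. The names ROWS, NAMES and top_submitter are invented for the illustration.

```python
from collections import Counter

# (submission_date, hacker_id) pairs copied from the question's INSERT statement
ROWS = [
    ("2016-03-01", 20703), ("2016-03-01", 53473), ("2016-03-01", 79722),
    ("2016-03-01", 36396), ("2016-03-02", 20703), ("2016-03-02", 15758),
    ("2016-03-02", 79722), ("2016-03-02", 79722), ("2016-03-03", 20703),
    ("2016-03-03", 36396), ("2016-03-03", 79722), ("2016-03-04", 20703),
    ("2016-03-04", 44065), ("2016-03-04", 53473), ("2016-03-04", 79722),
    ("2016-03-05", 20703), ("2016-03-05", 38289), ("2016-03-05", 62529),
    ("2016-03-05", 36396), ("2016-03-05", 36396), ("2016-03-06", 20703),
]
# hacker_id -> name, copied from the question's hackers table
NAMES = {
    15758: "Rose", 20703: "Angela", 36396: "Frank", 38289: "Patrick",
    44065: "Lisa", 53473: "Kimberly", 62529: "Bonnie", 79722: "Michael",
}


def top_submitter(sub1_date):
    """Emulate the HID subquery plus the name lookup for one SUB1 date."""
    # WHERE SUB4.submission_date = SUB1.submission_date ... GROUP BY hacker_id
    counts = Counter(h for d, h in ROWS if d == sub1_date)
    # ORDER BY COUNT(submission_id) DESC, hacker_id LIMIT 1
    hid = min(counts, key=lambda h: (-counts[h], h))
    # SELECT name FROM Hackers WHERE hacker_id = HID
    return hid, NAMES[hid]


for day in sorted({d for d, _ in ROWS}):
    print(day, *top_submitter(day))
```

For 2016-03-05 this picks 36396 (Frank), who made two submissions that day even though he skipped 2016-03-02 and 2016-03-04. That is consistent with the problem statement, which asks for the top submitter of each day without requiring a streak.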

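MRE for this question.

【Comments】:

DISTINCT is not a function, so SELECT DISTINCT(submission_date) is a bit silly.

Here we are selecting the distinct/unique values from the "submission_date" column.

That would be SELECT DISTINCT submission_date.

Now I get it! Since it is not a function, the column name does not need to be inside parentheses. Thanks @Strawberry

【Answer 1】: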

This isn't what you asked, but these days we would probably do something like this...

Note that my date range and data set differ slightly from yours...

drop table if exists hackers;

create table hackers (
  hacker_id serial primary key,
  name varchar(20)
);

insert into hackers values 
(15758, 'Rose'),
(20703, 'Angela'),
(36396,'Frank'),
(38289, 'Patrick'),
(44065, 'Lisa'),
(53473,'Kimberly'),
(62529, 'Bonnie'),
(79722, 'Michael'),
(10101, 'Geoff');

drop table if exists submissions;

create table submissions (
  submission_date date,
  submission_id serial primary key,
  hacker_id int,
  score int
  
);

insert into submissions values
("2016-03-01", 8494, 20703, 0),
("2016-03-01", 22403, 53473,15),
("2016-03-01",23965,79722,60),
("2016-03-01",30173,36396,70),
("2016-03-02",34928,20703,0),
("2016-03-02",38740,15758,60),
("2016-03-02",42769,79722,25),
("2016-03-02",44364,79722,60),
("2016-03-03",45440,20703,0),
("2016-03-03",49050,36396,70),
("2016-03-03",50273,79722,5),
("2016-03-04",50344,20703,0),
("2016-03-04",51360,44065,90),
("2016-03-04",54404,53473,65),
("2016-03-04",61533,79722,45),
("2016-03-05",72852,20703,0),
("2016-03-05",74546,38289,0),
("2016-03-05",76487,62529,0),
("2016-03-05",82439,36396,10),
("2016-03-05",90006,36396,40),
("2016-03-06",90404,20703,0),
("2016-03-01",90405,10101,12),
("2016-03-02",90406,10101,0),
("2016-03-03",90407,10101,15),
("2016-03-04",90409,10101,60),
("2016-03-05",90410,10101,70),
("2016-03-06",90411,10101,0);


-- Generate one row per contest day (2016-03-01 through 2016-03-06)
WITH RECURSIVE cte (n) AS
(
  SELECT '2016-03-01' 
  UNION ALL
  SELECT n + INTERVAL 1 DAY FROM cte WHERE n < '2016-03-06'
)
-- Keep the lowest hacker_id that has a submission on all 6 generated days
SELECT h.*
  FROM cte
  JOIN submissions s
    ON s.submission_date = cte.n
  JOIN hackers h
    ON h.hacker_id = s.hacker_id
 GROUP 
    BY h.hacker_id 
HAVING COUNT(DISTINCT s.submission_date) = 6
 ORDER
    BY h.hacker_id
 LIMIT 1
;

+-----------+-------+
| hacker_id | name  |
+-----------+-------+
|     10101 | Geoff |
+-----------+-------+
