Need validation of consistent result set with group by and having clause

Posted 2023-02-16

【Title】: Need validation of consistent result set with group by and having clause 【Posted】: 2021-11-22 16:31:36 【Question】:

I have a table as follows

CREATE TABLE `zpost` (
  `post_id` int(10) UNSIGNED NOT NULL,
  `topic_id` int(10) UNSIGNED NOT NULL DEFAULT 0,
  `post_subject` varchar(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

The data set is

INSERT INTO `zpost` (`post_id`, `topic_id`, `post_subject`) VALUES
(44, 33, 'New topic by new user'),
(45, 33, 'Re: New topic by new user'),
(47, 33, 'Re: New topic by new user'),
(46, 34, 'New Topic by James on 1/2'),
(48, 35, 'Sep 29th new topic'),
(49, 35, 'Re: Sep 29th new topic'),
(50, 35, 'Re: Sep 29th new topic'),
(51, 36, 'Another Sep topic');

and the indexes (not relevant to the question, but included here)

ALTER TABLE `zpost`
  ADD PRIMARY KEY (`post_id`),
  ADD KEY `topic_id` (`topic_id`);

And finally the SQL

SELECT * FROM `zpost` group by `topic_id` having min(`topic_id`);

And the output

|post_id|topic_id|post_subject              |
+-------+--------+--------------------------+
|     44|      33|New topic by new user     |
|     46|      34|New Topic by James on 1/2 |
|     48|      35|Sep 29th new topic        |
|     51|      36|Another Sep topic         |

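I just want the minimum post_id for a given topic_id, that is, the first record of each topic. I seem to be getting that by default. I am not sure whether this is just how the database happens to return rows, or whether the order is consistent. The database in question is MariaDB. I also tried inserting the data in reverse order, as follows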

INSERT INTO `zpost` (`post_id`, `topic_id`, `post_subject`) VALUES
(51, 36, 'Another Sep topic'),
(50, 35, 'Re: Sep 29th new topic'),
(49, 35, 'Re: Sep 29th new topic'),
(48, 35, 'Sep 29th new topic'),
(46, 34, 'New Topic by James on 1/2'),
(47, 33, 'Re: New topic by new user'),
(45, 33, 'Re: New topic by new user'),
(44, 33, 'New topic by new user');

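And I still get the result I want, which is good news, and no further action is needed. But I am not sure why. For completeness: if I wanted the last row (the maximum post_id), how would I change the SQL to get that row for each topic_id? One would think that changing min to max would take care of it, but no! I get the same result for this query as well.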

SELECT * FROM `zpost` group by `topic_id` having max(`topic_id`);

【Answer 1】:

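First of all, in a relational database the rows of a table have no inherent order. The order in which they were inserted, updated, or deleted does not matter. A table represents an unordered bag of rows.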
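As a side note on why swapping min for max changes nothing, here is a minimal sketch against the zpost table from the question. It assumes MariaDB evaluates the HAVING expression as a boolean (any non-zero value counts as true) and that ONLY_FULL_GROUP_BY is not enabled in sql_mode, which as far as I know is the MariaDB default. The column aliases min_topic and max_topic are made up for illustration.

```sql
-- Within each group, MIN(topic_id) and MAX(topic_id) both equal topic_id,
-- which is non-zero for every topic in the sample data (33, 34, 35, 36).
-- Used as a HAVING condition, either one is therefore "true" for every group.
SELECT topic_id,
       MIN(topic_id) AS min_topic,
       MAX(topic_id) AS max_topic
FROM zpost
GROUP BY topic_id;

-- So both queries from the question filter nothing and behave like this one:
SELECT * FROM zpost GROUP BY topic_id;
-- post_id and post_subject are neither aggregated nor listed in GROUP BY,
-- so the server is free to take them from any row of each group.
```

With ONLY_FULL_GROUP_BY enabled, the second query should be rejected instead of silently returning an arbitrary row per group.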

You can use ROW_NUMBER() to identify the rows you want.

To get the oldest post_id for each topic_id, you can do:

select post_id, topic_id, post_subject
from (
  select *, row_number() over(partition by topic_id order by post_id) as rn
  from zpost
) x
where rn = 1

Result:

 post_id  topic_id  post_subject              
 -------- --------- ------------------------- 
 44       33        New topic by new user     
 46       34        New Topic by James on 1/2 
 48       35        Sep 29th new topic        
 51       36        Another Sep topic

See the running example at DB Fiddle - ASC.

To get the newest post_id for each topic_id, you can do:

select post_id, topic_id, post_subject
from (
  select *, row_number() over(partition by topic_id order by post_id desc) as rn
  from zpost
) x
where rn = 1

Result:

 post_id  topic_id  post_subject              
 -------- --------- ------------------------- 
 47       33        Re: New topic by new user 
 46       34        New Topic by James on 1/2 
 50       35        Re: Sep 29th new topic    
 51       36        Another Sep topic

See the running example at DB Fiddle - DESC.
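If window functions are not available (as far as I know MariaDB added them in 10.2), the same rows can be picked by joining against a per-topic aggregate. This is only a sketch: the aliases p, t, first_post_id and last_post_id are invented here, and it relies on post_id being the primary key so that each minimum or maximum matches exactly one row.

```sql
-- Oldest post per topic without ROW_NUMBER():
-- compute MIN/MAX(post_id) per topic, then join back to fetch the full row.
SELECT p.post_id, p.topic_id, p.post_subject
FROM zpost AS p
JOIN (
  SELECT topic_id,
         MIN(post_id) AS first_post_id,
         MAX(post_id) AS last_post_id
  FROM zpost
  GROUP BY topic_id
) AS t ON t.topic_id = p.topic_id
WHERE p.post_id = t.first_post_id   -- compare with t.last_post_id for the newest post
ORDER BY p.topic_id;                -- explicit ordering; none is implied otherwise
```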

【Comments】:

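You are absolutely right that an RDBMS has no inherent ordering. However, this is not really about (output) ordering, but about how it consistently picks the first row of each topic. The fact that it does is interesting. Of course a subquery can do this, but I wanted to keep it simple, since simple already works. Another subquery I wrote is: select post_id, topic_id, post_subject from zpost where post_id in ( select max(post_id) from zpost group by topic_id )

@Senthil I don't understand how your version of the query works, so I can't tell how stable that solution is. Sometimes when a table is small, all the data resides in a single I/O block on disk, or the entire primary index does not span multiple tree nodes; in these small cases you often see special behavior that disappears once the engine starts using more blocks or nodes. I would be careful with this solution, and I would test it with more rows, especially after inserting and deleting many random rows.

Agreed. Thanks a lot.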
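For readability, the subquery from the first comment can be laid out as below, together with its MIN counterpart. This is just a reformatted sketch of that comment's query, not a tested recommendation. It does not depend on which row the engine happens to pick, because post_id is the primary key and each aggregate value identifies exactly one row.

```sql
-- Newest post per topic (the query from the comment, reformatted)
SELECT post_id, topic_id, post_subject
FROM zpost
WHERE post_id IN (
  SELECT MAX(post_id)
  FROM zpost
  GROUP BY topic_id
);

-- Oldest post per topic: same shape with MIN instead of MAX
SELECT post_id, topic_id, post_subject
FROM zpost
WHERE post_id IN (
  SELECT MIN(post_id)
  FROM zpost
  GROUP BY topic_id
);
```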
