Using distinct on in subqueries

Posted 2023-02-16

【Title】: Using distinct on in subqueries 【Posted】: 2020-12-29 18:30:10 【Question】:

I noticed that in PostgreSQL the following two queries return different results:

select a.*
from (
    select distinct on (t1.col1)
        t1.*
    from t1
    order by t1.col1, t1.col2
) a
where a.col3 = value
;

create table temp as
select distinct on (t1.col1)
    t1.*
from t1
order by t1.col1, t1.col2
;
select temp.*
from temp
where temp.col3 = value
;

I guess this has something to do with using distinct on in a subquery.

What is the correct way to use distinct on in a subquery? For example, can I use it if I don't add a where clause? Or in a query like this:

(
select distinct on (a.col1)
    a.*
from a
)
union
(
select distinct on (b.col1)
    b.*
from b
)

【Comments】:

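Please provide a minimal reproducible example: sample data and the desired results, as tabular text. IMHO, both queries should return the same result.

【Answer 1】: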

Under normal circumstances, both examples should return the same result.

I suspect you get different results because the order by clause of your distinct on subquery is not deterministic. That is, several rows in t1 may share the same col1 and col2.

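If the columns in the order by do not uniquely identify each row, the database has to decide on its own which row to keep in the result set. The result is therefore not stable: consecutive executions of the same query may produce different results.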
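One way to check whether this applies to your data is to look for rows that tie on the order by columns. The query below is a minimal sketch that uses only the table and column names from the question; the alias n_rows is made up for illustration.

```sql
-- Lists every (col1, col2) pair shared by more than one row of t1.
-- Any row returned here means "order by t1.col1, t1.col2" contains ties.
select t1.col1, t1.col2, count(*) as n_rows
from t1
group by t1.col1, t1.col2
having count(*) > 1
order by t1.col1, t1.col2
;
```

Strictly speaking, only ties on the first-sorted col2 value within each col1 group can change which row distinct on keeps, so this check is broader than necessary. An empty result is enough to rule out this cause.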

Make sure your order by clause is deterministic (for example by adding more columns to the clause), and the problem should no longer occur.
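As a sketch of that fix, the first query from the question could add a tie-breaker column to the order by. This assumes t1 has a unique, non-null column, here called id. That column name is hypothetical and does not appear in the question, so substitute whatever uniquely identifies a row in your table.

```sql
-- Assumption: t1.id is a unique, non-null column (hypothetical name).
-- With id as the last sort key, no two rows compare equal, so the row
-- kept for each col1 is the same on every execution.
select a.*
from (
    select distinct on (t1.col1)
        t1.*
    from t1
    order by t1.col1, t1.col2, t1.id
) a
where a.col3 = value  -- "value" is the placeholder used in the question
;
```

The same order by would go into the create table ... as variant, so that both forms pick the same row per col1.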

【Comments】:

Then running query 1 multiple times should also give different results:

(
select distinct on (t1.col1)
    t1.*
from t1
order by t1.col1, t1.col2
)
except
(
select distinct on (t1.col1)
    t1.*
from t1
order by t1.col1, t1.col2
)
