Postgres distinct union only for specific columns


【Title】: Postgres distinct union only for specific columns 【Posted】: 2018-09-28 18:29:20 【Question】:

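I have two sets of data, one of which is generated dynamically.

If I leave out the state column, the query works perfectly. That column doesn't really exist in the second data set, though, so my question is how to make UNION ignore one column when it combines the two data sets (as written, it behaves the same as UNION ALL). I prefer the first table: any row in the second data set should be ignored if it already exists in the first table.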

SELECT event_id, start_at, state
FROM event_logs
WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'
UNION
SELECT id event_id,
GENERATE_SERIES(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int - 1) + start_at::time, current_date + interval '3 weeks', '1 week'::INTERVAL) AS start_at,
'draft' AS state
FROM events

Update: I also tried this:

WITH future_logs AS (
 SELECT id event_id,
 GENERATE_SERIES(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int -  1) + start_at::time, current_date + interval '3 weeks', '1 week'::INTERVAL) AS start_at,
 'draft' AS state
 FROM events)

SELECT future_logs.event_id, future_logs.start_at, future_logs.state
FROM future_logs
LEFT JOIN event_logs ON future_logs.event_id = event_logs.event_id AND future_logs.start_at = event_logs.start_at
WHERE event_logs.start_at BETWEEN current_date AND current_date + interval '3 weeks'

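But this returns far too few rows: 77 versus the ~1000 expected.

【Comments】:

Convert the second part of the UNION into a calendar table (or view, or CTE) and left join the event_logs table to it. (Or: use UNION ALL and add a WHERE NOT EXISTS clause to the second part.)

@wildplasser Tried that... it doesn't seem to work as expected.

【Answer 1】:

Just add NOT EXISTS() to the second leg; you can then use UNION ALL to avoid the sort/merge step.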


SELECT event_id, start_at, state
FROM event_logs
WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'

UNION ALL

SELECT id AS event_id
        , generate_series(date_trunc('week', current_date)::date + (extract(isodow from start_at)::int - 1) + start_at::time
                , current_date + interval '3 weeks'
                , '1 week'::INTERVAL) AS start_at
        , 'draft' AS state
FROM events ev
WHERE NOT EXISTS ( SELECT *
        FROM event_logs nx
        WHERE nx.event_id = ev.id
        AND nx.start_at BETWEEN current_date AND current_date + interval '3 weeks' )
        ;
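
Note that a WHERE clause is evaluated per events row, before generate_series in the select list expands that row, so with the query above an event that has any log in the window loses all of its draft rows. If the goal is to drop only the generated rows whose (event_id, start_at) pair already exists in event_logs, one possible variant is sketched below. It assumes the table and column names from the question and moves generate_series into the FROM clause so the generated timestamp can be referenced in the subquery; the aliases ev, gs and nx are illustrative.

```sql
SELECT event_id, start_at, state
FROM event_logs
WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'

UNION ALL

SELECT ev.id AS event_id
        , gs.start_at
        , 'draft' AS state
FROM events ev
-- generate_series in FROM yields one row per weekly occurrence of each event
CROSS JOIN LATERAL generate_series(
                date_trunc('week', current_date)::date + (extract(isodow from ev.start_at)::int - 1) + ev.start_at::time
                , current_date + interval '3 weeks'
                , '1 week'::interval) AS gs(start_at)
-- keep a draft row only if no log exists for exactly this event and timestamp
WHERE NOT EXISTS ( SELECT 1
        FROM event_logs nx
        WHERE nx.event_id = ev.id
        AND nx.start_at = gs.start_at )
;
```

Because the comparison is on the exact timestamp, this only removes drafts whose generated start_at equals a logged start_at; whether that matches the real data is an assumption to verify.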

【Comments】:

And this WHERE NOT EXISTS clause removes rows from the second part of the UNION (not the first), right? If so, that's perfect!

【Answer 2】:

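select DISTINCT ON (date_day) date_day, state from(
-- filler rows: one per day for the last two weeks, with no state
SELECT day::date as date_day, null as state
FROM generate_series(now() - interval '2 week'
, now()
, interval '1 day') day
UNION ALL
-- real rows: the max state per day for rows with id = 49
select distinct
  date_trunc('day', des.updated_at) as date_day,
  max(des.state) over (partition by date_trunc('day', des.updated_at)) as state
from device_event as des where des.id = 49 and des.updated_at >= now() - interval '2 week'
-- nulls last, so a real state wins over the filler row for the same day
) dba order by date_day, state nulls last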
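
DISTINCT ON keeps the first row of each group as determined by ORDER BY, so the sort order decides which source wins when both produce a row for the same day. A minimal self-contained sketch with made-up values (the dates and the 'online' label are purely illustrative):

```sql
-- Two rows share 2018-09-27: a NULL filler row and a real state.
-- Sorting by state with NULLS LAST puts the real row first in its group,
-- so DISTINCT ON keeps it and discards the filler.
SELECT DISTINCT ON (date_day) date_day, state
FROM (
  VALUES (date '2018-09-26', NULL::text)
       , (date '2018-09-27', NULL::text)
       , (date '2018-09-27', 'online')
) AS t(date_day, state)
ORDER BY date_day, state NULLS LAST;
```

This should return one row per day: a NULL state for 2018-09-26 and 'online' for 2018-09-27.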

【Comments】:

【Answer 3】:

I would add another column, taborder, to your UNION query to give the rows a simple ordering, and then use the window function row_number() over(...) as follows:

SELECT
  event_id,
  start_at,
  state
FROM (
  SELECT
    event_id,
    start_at,
    state, 
    row_number() OVER (PARTITION BY event_id, start_at ORDER BY taborder) AS rownum
  FROM (
    SELECT
      event_id,
      start_at,
      state,
      1 AS taborder 
    FROM original_table
    
    UNION
    
    SELECT
      event_id,
      start_at,
      state,
      2 AS taborder 
    FROM draft_table
  ) src0
) src1 
WHERE rownum = 1
ORDER BY 1, 2, 3
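
original_table and draft_table above are placeholders. Applied to the question's tables, the same idea might look like the following sketch. It assumes the column names from the question, generates the draft rows in the FROM clause, and uses UNION ALL because taborder already makes the two legs distinct; the aliases ev and gs are illustrative.

```sql
SELECT event_id, start_at, state
FROM (
  SELECT event_id, start_at, state
       , row_number() OVER (PARTITION BY event_id, start_at ORDER BY taborder) AS rownum
  FROM (
    -- Leg 1: real log rows, preferred when both legs share a key
    SELECT event_id, start_at, state, 1 AS taborder
    FROM event_logs
    WHERE start_at BETWEEN current_date AND current_date + interval '3 weeks'

    UNION ALL

    -- Leg 2: generated weekly draft rows
    SELECT ev.id, gs.start_at, 'draft', 2
    FROM events ev
    CROSS JOIN LATERAL generate_series(
        date_trunc('week', current_date)::date + (extract(isodow from ev.start_at)::int - 1) + ev.start_at::time
      , current_date + interval '3 weeks'
      , '1 week'::interval) AS gs(start_at)
  ) src0
) src1
WHERE rownum = 1
ORDER BY event_id, start_at;
```

For each (event_id, start_at) pair, row_number() assigns 1 to the row with the lowest taborder, so a logged row takes precedence over a draft with the same key.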

【Comments】:
