How to exclude select partitions from an analytic query?

[Posted]: 2014-09-26 02:46:16 [Question]:

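Here is my scenario:

For ID=1, one row in cheese_year_seqno 201111 has a vendor code of XX, so I want to exclude every 201111 row but keep the 201222 rows available for ranking. If a given year_seqno has no XX vendor, all of its rows should be available for ranking.

Since ID=2 has no vendor code of XX, all of its rows should be available for ranking.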

with cheese_row as
(
select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-356) as cheese_batch_date from dual union all
select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 2 as cheese_batch, 'BB' as cheese_vendor,trunc(sysdate-356) as cheese_batch_date from dual union all
select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 3 as cheese_batch, 'XX' as cheese_vendor,trunc(sysdate-350) as cheese_batch_date from dual union all
select 1 as cheese_id, '201222' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-856) as cheese_batch_date from dual union all
select 1 as cheese_id, '201222' as cheese_year_seqno, 1 as cheese_lot, 2 as cheese_batch, 'DD' as cheese_vendor,trunc(sysdate-830) as cheese_batch_date from dual union all
select 2 as cheese_id, '201333' as cheese_year_seqno, 2 as cheese_lot, 3 as cheese_batch, 'CC' as cheese_vendor,trunc(sysdate-300) as cheese_batch_date from dual union all
select 2 as cheese_id, '201333' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-301) as cheese_batch_date from dual union all
select 2 as cheese_id, '201444' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'DD' as cheese_vendor,trunc(sysdate-290) as cheese_batch_date from dual

)
select cheese_id,
       cheese_year_seqno,
       cheese_lot,
       cheese_batch,
       cheese_vendor,
       cheese_batch_date,
       rank() over (partition by cheese_id
                        order by cheese_batch_date desc,
                                 cheese_batch desc,
                                 cheese_lot desc) as ch_rank1
    from cheese_row

/* If a cheese_year_seqno has a row with cheese_vendor = 'XX', exclude the whole
    cheese_year_seqno, but return all other cheese_year_seqno rows.
    Rank the remaining rows.
    In this case the 201111 year_seqno has XX as a cheese_vendor,
    therefore, for ID=1, return and rank only the two rows with 201222 year_seqno.
*/

Desired result:

 ID   SEQNO    LOT  BA   VEN   DATE        RNK1
---- -------- ---- ---- ----- ----------- ------
 1    201222   1    2    DD    17-JUN-12   1
 1    201222   1    1    AA    22-MAY-12   2
 2    201444   1    1    DD    09-DEC-13   1
 2    201333   2    3    CC    29-NOV-13   2
 2    201333   1    1    AA    28-NOV-13   3

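[Answer 1]:

Use a second analytic function to flag whether each cheese_year_seqno contains an XX row, then filter on that flag before ranking: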

with cheese_row as(
      select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-356) as cheese_batch_date from dual union all
      select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 2 as cheese_batch, 'BB' as cheese_vendor,trunc(sysdate-356) as cheese_batch_date from dual union all
      select 1 as cheese_id, '201111' as cheese_year_seqno, 2 as cheese_lot, 3 as cheese_batch, 'XX' as cheese_vendor,trunc(sysdate-350) as cheese_batch_date from dual union all
      select 1 as cheese_id, '201222' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-856) as cheese_batch_date from dual union all
      select 1 as cheese_id, '201222' as cheese_year_seqno, 1 as cheese_lot, 2 as cheese_batch, 'DD' as cheese_vendor,trunc(sysdate-830) as cheese_batch_date from dual union all
      select 2 as cheese_id, '201333' as cheese_year_seqno, 2 as cheese_lot, 3 as cheese_batch, 'CC' as cheese_vendor,trunc(sysdate-300) as cheese_batch_date from dual union all
      select 2 as cheese_id, '201333' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'AA' as cheese_vendor,trunc(sysdate-301) as cheese_batch_date from dual union all
      select 2 as cheese_id, '201444' as cheese_year_seqno, 1 as cheese_lot, 1 as cheese_batch, 'DD' as cheese_vendor,trunc(sysdate-290) as cheese_batch_date from dual
    )
select cheese_id, cheese_year_seqno, cheese_lot, cheese_batch, cheese_vendor, cheese_batch_date,
       rank() over (partition by cheese_id
                        order by cheese_batch_date desc,
                                 cheese_batch desc,
                                 cheese_lot desc) as ch_rank1
from (select cr.*,
             -- number of XX rows in this cheese_year_seqno; 0 means the whole partition is eligible
             sum(case when cheese_vendor = 'XX' then 1 else 0 end) over (partition by cheese_year_seqno) as xx_flag
      from cheese_row cr
     ) cr
where xx_flag = 0;
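
For anyone without an Oracle instance at hand, here is a minimal sketch that replays the flag-then-filter idea in SQLite through Python's sqlite3 module. Several things here are assumptions and not part of the original answer: SQLite 3.25 or later (needed for window functions), an integer `days_ago` column standing in for `trunc(sysdate - n)`, the vendor literal 'XX' as it appears in the sample data, and the helper name `ranked_rows`.

```python
import sqlite3

# Sample rows from the question. days_ago replaces trunc(sysdate - n),
# which is an assumption made so the data is stable under SQLite.
ROWS = [
    (1, "201111", 2, 1, "AA", 356),
    (1, "201111", 2, 2, "BB", 356),
    (1, "201111", 2, 3, "XX", 350),
    (1, "201222", 1, 1, "AA", 856),
    (1, "201222", 1, 2, "DD", 830),
    (2, "201333", 2, 3, "CC", 300),
    (2, "201333", 1, 1, "AA", 301),
    (2, "201444", 1, 1, "DD", 290),
]

QUERY = """
select cheese_id, cheese_year_seqno, cheese_lot, cheese_batch, cheese_vendor,
       rank() over (partition by cheese_id
                    order by days_ago asc,      -- smaller days_ago = later batch date
                             cheese_batch desc,
                             cheese_lot desc) as ch_rank1
from (select cr.*,
             sum(case when cheese_vendor = 'XX' then 1 else 0 end)
                 over (partition by cheese_year_seqno) as xx_flag
      from cheese_row cr
     ) cr
where xx_flag = 0
order by cheese_id, ch_rank1
"""


def ranked_rows():
    """Load the sample rows into an in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cheese_row (cheese_id, cheese_year_seqno, cheese_lot, "
        "cheese_batch, cheese_vendor, days_ago)"
    )
    conn.executemany("insert into cheese_row values (?, ?, ?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ranked_rows():
        print(row)
```

The point of the inner query is ordering: the flag is computed over every row, the outer where clause then drops the flagged partitions, and only after that does rank() run, so the ranks are assigned over the surviving rows only.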

[Answer 2]:

Add a where clause to the original query, like this:

where cheese_year_seqno NOT IN (
  select cheese_year_seqno from cheese_row where cheese_vendor = 'XX'
  )
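
To show where that clause sits in the full query, here is a rough sketch in Python with SQLite, not the answerer's own code. It assumes SQLite 3.25+ and uses an integer `days_ago` column in place of `trunc(sysdate - n)`. The names `run`, `NOT_IN` and `NOT_EXISTS` are hypothetical, and the `not exists` variant is an alternative added here for comparison; it is not part of the original answer.

```python
import sqlite3

# Sample rows from the question; days_ago is an assumed stand-in for trunc(sysdate - n).
ROWS = [
    (1, "201111", 2, 1, "AA", 356),
    (1, "201111", 2, 2, "BB", 356),
    (1, "201111", 2, 3, "XX", 350),
    (1, "201222", 1, 1, "AA", 856),
    (1, "201222", 1, 2, "DD", 830),
    (2, "201333", 2, 3, "CC", 300),
    (2, "201333", 1, 1, "AA", 301),
    (2, "201444", 1, 1, "DD", 290),
]

# The where clause filters rows before rank() is evaluated.
RANK = """
select cheese_id, cheese_year_seqno, cheese_lot, cheese_batch, cheese_vendor,
       rank() over (partition by cheese_id
                    order by days_ago asc, cheese_batch desc, cheese_lot desc) as ch_rank1
from cheese_row cr
where {predicate}
order by cheese_id, ch_rank1
"""

# The clause from the answer.
NOT_IN = """cheese_year_seqno not in (
    select cheese_year_seqno from cheese_row where cheese_vendor = 'XX')"""

# Alternative (not in the original answer): a correlated not exists.
NOT_EXISTS = """not exists (
    select 1 from cheese_row x
    where x.cheese_vendor = 'XX'
      and x.cheese_year_seqno = cr.cheese_year_seqno)"""


def run(predicate, extra_rows=()):
    """Rank the sample rows (plus any extra rows) under the given predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table cheese_row (cheese_id, cheese_year_seqno, cheese_lot, "
        "cheese_batch, cheese_vendor, days_ago)"
    )
    conn.executemany(
        "insert into cheese_row values (?, ?, ?, ?, ?, ?)",
        list(ROWS) + list(extra_rows),
    )
    rows = conn.execute(RANK.format(predicate=predicate)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run(NOT_IN):
        print(row)
```

On the sample data both predicates return the same five rows. They differ once the subquery can produce a NULL: if an XX row had a NULL cheese_year_seqno, the `not in` form would return no rows at all, whereas the `not exists` form would still rank the unaffected rows (and would also keep the NULL-seqno row itself, since NULL never compares equal to anything). Calling `run(NOT_IN, [(3, None, 1, 1, "XX", 10)])` is one way to see this.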
