Spool space problem when using XMLAGG on a single table

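I need to aggregate all NOTI_TEXT values belonging to a NOTI_ID, because one NOTI_ID can have multiple NOTI_TEXT rows. I'm using XMLAGG, but the query runs out of spool space.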

Here is the query:

select
        NOTI_ID,
        cast(XMLAGG(NOTI_TEXT  order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
        NOTI_COUNTRY_ID,
        NOTI_MAT_DIVISION_ID,
        NOTI_MAT_DIVISION_TEXT,
        NOTI_SOURCESYSTEM_ID,
        CURRENT_DATE as TABLE_LOAD_DT
     from
        HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
     group by
        NOTI_ID,
        NOTI_COUNTRY_ID,
        NOTI_MAT_DIVISION_ID,
        NOTI_MAT_DIVISION_TEXT,
        NOTI_SOURCESYSTEM_ID 
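For readers unfamiliar with XMLAGG, the grouping this query performs can be modelled in a few lines of plain Python. This is only a sketch of the intended semantics, not Teradata code: the sample rows are made up, the GROUP BY is reduced to NOTI_ID plus one descriptive column, and it assumes the texts are joined with a single space in NOTI_TEXT_LINE_ID order (XML escaping of special characters is ignored).

```python
from itertools import groupby

# Hypothetical sample rows: (NOTI_ID, NOTI_TEXT_LINE_ID, NOTI_TEXT, NOTI_COUNTRY_ID)
rows = [
    (2, 1, "valve", "DE"),
    (1, 2, "world", "US"),
    (1, 1, "hello", "US"),
    (2, 2, "leaking", "DE"),
]

def aggregate_texts(rows):
    """Model of XMLAGG(NOTI_TEXT ORDER BY NOTI_TEXT_LINE_ID)
    ... GROUP BY NOTI_ID, NOTI_COUNTRY_ID.

    Assumption: values are joined with a single space.
    """
    # Sort by the grouping key, then by line id, so each group is already ordered.
    ordered = sorted(rows, key=lambda r: (r[0], r[3], r[1]))
    result = []
    for (noti_id, country), group in groupby(ordered, key=lambda r: (r[0], r[3])):
        text = " ".join(r[2] for r in group)
        result.append((noti_id, text, country))
    return result

print(aggregate_texts(rows))
# [(1, 'hello world', 'US'), (2, 'valve leaking', 'DE')]
```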

All relevant statistics have been collected, and the skew factor of the source table is 1.5. Here is the EXPLAIN plan:

Explain select
         NOTI_ID,
         cast(XMLAGG(NOTI_TEXT  order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
         NOTI_COUNTRY_ID,
         NOTI_MAT_DIVISION_ID,
         NOTI_MAT_DIVISION_TEXT,
         NOTI_SOURCESYSTEM_ID,
         CURRENT_DATE as TABLE_LOAD_DT
      from
         HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
      group by
         NOTI_ID,
         NOTI_COUNTRY_ID,
         NOTI_MAT_DIVISION_ID,
         NOTI_MAT_DIVISION_TEXT,
         NOTI_SOURCESYSTEM_ID; 


  1) First, we lock
     HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
     _1 for read on a reserved RowHash in all partitions to prevent
     global deadlock.
  2) Next, we lock
     HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
     _1 for read.
  3) We do an all-AMPs SUM step to aggregate from
     HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
     _1 by way of an all-rows scan with no residual conditions, and the
     grouping identifier in field 1.  Aggregate Intermediate Results
     are computed globally, then placed in Spool 3.  The input table
     will not be cached in memory, but it is eligible for synchronized
     scanning.  The aggregate spool file will not be cached in memory.
     The size of Spool 3 is estimated with high confidence to be
     13,749,188 rows (64,456,193,344 bytes).  The estimated time for
     this step is 15 hours and 20 minutes.
  4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
     an all-rows scan into Spool 1 (group_amps), which is built locally
     on the AMPs.  The result spool file will not be cached in memory.
     The size of Spool 1 is estimated with high confidence to be
     13,749,188 rows (148,092,503,948 bytes).  The estimated time for
     this step is 4 minutes and 10 seconds.
  5) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 15 hours and 24 minutes.
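A quick back-of-the-envelope check of the EXPLAIN estimates shows how wide each spooled row is. The sketch below only divides the byte counts by the row counts reported in steps 3 and 4; the idea that both spools may exist at the same time during step 4 is an inference from the "Last Use" marker on Spool 3, not something the plan states explicitly.

```python
# Spool estimates copied from the EXPLAIN output (rows and bytes).
ROWS = 13_749_188
SPOOL_3_BYTES = 64_456_193_344   # step 3: aggregate spool
SPOOL_1_BYTES = 148_092_503_948  # step 4: result spool

GIB = 1024 ** 3

def bytes_per_row(total_bytes: int, rows: int) -> float:
    """Average estimated row width in bytes."""
    return total_bytes / rows

for name, size in (("Spool 3", SPOOL_3_BYTES), ("Spool 1", SPOOL_1_BYTES)):
    print(f"{name}: {bytes_per_row(size, ROWS):,.0f} bytes/row, {size / GIB:,.1f} GiB")

# Assumption: step 4 reads Spool 3 (Last Use) while writing Spool 1,
# so both may occupy spool at once.
print(f"Both spools together: {(SPOOL_3_BYTES + SPOOL_1_BYTES) / GIB:,.1f} GiB")
```

Both estimates divide evenly: about 4,688 bytes per row in Spool 3 and 10,771 bytes per row in Spool 1, roughly 60 GiB and 138 GiB. With rows several kilobytes wide, every extra column carried through the aggregation adds up quickly over 13.7 million groups.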

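We have used this table in other queries and never noticed anything unusual. I'd like to know whether this query can be optimized further, or whether there is another way to achieve the same result.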

Answer

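That's a large base table and a large aggregation, and your system may simply not be big enough for it (the spool files are not cached in memory).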

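Try aggregating separately, i.e. only on the key column(s) (probably just NOTI_ID), and then join the remaining columns back. This removes the extra GROUP BY columns from the aggregation spool (NOTI_MAT_DIVISION_TEXT could be the culprit if it is a large VARCHAR):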

select
   t1.NOTI_ID,
   t1.NOTI_TEXT,
   t2.NOTI_COUNTRY_ID,
   t2.NOTI_MAT_DIVISION_ID,
   t2.NOTI_MAT_DIVISION_TEXT,
   t2.NOTI_SOURCESYSTEM_ID,
   CURRENT_DATE as TABLE_LOAD_DT
from
 (   select
        NOTI_ID,
        cast(XMLAGG(NOTI_TEXT  order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT
     from
        HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
     group by
        NOTI_ID
 ) as t1 
join 
 (   select distinct
        NOTI_ID,
        NOTI_COUNTRY_ID,
        NOTI_MAT_DIVISION_ID,
        NOTI_MAT_DIVISION_TEXT,
        NOTI_SOURCESYSTEM_ID
     from
        HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
 ) as t2 
on t1.NOTI_ID = t2.NOTI_ID
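The rewrite returns the same rows as the original query only if NOTI_COUNTRY_ID, NOTI_MAT_DIVISION_ID, NOTI_MAT_DIVISION_TEXT and NOTI_SOURCESYSTEM_ID have a single value per NOTI_ID. The Python sketch below models both queries on made-up rows to make that condition concrete; it is an illustration of the logic, not Teradata code, and it reduces the descriptive columns to one (`division`) and assumes space-joined texts.

```python
from collections import defaultdict

# Hypothetical rows: (NOTI_ID, NOTI_TEXT_LINE_ID, NOTI_TEXT, NOTI_MAT_DIVISION_TEXT)

def group_by_all(rows):
    """Original query: aggregate over NOTI_ID plus the descriptive column."""
    groups = defaultdict(list)
    for noti_id, line, text, division in rows:
        groups[(noti_id, division)].append((line, text))
    return sorted(
        (noti_id, " ".join(t for _, t in sorted(parts)), division)
        for (noti_id, division), parts in groups.items()
    )

def aggregate_then_join(rows):
    """Rewrite: aggregate by NOTI_ID only, then join DISTINCT attributes back."""
    texts = defaultdict(list)
    for noti_id, line, text, _ in rows:
        texts[noti_id].append((line, text))
    t1 = {k: " ".join(t for _, t in sorted(v)) for k, v in texts.items()}
    t2 = {(noti_id, division) for noti_id, _, _, division in rows}  # SELECT DISTINCT
    return sorted((noti_id, t1[noti_id], division) for noti_id, division in t2)

consistent = [(1, 1, "hello", "Pumps"), (1, 2, "world", "Pumps"), (2, 1, "ok", "Valves")]
print(group_by_all(consistent) == aggregate_then_join(consistent))  # True

inconsistent = [(1, 1, "hello", "Pumps"), (1, 2, "world", "Valves")]
print(group_by_all(inconsistent))         # [(1, 'hello', 'Pumps'), (1, 'world', 'Valves')]
print(aggregate_then_join(inconsistent))  # [(1, 'hello world', 'Pumps'), (1, 'hello world', 'Valves')]
```

If a NOTI_ID can carry more than one combination of those columns, the join repeats the full concatenated text for each combination instead of splitting it, so it is worth verifying that assumption on the data before switching.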
