Spool space problem when using XMLAGG on a single table
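I need to aggregate all NOTI_TEXT values that belong to a NOTI_ID. One NOTI_ID can have multiple NOTI_TEXT rows. I am using XMLAGG, but the query runs out of spool space.
Here is the query: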
select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID
All relevant statistics have been collected. The skew factor of the source table is 1.5. Here is the EXPLAIN plan:
Explain select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID;
1) First, we lock
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 for read on a reserved RowHash in all partitions to prevent
global deadlock.
2) Next, we lock
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 for read.
3) We do an all-AMPs SUM step to aggregate from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST
_1 by way of an all-rows scan with no residual conditions, and the
grouping identifier in field 1. Aggregate Intermediate Results
are computed globally, then placed in Spool 3. The input table
will not be cached in memory, but it is eligible for synchronized
scanning. The aggregate spool file will not be cached in memory.
The size of Spool 3 is estimated with high confidence to be
13,749,188 rows (64,456,193,344 bytes). The estimated time for
this step is 15 hours and 20 minutes.
4) We do an all-AMPs RETRIEVE step from Spool 3 (Last Use) by way of
an all-rows scan into Spool 1 (group_amps), which is built locally
on the AMPs. The result spool file will not be cached in memory.
The size of Spool 1 is estimated with high confidence to be
13,749,188 rows (148,092,503,948 bytes). The estimated time for
this step is 4 minutes and 10 seconds.
5) Finally, we send out an END TRANSACTION step to all AMPs involved
in processing the request.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 15 hours and 24 minutes.
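This table has been used in other queries and we never noticed anything unusual. I would like to know whether the query can be optimized further, or whether there is an alternative way to achieve the same result.
Answer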
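This is a large base table and a large aggregation, and your system is probably not that large (the plan shows that the spools will not be cached in memory).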
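To put numbers on "large", the estimated spool sizes in the EXPLAIN plan can be divided by the estimated row count. This is only arithmetic on the optimizer's estimates quoted above, not a measurement of actual spool usage:

```latex
\[
\frac{64{,}456{,}193{,}344\ \text{bytes}}{13{,}749{,}188\ \text{rows}} = 4{,}688\ \text{bytes per row (Spool 3)}
\qquad
\frac{148{,}092{,}503{,}948\ \text{bytes}}{13{,}749{,}188\ \text{rows}} = 10{,}771\ \text{bytes per row (Spool 1)}
\]
```

So the plan expects several kilobytes per group across nearly 14 million groups, which is about 64 GB for the aggregate spool and about 148 GB for the result spool.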
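Try aggregating separately, i.e. grouping only on the key column (probably NOTI_ID), and then joining back. This removes the extra GROUP BY columns from the aggregate spool (NOTI_MAT_DIVISION_TEXT might be causing the problem if it is a large VarChar):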
select
t1.NOTI_ID,
t1.NOTI_TEXT,
t2.NOTI_COUNTRY_ID,
t2.NOTI_MAT_DIVISION_ID,
t2.NOTI_MAT_DIVISION_TEXT,
t2.NOTI_SOURCESYSTEM_ID,
CURRENT_DATE as TABLE_LOAD_DT
from
( select
NOTI_ID,
cast(XMLAGG(NOTI_TEXT order by NOTI_TEXT_LINE_ID) as varchar(32000)) as NOTI_TEXT
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
group by
NOTI_ID
) as t1
join
( select distinct
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
) as t2
on t1.NOTI_ID = t2.NOTI_ID
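The rewrite above returns the same result as the original query only if every NOTI_ID maps to exactly one combination of the other grouping columns. That is an assumption; nothing in the question confirms it. A minimal sketch of a check you could run first, where the alias names `d` and `COMBO_COUNT` are made up for illustration:

```sql
-- Sanity check (sketch): list NOTI_IDs that carry more than one
-- combination of the descriptive columns. The rewrite assumes
-- this query returns no rows.
select
NOTI_ID,
count(*) as COMBO_COUNT
from
( select distinct
NOTI_ID,
NOTI_COUNTRY_ID,
NOTI_MAT_DIVISION_ID,
NOTI_MAT_DIVISION_TEXT,
NOTI_SOURCESYSTEM_ID
from
HC_PRD_D_RDDL_SDTB_0_1_0_0_0_0_0_0.SDTB_DM_SEV_111_NOTI_TXT_LINES_TEST_1
) as d
group by
NOTI_ID
having count(*) > 1;
```

If this returns rows, the two queries differ: the original concatenates the text separately for each combination, while the rewrite repeats the full concatenated text of the NOTI_ID once per combination.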