Oracle merge into: optimization directions for large data volumes

Posted 2022-12-05 大树的困惑

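For update operations against an Oracle database, we normally write

update tableName set col1=col2 where condition...

Updating and committing this way is relatively inefficient, so when a batch update is needed we usually replace it with merge into: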

merge into target_tableName t
using (
    select id, col1, col2 ... from source_tableName
) a
on (t.id = a.id)
when matched then
    update set
        t.col1 = a.col1,
        t.col2 = a.col2,
        ...
when not matched then
    insert
        (t.col1,
         t.col2,
         t.col3,
         ...
         )
    values
        (a.col1,
         a.col2,
         a.col3,
         ...);

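This is the conventional use of merge into, and it is generally adequate for everyday data volumes.

What follows discusses optimization directions for large data volumes.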

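1. Storage optimization

When merge into runs, it loads two sets of data: the target table to be updated, and the temporary result set in the using clause that the target is compared against. The data set loaded by using is not released until merge into has finished. This becomes a problem when the using clause contains join logic across several tables,

for example: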

merge into target_TableName
using (
    select a.col1 from tablea a
        left join tableb b on a.id = b.id
        left join tablec c on a.id = c.id
)....

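With the logic above, the using step has to load the data of all three tables (tablea, tableb and tablec) into one block of temporary table space, which raises memory pressure. This matters for large data volumes; for small ones it can be ignored.

In this situation, a temporary table provided by Oracle can be used to optimize,

for example: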

drop table TMP_table_a;
create global temporary table TMP_table_a (
    col1 varchar2(64),
    col2 varchar2(64),
    col3 varchar2(64),
    col4 varchar2(64),
    col5 varchar2(64)
)
on commit delete rows;
-- on commit delete rows: once the transaction commits, all rows in this temporary table are deleted

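With the temporary table, the result of the join logic can be inserted into it ahead of time, so that merge into only has to read from the temporary table.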

insert into TMP_table_a (col1, col2 ...)
select col1, col2 .. from tablea a left join tableb b on a.id=b.id
....conditions

-- then run merge into (in the same transaction: a commit in between would empty the temporary table)
merge into target_TableName t
using (
    select col1... from TMP_table_a ..
) a
on (a.col1 = t.col1)
when matched then
    update set
        t.col2 = a.col2   -- a column used in the on clause cannot be updated (ORA-38104)
when not matched then
    insert
        (t.col1,
         t.col2,
         t.col3,
         ...
         )
    values
        (a.col1,
         a.col2,
         a.col3,
         ...);
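
As a more concrete illustration of the staging pattern above, here is a minimal sketch. The tables orders_src, customers and orders_target, the staging table tmp_orders_stage, and all column names are hypothetical and not part of the original example.

```sql
-- Hypothetical schema: orders_src and customers are the sources,
-- orders_target is the table being merged into.
create global temporary table tmp_orders_stage (
    order_id      number,
    customer_name varchar2(64),
    amount        number
) on commit delete rows;

-- 1) Resolve the join once and stage the result.
insert into tmp_orders_stage (order_id, customer_name, amount)
select o.order_id, c.customer_name, o.amount
from orders_src o
left join customers c on c.customer_id = o.customer_id;

-- 2) Merge from the staged rows, still inside the same transaction.
merge into orders_target t
using (select order_id, customer_name, amount from tmp_orders_stage) s
on (t.order_id = s.order_id)
when matched then
    update set t.customer_name = s.customer_name,
               t.amount        = s.amount
when not matched then
    insert (order_id, customer_name, amount)
    values (s.order_id, s.customer_name, s.amount);

-- 3) Commit only now; the commit also empties tmp_orders_stage.
commit;
```

The ordering matters because of on commit delete rows: committing between the insert and the merge would leave the merge with an empty source.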

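2. Reduce the amount of data compared

Reducing the amount of data compared demands a somewhat better understanding of the business requirements, because you need to find conditions that can filter the data.

For example, a table that receives daily increments can be restricted by date.

(Combined with the storage optimization above, the filtering can be done at the point where rows are inserted into the temporary table.)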

insert into TMP_table_a (col1, col2 ...)
select col1, col2 .. from tablea a left join tableb b on a.id=b.id
.... where lastupdatedate>=end_date and conditions..
-- the date filter is lastupdatedate>=end_date
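
A minimal sketch of such a date filter, assuming a hypothetical staging table tmp_orders_stage (order_id, customer_name, amount), hypothetical source tables orders_src and customers, and a last_update_date column on orders_src. The one-day window is only an example; the right cutoff depends on how often the job runs.

```sql
-- Assumes tmp_orders_stage already exists as a global temporary table.
-- Stage only the rows touched since the start of yesterday, so the
-- later merge into compares far fewer rows.
insert into tmp_orders_stage (order_id, customer_name, amount)
select o.order_id, c.customer_name, o.amount
from orders_src o
left join customers c on c.customer_id = o.customer_id
where o.last_update_date >= trunc(sysdate) - 1;
```

trunc(sysdate) - 1 evaluates to midnight at the start of the previous day, so the filter stays a plain range condition on the column.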

3. Optimize the execution order

In normal use, merge into is usually written with

when matched then ...
when not matched then ...
-- used to decide between the insert operation and the update operation

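This has a fairly significant drawback. merge into chooses its operation by comparing data: each row of the target table is compared against the rows of your logical result set (the result of using). If that result set is too large, a lot of time is spent in the

on (...)

data-comparison logic. In addition, update set and insert into are handled separately, which means the data is compared twice. So in practice, to improve performance, insert into is pulled out of merge into, and merge into only performs the update.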

-- merge into only performs the update
merge into target_tableName t
using (
    select col1... from TMP_table_a ..
) a
on (a.col1 = t.col1)
when matched then
    update set
        t.col2 = a.col2;   -- a column used in the on clause cannot be updated (ORA-38104)

-- the insert runs afterwards
insert into target_tableName (col1..)
select col1 from TMP_table_a
where conditions ...
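
The split above can be sketched concretely as follows. The names orders_target and tmp_orders_stage and their columns are hypothetical, and the not exists filter is an assumption about what the elided conditions would contain, not something the original specifies.

```sql
-- Assumes tmp_orders_stage has already been populated in this transaction.

-- 1) Update only: rows whose key already exists in the target.
merge into orders_target t
using (select order_id, amount from tmp_orders_stage) s
on (t.order_id = s.order_id)
when matched then
    update set t.amount = s.amount;

-- 2) Insert the staged rows whose key is not yet in the target.
insert into orders_target (order_id, amount)
select s.order_id, s.amount
from tmp_orders_stage s
where not exists (
    select 1 from orders_target t where t.order_id = s.order_id
);

commit;
```

Whether this is actually faster than a single merge into with both branches depends on the data and the execution plan, so it is worth comparing the two on a realistic volume.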

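Addendum:

Once insert into has been pulled out of merge into, there is also a question of execution order,

namely during table initialization. When a new table is initialized it is empty, so running merge into first is very fast, because there is no data in it to update, and processing then goes straight on to the insert step.

If insert into runs first, the table already contains data, and the merge into update that follows has to compare data all over again.

These are some directions and suggestions for optimizing Oracle merge into operations.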
