I am trying to improve the performance of an Oracle SQL query that finds the differences between two tables
Posted: 2016-01-13 21:38:18
【Question】
I have two Oracle tables and I am doing a UNION between them to find the differences in the data stored in the two tables. The query is very slow when I run it in SQL Developer, and when I use the same query in Informatica its throughput is also low.
Table 1: W_SALES_INVOICE_LINE_FS EBS (NET_AMT, INVOICED_QTY, CREATED_ON_DT, CHANGED_ON_DT, INTEGRATION_ID, 'EBS' AS SOURCE_NAME)
Table 2: W_SALES_INVOICE_LINE_F DWH (NET_AMT, INVOICED_QTY, CREATED_ON_DT, CHANGED_ON_DT, INTEGRATION_ID, 'DWH' AS SOURCE_NAME)
Here is the query in question:
SELECT EBS.NET_AMT,
nvl(EBS.INVOICED_QTY,
case nvl(EBS.NET_AMT,0) when 0 then EBS.INVOICED_QTY
else -1 end) INVOICED_QTY,
EBS.CREATED_ON_DT,
EBS.CHANGED_ON_DT,
EBS.INTEGRATION_ID,
'EBS' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_FS EBS
WHERE NOT EXISTS (SELECT INTEGRATION_ID FROM W_SALES_INVOICE_LINE_F DWH
WHERE EBS.INTEGRATION_ID = DWH.INTEGRATION_ID)
UNION
SELECT DWH.NET_AMT,
DWH.INVOICED_QTY,
DWH.CREATED_ON_DT,
DWH.CHANGED_ON_DT,
DWH.INTEGRATION_ID,
'DWH' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_F DWH
where DWH.IS_POS = 'N' and
not exists (SELECT INTEGRATION_ID FROM W_SALES_INVOICE_LINE_FS EBS
WHERE EBS.INTEGRATION_ID = DWH.INTEGRATION_ID);
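Let me know if you would like to see the explain plan. Can someone tell me how to improve the performance, or let me know if the problem lies somewhere other than the query above?
【Comments】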
UNION and NOT EXISTS can be performance killers. Are you sure you need UNION here, and can't use UNION ALL instead?
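Consider using UNION ALL instead of UNION to avoid an unnecessary sort. The results of the two parts of the query are always different, because the last column, SOURCE_NAME, is either 'EBS' or 'DWH', but the database doesn't know that and has to sort both results to perform the UNION. (This assumes neither part returns duplicate rows on its own, since UNION removes those too.)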
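A quick way to check the point about UNION ALL is to run both forms against toy data. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, with hypothetical tables `ebs` and `dwh` and made-up rows; because each branch carries a different constant SOURCE_NAME, no row from one branch can equal a row from the other.

```python
import sqlite3

# Hypothetical, cut-down stand-ins for the two tables (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ebs (integration_id TEXT, net_amt REAL);
    CREATE TABLE dwh (integration_id TEXT, net_amt REAL);
    INSERT INTO ebs VALUES ('A', 10), ('B', 20);
    INSERT INTO dwh VALUES ('A', 10), ('C', 30);
""")

# Each branch tags its rows with a constant source name, so the branches
# can never produce identical rows.
branch_ebs = "SELECT integration_id, net_amt, 'EBS' AS source_name FROM ebs"
branch_dwh = "SELECT integration_id, net_amt, 'DWH' AS source_name FROM dwh"

union_rows = conn.execute(f"{branch_ebs} UNION {branch_dwh}").fetchall()
union_all_rows = conn.execute(f"{branch_ebs} UNION ALL {branch_dwh}").fetchall()

# Same rows either way; UNION only adds a duplicate-elimination step.
print(sorted(union_rows) == sorted(union_all_rows))  # True
```

This equivalence only holds if neither branch returns duplicate rows by itself; if INTEGRATION_ID is not unique within a table, UNION and UNION ALL can return different row counts.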
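【Answer 1】
NOT EXISTS and NOT IN conditions are often performance bottlenecks. One trick to get around this is to use a LEFT OUTER JOIN together with a condition stating that the second table's join column is NULL, i.e. there is no matching row. (Oracle's optimizer can often rewrite NOT EXISTS into the same kind of anti-join by itself, so compare the execution plans of both versions.) So try: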
SELECT EBS.NET_AMT,
nvl(EBS.INVOICED_QTY,
case nvl(EBS.NET_AMT,0) when 0 then EBS.INVOICED_QTY
else -1 end) INVOICED_QTY,
EBS.CREATED_ON_DT,
EBS.CHANGED_ON_DT,
EBS.INTEGRATION_ID,
'EBS' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_FS EBS
LEFT OUTER JOIN
W_SALES_INVOICE_LINE_F DWH
ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
WHERE DWH.INTEGRATION_ID IS NULL
UNION
SELECT DWH.NET_AMT,
DWH.INVOICED_QTY,
DWH.CREATED_ON_DT,
DWH.CHANGED_ON_DT,
DWH.INTEGRATION_ID,
'DWH' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_F DWH
LEFT OUTER JOIN W_SALES_INVOICE_LINE_FS EBS
ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
where EBS.INTEGRATION_ID IS NULL
AND DWH.IS_POS = 'N';
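The LEFT OUTER JOIN ... IS NULL pattern is an anti-join: when the IS NULL test is on the join key, it returns the same rows as NOT EXISTS. The sketch below checks this with Python's sqlite3 module standing in for Oracle; the two tables are reduced to a couple of columns and the rows are hypothetical.

```python
import sqlite3

# Cut-down versions of the two tables with hypothetical rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE w_sales_invoice_line_fs (integration_id TEXT, net_amt REAL);
    CREATE TABLE w_sales_invoice_line_f  (integration_id TEXT, net_amt REAL);
    INSERT INTO w_sales_invoice_line_fs VALUES ('A', 10), ('B', 20), ('D', 40);
    INSERT INTO w_sales_invoice_line_f  VALUES ('A', 10), ('C', 30);
""")

# Original style: correlated NOT EXISTS subquery.
not_exists_sql = """
    SELECT ebs.integration_id FROM w_sales_invoice_line_fs ebs
    WHERE NOT EXISTS (SELECT 1 FROM w_sales_invoice_line_f dwh
                      WHERE dwh.integration_id = ebs.integration_id)
"""

# Answer 1 style: outer join, then keep only rows with no match.
left_join_sql = """
    SELECT ebs.integration_id FROM w_sales_invoice_line_fs ebs
    LEFT OUTER JOIN w_sales_invoice_line_f dwh
      ON dwh.integration_id = ebs.integration_id
    WHERE dwh.integration_id IS NULL
"""

not_exists_rows = sorted(conn.execute(not_exists_sql).fetchall())
left_join_rows = sorted(conn.execute(left_join_sql).fetchall())
print(not_exists_rows)  # [('B',), ('D',)]
print(left_join_rows)   # [('B',), ('D',)]
```

Whether the rewrite is actually faster depends on the plan Oracle chooses, so the explain plans of both versions are the real test.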
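【Answer 2】
You are not performing a JOIN, you are performing a UNION. However, you are running subqueries, and these may hurt overall performance. You can change NOT EXISTS to NOT IN, which may be able to take advantage of an index if one exists. Be aware that NOT IN behaves differently from NOT EXISTS when NULLs are involved: if the subquery returns any NULL INTEGRATION_ID, the NOT IN condition is never true and that branch returns no rows, so filter NULLs out of the subquery.
Try the following: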
SELECT EBS.NET_AMT,
nvl(EBS.INVOICED_QTY,
case nvl(EBS.NET_AMT,0) when 0 then EBS.INVOICED_QTY
else -1 end) INVOICED_QTY,
EBS.CREATED_ON_DT,
EBS.CHANGED_ON_DT,
EBS.INTEGRATION_ID,
'EBS' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_FS EBS
WHERE EBS.INTEGRATION_ID NOT IN (
SELECT INTEGRATION_ID
FROM W_SALES_INVOICE_LINE_F
WHERE INTEGRATION_ID IS NOT NULL -- NOT IN matches nothing if the list contains a NULL
)
UNION ALL
SELECT DWH.NET_AMT,
DWH.INVOICED_QTY,
DWH.CREATED_ON_DT,
DWH.CHANGED_ON_DT,
DWH.INTEGRATION_ID,
'DWH' AS SOURCE_NAME
FROM
W_SALES_INVOICE_LINE_F DWH
where DWH.IS_POS = 'N'
and DWH.INTEGRATION_ID not in (
SELECT INTEGRATION_ID
FROM W_SALES_INVOICE_LINE_FS
WHERE INTEGRATION_ID IS NOT NULL -- exclude NULL keys so NOT IN can return rows
);
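The NULL caveat for NOT IN is easy to reproduce. The sketch below uses Python's sqlite3 module as a stand-in for Oracle (both follow SQL's three-valued logic here) with hypothetical tables `fs` and `f`, where the second table contains a single NULL key.

```python
import sqlite3

# Hypothetical tables: f contains one NULL integration_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fs (integration_id TEXT);
    CREATE TABLE f  (integration_id TEXT);
    INSERT INTO fs VALUES ('A'), ('B');
    INSERT INTO f  VALUES ('A'), (NULL);
""")

# 'B' NOT IN ('A', NULL) evaluates to UNKNOWN, so no row qualifies.
naive = conn.execute(
    "SELECT integration_id FROM fs "
    "WHERE integration_id NOT IN (SELECT integration_id FROM f)"
).fetchall()

# Filtering NULLs out of the subquery restores the intended result.
guarded = conn.execute(
    "SELECT integration_id FROM fs "
    "WHERE integration_id NOT IN (SELECT integration_id FROM f "
    "                             WHERE integration_id IS NOT NULL)"
).fetchall()

# NOT EXISTS is not affected by NULLs in the subquery.
not_exists = conn.execute(
    "SELECT integration_id FROM fs WHERE NOT EXISTS "
    "(SELECT 1 FROM f WHERE f.integration_id = fs.integration_id)"
).fetchall()

print(naive)       # []
print(guarded)     # [('B',)]
print(not_exists)  # [('B',)]
```

If INTEGRATION_ID is declared NOT NULL in both tables, the extra filter is redundant but harmless.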
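Also, as others mentioned in the comments, UNION ALL is probably more appropriate.
You could also try a LEFT OUTER JOIN, which expresses the same thing more explicitly and can use an index if you have one. I can't reach my Oracle instance from where I am to check the explain plan, but in practice the query above and the one below may well be optimized similarly.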
SELECT EBS.NET_AMT,
Nvl(EBS.INVOICED_QTY,
CASE Nvl(EBS.NET_AMT, 0) WHEN 0
THEN EBS.INVOICED_QTY
ELSE -1 END
) AS INVOICED_QTY,
EBS.CREATED_ON_DT,
EBS.CHANGED_ON_DT,
EBS.INTEGRATION_ID,
'EBS' AS SOURCE_NAME
FROM W_SALES_INVOICE_LINE_FS EBS
LEFT OUTER JOIN W_SALES_INVOICE_LINE_F DWH
ON DWH.INTEGRATION_ID = EBS.INTEGRATION_ID
WHERE DWH.INTEGRATION_ID IS NULL
UNION ALL
SELECT DWH.NET_AMT,
DWH.INVOICED_QTY,
DWH.CREATED_ON_DT,
DWH.CHANGED_ON_DT,
DWH.INTEGRATION_ID,
'DWH' AS SOURCE_NAME
FROM W_SALES_INVOICE_LINE_F DWH
LEFT OUTER JOIN W_SALES_INVOICE_LINE_FS EBS
ON EBS.INTEGRATION_ID = DWH.INTEGRATION_ID
WHERE EBS.INTEGRATION_ID IS NULL
AND DWH.IS_POS = 'N'
;
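Can you briefly describe the tables in your question? Roughly how many records are in each table? Do you have indexes? Are any fields computed or derived? When you run an explain plan on these queries or on your original query, where does it show the bottleneck?
【Comments】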
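I've made the changes you suggested. When I test the query on my database, it immediately returns the default number of rows, i.e. 50. I'm trying to get a count of all rows, but it is still running and taking a long time. Does that mean its performance is still poor? Please correct me if I'm wrong.
Do you have an index on the INTEGRATION_ID column of your tables?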
Yes, I created a BITMAP INDEX on both INTEGRATION_ID columns.
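If INTEGRATION_ID is unique, I would use CREATE UNIQUE INDEX ...; or, if it has very high cardinality (what matters is the ratio of distinct values to the record count, not the raw number of distinct values), I would use CREATE INDEX ... instead of CREATE BITMAP INDEX ....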
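Whether a bitmap index fits depends on how close INTEGRATION_ID is to unique, i.e. the ratio of distinct values to rows. That ratio can be measured directly with COUNT(DISTINCT ...), which Oracle also supports. The sketch below runs the query through Python's sqlite3 module on a hypothetical table filled with 1000 unique keys; a ratio at or near 1.0 means the column is (almost) unique, the case where a unique or ordinary B-tree index is generally preferred over a bitmap index, which is designed for low-cardinality columns.

```python
import sqlite3

# Hypothetical table with one row per key (keys are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE w_sales_invoice_line_f (integration_id TEXT)")
conn.executemany(
    "INSERT INTO w_sales_invoice_line_f VALUES (?)",
    [("INV-%d" % i,) for i in range(1000)],
)

# The same SELECT works unchanged in Oracle.
distinct_ids, total_rows = conn.execute(
    "SELECT COUNT(DISTINCT integration_id), COUNT(*) FROM w_sales_invoice_line_f"
).fetchone()

ratio = distinct_ids / total_rows  # 1.0 means every value is unique
print(distinct_ids, total_rows, ratio)  # 1000 1000 1.0
```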
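What I mean is that a BITMAP INDEX may not be a good fit for your case. Is INTEGRATION_ID a unique column? What is the ratio of distinct INTEGRATION_ID values to the total number of records?
【Answer 3】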
You are hand-writing a full outer join, which Oracle can do for you directly for this kind of comparison task (my guess is that it may run faster):
select
ebs.net_amt ebs_net_amt,
dwh.net_amt dwh_net_amt,
nvl(ebs.invoiced_qty,case nvl(ebs.net_amt,0) when 0 then ebs.invoiced_qty else -1 end) invoiced_qty_ebs,
dwh.invoiced_qty invoiced_qty_dwh,
ebs.created_on_dt ebs_created_on_dt,
dwh.created_on_dt dwh_created_on_dt,
ebs.changed_on_dt ebs_changed_on_dt,
dwh.changed_on_dt dwh_changed_on_dt,
nvl(ebs.integration_id, dwh.integration_id) integration_id,
case
when ebs.integration_id is not null and dwh.integration_id is not null then 'EBS and DWH'
when ebs.integration_id is not null then 'EBS' -- only in the EBS staging table
else 'DWH' -- only in the DWH fact table
end source_name
from
w_sales_invoice_line_fs ebs
full outer join
(select * from w_sales_invoice_line_f dwh where dwh.is_pos = 'N') dwh
on
(ebs.integration_id = dwh.integration_id)
where
ebs.integration_id is null or dwh.integration_id is null; -- restrict to records missing on one side
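To make the intended labeling concrete, here is a pure-Python model of what the FULL OUTER JOIN plus the WHERE clause compute, using hypothetical sample data keyed by integration_id: a key present only in EBS should be labeled 'EBS', a key present only in DWH should be labeled 'DWH', and matched keys are dropped. This is not Oracle code, only a sketch of the logic the query's CASE expression is meant to implement.

```python
# Hypothetical sample data: integration_id -> net_amt for each side.
ebs = {"A": 10.0, "B": 20.0}
dwh = {"A": 10.0, "C": 30.0}


def full_outer_diff(left, right):
    """Model a FULL OUTER JOIN on the key, keeping only unmatched keys."""
    rows = []
    for key in sorted(set(left) | set(right)):  # every key from either side
        in_left, in_right = key in left, key in right
        if in_left and in_right:
            source = "EBS and DWH"  # matched; removed by the filter below
        elif in_left:
            source = "EBS"          # key exists only in the EBS table
        else:
            source = "DWH"          # key exists only in the DWH table
        # Equivalent of: WHERE ebs.integration_id IS NULL OR dwh.integration_id IS NULL
        if not (in_left and in_right):
            rows.append((key, left.get(key), right.get(key), source))
    return rows


print(full_outer_diff(ebs, dwh))
# [('B', 20.0, None, 'EBS'), ('C', None, 30.0, 'DWH')]
```

A single full outer join reads each table once, whereas the UNION version scans each table twice (once as the driving table and once in the other branch's anti-join), which is the likely reason it could be faster.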