Oracle Developer Performance Class, Lesson 7 (How Joins Work): Lab
Posted dingdingfish
Overview
This lab follows the lab guide on DevGym.
Setting Up the Environment
Create the card_deck table. It is the only table used in this lab:
create table card_deck (
  pk      integer,
  val     varchar2(10),
  suit    varchar2(10),
  damaged varchar2(1),
  notes   varchar2(200)
) pctfree 75;
insert into card_deck ( pk, val, suit, damaged, notes )
  select level,
         case mod(rownum, 13)+1
           when 1 then 'Ace'
           when 11 then 'Jack'
           when 12 then 'Queen'
           when 13 then 'King'
           else to_char(mod(rownum, 13)+1)
         end case,
         case ceil(rownum/13)
           when 1 then 'spades'
           when 2 then 'clubs'
           when 3 then 'hearts'
           when 4 then 'diamonds'
         end case,
         case
           when mod ( rownum, 10 ) = 1 then 'Y'
           else 'N'
         end damaged,
         case
           when rownum = 1 then 'SQL is awesome!'
           else dbms_random.string ( 'a', 200 )
         end notes
  from   dual
  connect by level <= 52
  order by dbms_random.value;
commit;
-- Returns 6, the number of damaged cards
select count(*) from card_deck
where damaged = 'Y';
exec dbms_stats.gather_table_stats ( null, 'card_deck', options => 'gather auto' ) ;
View the data. There are 52 rows in total: one deck with 4 suits and no jokers:
SELECT
pk,
val,
suit,
damaged
FROM
card_deck
ORDER BY
suit,
pk;
PK VAL SUIT DAMAGED
_____ ________ ___________ __________
14 2 clubs N
15 3 clubs N
16 4 clubs N
17 5 clubs N
18 6 clubs N
19 7 clubs N
20 8 clubs N
21 9 clubs Y
22 10 clubs N
23 Jack clubs N
24 Queen clubs N
25 King clubs N
26 Ace clubs N
40 2 diamonds N
41 3 diamonds Y
42 4 diamonds N
43 5 diamonds N
44 6 diamonds N
45 7 diamonds N
46 8 diamonds N
47 9 diamonds N
48 10 diamonds N
49 Jack diamonds N
50 Queen diamonds N
51 King diamonds Y
52 Ace diamonds N
27 2 hearts N
28 3 hearts N
29 4 hearts N
30 5 hearts N
31 6 hearts Y
32 7 hearts N
33 8 hearts N
34 9 hearts N
35 10 hearts N
36 Jack hearts N
37 Queen hearts N
38 King hearts N
39 Ace hearts N
1 2 spades Y
2 3 spades N
3 4 spades N
4 5 spades N
5 6 spades N
6 7 spades N
7 8 spades N
8 9 spades N
9 10 spades N
10 Jack spades N
11 Queen spades Y
12 King spades N
13 Ace spades N
52 rows selected.
A note on terminology: a deck is one full pack of cards, and a suit is one of hearts, clubs, diamonds, and spades.
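To make the CASE logic in the INSERT statement easier to follow, here is a minimal Python sketch of the same arithmetic. It is an illustration, not part of the lab: the names build_deck, FACE_CARDS, and SUITS are hypothetical, the loop variable n stands in for ROWNUM, and the random NOTES text and random insert order are left out.

```python
# Mirrors the val / suit / damaged expressions of the INSERT statement.
FACE_CARDS = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}
SUITS = {1: "spades", 2: "clubs", 3: "hearts", 4: "diamonds"}

def build_deck():
    deck = []
    for n in range(1, 53):                     # connect by level <= 52
        rank = n % 13 + 1                      # mod(rownum, 13) + 1
        val = FACE_CARDS.get(rank, str(rank))  # else to_char(...)
        suit = SUITS[(n + 12) // 13]           # ceil(rownum / 13)
        damaged = "Y" if n % 10 == 1 else "N"  # mod(rownum, 10) = 1
        deck.append({"pk": n, "val": val, "suit": suit, "damaged": damaged})
    return deck

deck = build_deck()
damaged_pks = [card["pk"] for card in deck if card["damaged"] == "Y"]
print(len(deck), damaged_pks)
```

The damaged cards come out as pk 1, 11, 21, 31, 41, and 51, which matches the six 'Y' rows in the listing above.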
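Introduction
There are three join methods:
- Hash Joins
- (Sort) Merge Joins
- Nested Loops
Hash Join
The steps of a hash join are:
- Return all the rows from the smaller data set
- Build a hash table using the join columns of these rows
- Read the rows in the second table (usually the larger one)
- Probe the hash table built in step 2 by applying the same hash function to the join columns of the second table
- If a matching entry is found in the hash table, the database passes the joined rows to the next step of the plan
The smaller data set is usually also called the outer table, dimension table, or build side. The second table is called the inner table, fact table, or probe side.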
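The build and probe steps above can be sketched in a few lines of Python. This is a simplified in-memory illustration, not how Oracle Database implements the operation: hash_join and its key parameter are hypothetical names, and the deck is modelled as a list of (suit, val) tuples.

```python
from collections import defaultdict

SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

def hash_join(build_rows, probe_rows, key):
    """Equi-join two row lists by building and then probing a hash table."""
    # Build phase: read the build side and hash its join columns.
    hash_table = defaultdict(list)
    for build_row in build_rows:
        hash_table[key(build_row)].append(build_row)
    # Probe phase: read the probe side once and look up each row's key.
    joined = []
    for probe_row in probe_rows:
        for build_row in hash_table.get(key(probe_row), []):
            joined.append((build_row, probe_row))
    return joined

# Join the deck to itself on suit and val.
pairs = hash_join(deck, deck, key=lambda card: card)
print(len(pairs))
```

Each side is read exactly once, which is why the lookup only works for equality conditions: a hash of the join key says nothing about which keys are larger or smaller.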
Here is an example. Note the HASH JOIN operation in the execution plan:
select /*+ gather_plan_statistics */count(*)
from card_deck d1
join card_deck d2
on d1.suit = d2.suit
and d1.val = d2.val;
COUNT(*)
___________
52
select * from table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID cbf4m3nuqx3yb, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from card_deck d1 join
card_deck d2 on d1.suit = d2.suit and d1.val = d2.val
Plan hash value: 656221691
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 30 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 30 |
|* 2 | HASH JOIN | | 1 | 52 | 52 |00:00:00.01 | 30 |
| 3 | TABLE ACCESS FULL| CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
| 4 | TABLE ACCESS FULL| CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("D1"."SUIT"="D2"."SUIT" AND "D1"."VAL"="D2"."VAL")
22 rows selected.
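The optimizer considers a hash join only when the join condition is an equality (for example d1.suit = d2.suit). Given that, it tends to choose a hash join when:
- Joining large data sets, or
- Joining most of the rows in a small table, or
- There are no indexes on the join columns of either table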
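Merge Join
Also called a sort merge join. The algorithm is:
- Sort the rows in the first data set
- Sort the rows in the second data set
- For each row in the first data set, find a starting row in the second data set
- Read the second data set until you reach a row with a value greater than the current row of the first data set (the stopping row)
- Read the next row in the first data set and repeat the process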
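The steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it shows the equality (=) form of the join, the function name sort_merge_join and its key parameter are hypothetical, and everything is held in memory as (suit, val) tuples.

```python
SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

def sort_merge_join(first_rows, second_rows, key):
    """Equi-join two row lists by sorting both and merging them."""
    first = sorted(first_rows, key=key)    # sort the first data set
    second = sorted(second_rows, key=key)  # sort the second data set
    joined = []
    start = 0
    for first_row in first:
        # Find the starting row: skip second-side rows with smaller keys.
        # Because both sides are sorted, start never moves backwards.
        while start < len(second) and key(second[start]) < key(first_row):
            start += 1
        # Read matches until a row with a greater key (the stopping row).
        pos = start
        while pos < len(second) and key(second[pos]) == key(first_row):
            joined.append((first_row, second[pos]))
            pos += 1
    return joined

by_suit = sort_merge_join(deck, deck, key=lambda card: card[0])
print(len(by_suit))
```

Joining on suit alone pairs every card with the 13 cards of its own suit, so the result has 4 * 13 * 13 rows.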
For example, note the MERGE JOIN operation in the execution plan:
select /*+ gather_plan_statistics */count(*)
from card_deck d1
join card_deck d2
on d1.suit < d2.suit
and d1.val < d2.val;
COUNT(*)
___________
468
select * from table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
________________________________________________________________________________________________
SQL_ID 5b61gmfg4zb28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from card_deck d1 join
card_deck d2 on d1.suit < d2.suit and d1.val < d2.val
Plan hash value: 4075068009
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 30 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 30 |
| 2 | MERGE JOIN | | 1 | 468 | 468 |00:00:00.01 | 30 |
| 3 | SORT JOIN | | 1 | 52 | 52 |00:00:00.01 | 15 |
| 4 | TABLE ACCESS FULL | CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
|* 5 | FILTER | | 52 | | 468 |00:00:00.01 | 15 |
|* 6 | SORT JOIN | | 52 | 52 | 1014 |00:00:00.01 | 15 |
| 7 | TABLE ACCESS FULL| CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - filter("D1"."VAL"<"D2"."VAL")
6 - access("D1"."SUIT"<"D2"."SUIT")
filter("D1"."SUIT"<"D2"."SUIT")
27 rows selected.
Note that the database reads the second table only once (Starts = 1 on line 7), but it runs the SORT JOIN step once for each row in the first table (Starts = 52 on line 6).
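The A-Rows figures in this plan can be checked with simple counting. The sketch below is only an arithmetic check in Python, not a model of the database's execution: it counts the pairs that pass the access predicate on line 6 (suit only) and those that also pass the filter on line 5 (val). The deck is modelled as (suit, val) tuples, and the variable names are hypothetical.

```python
SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

# Line 6: pairs passing the access predicate d1.suit < d2.suit.
access_rows = sum(1 for d1 in deck for d2 in deck if d1[0] < d2[0])

# Line 5: of those, pairs that also pass the filter d1.val < d2.val.
joined_rows = sum(
    1 for d1 in deck for d2 in deck
    if d1[0] < d2[0] and d1[1] < d2[1]
)
print(access_rows, joined_rows)
```

There are 6 ordered suit pairs with 13 * 13 card combinations each, giving 1014 rows out of the SORT JOIN; 78 of every 169 combinations also have d1.val < d2.val, giving the final 468.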
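The optimizer may choose a merge join when:
- The join uses range comparisons (<, >=, between, and so on)
- The join is an equality (=) and one of the data sets is already sorted, so the database can avoid a sort
Sorting is slow, so the optimizer rarely uses merge joins.
Indexes are ordered data structures. So if there is an index on the join columns of one table, the optimizer can use it to avoid a sort. Oracle Database always sorts the second data set, even if its join columns are indexed.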
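Nested Loops Join
The process is:
- Read rows from the first (outer) data set
- For each of these rows, query the second (inner) data set for matching rows
- Repeat until all rows have been read from the outer table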
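A minimal Python sketch of this loop is shown below. It is an illustration rather than Oracle's implementation: nested_loops_join and its predicate parameter are hypothetical names, and the inner data set is a plain list that is rescanned in full for every outer row, since there is no index to narrow the lookup.

```python
SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

def nested_loops_join(outer_rows, inner_rows, predicate):
    """Join by scanning the whole inner data set once per outer row."""
    joined = []
    inner_scans = 0
    for outer_row in outer_rows:      # read a row from the outer data set
        inner_scans += 1              # one full pass over the inner side
        for inner_row in inner_rows:  # look for matching inner rows
            if predicate(outer_row, inner_row):
                joined.append((outer_row, inner_row))
    return joined, inner_scans

# Same join condition as the lab query: different suit and different val.
pairs, scans = nested_loops_join(
    deck, deck,
    lambda d1, d2: d1[0] != d2[0] and d1[1] != d2[1],
)
print(len(pairs), scans)
```

Unlike a hash or merge join, the predicate can be any condition at all, which is why this method still works when every join condition is an inequality. The cost is one inner scan per outer row.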
For example, note the NESTED LOOPS operation in the execution plan:
select /*+ gather_plan_statistics */count(*)
from card_deck d1
join card_deck d2
on d1.suit <> d2.suit
and d1.val <> d2.val;
COUNT(*)
___________
1872
select * from table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID 021zt1s6h3gvw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from card_deck d1 join
card_deck d2 on d1.suit <> d2.suit and d1.val <> d2.val
Plan hash value: 1266070232
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 795 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 795 |
| 2 | NESTED LOOPS | | 1 | 1872 | 1872 |00:00:00.01 | 795 |
| 3 | TABLE ACCESS FULL| CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
|* 4 | TABLE ACCESS FULL| CARD_DECK | 52 | 36 | 1872 |00:00:00.01 | 780 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("D1"."SUIT"<>"D2"."SUIT" AND "D1"."VAL"<>"D2"."VAL"))
22 rows selected.
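Note that the database full-scans the second table once for every row it reads from the first table, 52 times in total (Starts = 52 on line 4).
The optimizer may choose a nested loops join when:
- Joining small data sets
- Fetching a small fraction of the rows of a large data set
- Lookups on the inner table are efficient (for example, its join columns are indexed)
- All the join conditions are inequalities (!=)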
Merge Join Cartesian
This usually appears with a cross join (Cartesian join). For example, note the MERGE JOIN CARTESIAN operation in the execution plan:
select /*+ gather_plan_statistics */count(*)
from card_deck d1
cross join card_deck d2;
COUNT(*)
___________
2704
-- 2704 is 52 squared
select * from table(dbms_xplan.display_cursor (format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
________________________________________________________________________________________________
SQL_ID 8jg5afrgunsx9, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from card_deck d1 cross
join card_deck d2
Plan hash value: 1809085648
---------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
---------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 30 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 30 |
| 2 | MERGE JOIN CARTESIAN| | 1 | 2704 | 2704 |00:00:00.01 | 30 |
| 3 | TABLE ACCESS FULL | CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
| 4 | BUFFER SORT | | 52 | 52 | 2704 |00:00:00.01 | 15 |
| 5 | TABLE ACCESS FULL | CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
---------------------------------------------------------------------------------------------
18 rows selected.
This join is fairly rare and usually indicates a problem.
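The plan above reads the second table once (Starts = 1 on line 5) and then replays it from the BUFFER SORT step for each of the 52 outer rows. A rough Python analogy, assuming the buffer simply keeps a copy of the rows in memory, is sketched below; cartesian_join is a hypothetical name and the deck is modelled as (suit, val) tuples.

```python
SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

def cartesian_join(outer_rows, inner_rows):
    """Pair every outer row with every inner row, reading the inner side once."""
    buffered = list(inner_rows)  # single read of the inner data set
    joined = []
    for outer_row in outer_rows:
        for inner_row in buffered:  # replay the buffered copy
            joined.append((outer_row, inner_row))
    return joined

# iter(deck) can only be consumed once, so this proves the single read.
pairs = cartesian_join(deck, iter(deck))
print(len(pairs))
```

With no join condition, every combination is returned, so the row count is the product of the two inputs. On large tables that product grows very quickly, which is one reason an unexpected Cartesian join deserves a closer look.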
Top-N Queries
For example, the optimizer chose a hash join here because there are no indexes on the join columns:
select /*+ gather_plan_statistics */ d1.pk, d1.val, d1.suit, d1.damaged
from card_deck d1
join card_deck d2
on d1.suit = d2.suit
and d1.val = d2.val
order by d1.val
fetch first 5 rows only;
PK VAL SUIT DAMAGED
_____ ______ ___________ __________
9 10 spades N
35 10 hearts N
22 10 clubs N
48 10 diamonds N
27 2 hearts N
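The result may look surprising: the four 10s come before any 2. VAL is a VARCHAR2 column, so ORDER BY compares it as text, and '10' sorts before '2'. The Python sketch below reproduces that ordering on the card values alone. It assumes plain character-by-character comparison, and it uses heapq.nsmallest only as an analogy for keeping the first 5 rows without fully sorting; it is not a description of how the database sorts.

```python
import heapq

SUITS = ["spades", "clubs", "hearts", "diamonds"]
VALS = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10",
        "Jack", "Queen", "King"]
deck = [(suit, val) for suit in SUITS for val in VALS]

# Order by val as a string and keep only the first 5 values.
vals = [val for _suit, val in deck]
top5 = heapq.nsmallest(5, vals)
print(top5)
```

Which suit appears first among the tied '10' rows is not determined by the ORDER BY, so the sketch only tracks the values.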
select * from table(dbms_xplan.display_cursor (format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
___________________________________________________________________________________________________
SQL_ID bd52xdx1fbpr5, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ d1.pk, d1.val, d1.suit, d1.damaged
from card_deck d1 join card_deck d2 on d1.suit = d2.suit and
d1.val = d2.val order by d1.val fetch first 5 rows only
Plan hash value: 905561733
------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 5 |00:00:00.01 | 30 |
|* 1 | VIEW | | 1 | 5 | 5 |00:00:00.01 | 30 |
|* 2 | WINDOW SORT PUSHED RANK| | 1 | 52 | 5 |00:00:00.01 | 30 |
|* 3 | HASH JOIN | | 1 | 52 | 52 |00:00:00.01 | 30 |
| 4 | TABLE ACCESS FULL | CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
| 5 | TABLE ACCESS FULL | CARD_DECK | 1 | 52 | 52 |00:00:00.01 | 15 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter("from$_subquery$_004"."rowlimit_$$_rownumber"<=5)
2 - filter(ROW_NUMBER() OVER ( ORDER BY "D1"."VAL")<=5)
3 - access("D1"."SUIT"="D2"."SUIT" AND