Oracle Developer Performance Course, Lesson 7 (How Joins Work): Lab

Posted by dingdingfish

Overview

This lab follows the lab guide in DevGym.

Setting up the environment

Create the card_deck table, the only table used in this lab:

create table card_deck (
  pk      integer,
  val     varchar2(10),
  suit    varchar2(10),
  damaged varchar2(1),
  notes   varchar2(200)
) pctfree 75;

insert into card_deck ( pk, val, suit, damaged, notes )
  select level,
         case mod(rownum, 13)+1
           when 1 then 'Ace'
           when 11 then 'Jack'
           when 12 then 'Queen'
           when 13 then 'King'
           else to_char(mod(rownum, 13)+1)
         end case, 
         case ceil(rownum/13)
           when 1 then 'spades'
           when 2 then 'clubs'
           when 3 then 'hearts'
           when 4 then 'diamonds'
         end case, 
         case
           when mod ( rownum, 10 ) = 1 then 'Y'
           else 'N'
         end damaged, 
         case
           when rownum = 1 then 'SQL is awesome!'
           else dbms_random.string ( 'a', 200 )
         end notes
  from   dual
  connect by level <= 52
  order  by dbms_random.value;

commit;

-- Returns 6, the number of damaged cards
select count(*) from card_deck
where  damaged = 'Y';

exec dbms_stats.gather_table_stats ( null, 'card_deck', options => 'gather auto' ) ;

Look at the data: 52 rows in total, one deck with four suits and no jokers:

SELECT
    pk,
    val,
    suit,
    damaged
FROM
    card_deck
ORDER BY
    suit,
    pk;
    
   PK      VAL        SUIT    DAMAGED
_____ ________ ___________ __________
   14 2        clubs       N
   15 3        clubs       N
   16 4        clubs       N
   17 5        clubs       N
   18 6        clubs       N
   19 7        clubs       N
   20 8        clubs       N
   21 9        clubs       Y
   22 10       clubs       N
   23 Jack     clubs       N
   24 Queen    clubs       N
   25 King     clubs       N
   26 Ace      clubs       N
   40 2        diamonds    N
   41 3        diamonds    Y
   42 4        diamonds    N
   43 5        diamonds    N
   44 6        diamonds    N
   45 7        diamonds    N
   46 8        diamonds    N
   47 9        diamonds    N
   48 10       diamonds    N
   49 Jack     diamonds    N
   50 Queen    diamonds    N
   51 King     diamonds    Y
   52 Ace      diamonds    N
   27 2        hearts      N
   28 3        hearts      N
   29 4        hearts      N
   30 5        hearts      N
   31 6        hearts      Y
   32 7        hearts      N
   33 8        hearts      N
   34 9        hearts      N
   35 10       hearts      N
   36 Jack     hearts      N
   37 Queen    hearts      N
   38 King     hearts      N
   39 Ace      hearts      N
    1 2        spades      Y
    2 3        spades      N
    3 4        spades      N
    4 5        spades      N
    5 6        spades      N
    6 7        spades      N
    7 8        spades      N
    8 9        spades      N
    9 10       spades      N
   10 Jack     spades      N
   11 Queen    spades      Y
   12 King     spades      N
   13 Ace      spades      N

52 rows selected.

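A note on terminology: a deck is one full set of cards, and a suit is one of its four groups: hearts, clubs, diamonds, and spades.

Introduction

As we already know, there are three join methods: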

  • Hash Joins
  • (Sort) Merge Joins
  • Nested Loops

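Hash Join

The steps in a hash join are:

  1. Return all the rows from the smaller data set
  2. Build a hash table using the join columns of these rows
  3. Read the rows in the second table (usually the larger one)
  4. Probe the hash table built in step 2 by applying the same hash function to the join columns of the second table
  5. If a matching entry is found in the hash table, the database passes the joined rows to the next step of the execution plan

The smaller data set is commonly called the build side, and also the outer table or dimension table; the second table is called the probe side, and also the inner table or fact table.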
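The five steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions, not how Oracle Database implements the join: rows are plain dicts, the hash table is a Python dict that fits in memory, and make_deck and hash_join are hypothetical helper names. make_deck mirrors the val and suit expressions from the INSERT statement above.

```python
from collections import defaultdict


def make_deck():
    """Rebuild the 52 card_deck rows (pk, val, suit) the way the INSERT does."""
    names = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}
    suits = ["spades", "clubs", "hearts", "diamonds"]
    deck = []
    for n in range(1, 53):
        k = n % 13 + 1  # mod(rownum, 13) + 1
        deck.append({"pk": n, "val": names.get(k, str(k)), "suit": suits[(n - 1) // 13]})
    return deck


def hash_join(build_rows, probe_rows, key):
    """Equi-join two row lists on the columns named in `key` (toy model)."""
    # Steps 1-2: read the smaller data set and hash its join columns.
    table = defaultdict(list)
    for row in build_rows:
        table[tuple(row[col] for col in key)].append(row)

    # Steps 3-5: read the second data set, probe with the same key, emit matches.
    joined = []
    for row in probe_rows:
        for match in table.get(tuple(row[col] for col in key), []):
            joined.append((match, row))
    return joined


deck = make_deck()
print(len(hash_join(deck, deck, ("suit", "val"))))  # 52
```

Each input is read exactly once in this model, and only the build side is held in memory, which is one reason to put the smaller data set on the build side.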

Here is an example. Note the HASH JOIN operation in the execution plan:

select /*+ gather_plan_statistics */count(*)
from   card_deck d1
join   card_deck d2
on     d1.suit = d2.suit
and    d1.val = d2.val;

   COUNT(*)
___________
         52

select * from   table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));

                                                                             PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID  cbf4m3nuqx3yb, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from   card_deck d1 join
card_deck d2 on     d1.suit = d2.suit and    d1.val = d2.val

Plan hash value: 656221691

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE     |           |      1 |      1 |      1 |00:00:00.01 |      30 |
|*  2 |   HASH JOIN         |           |      1 |     52 |     52 |00:00:00.01 |      30 |
|   3 |    TABLE ACCESS FULL| CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
|   4 |    TABLE ACCESS FULL| CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("D1"."SUIT"="D2"."SUIT" AND "D1"."VAL"="D2"."VAL")


22 rows selected.

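Hash joins are only considered when the join condition is an equality (for example, d1.suit = d2.suit). Given that, the optimizer tends to choose a hash join when:

  • the data sets are large, or
  • the query joins most of the rows in a small table, or
  • neither table has an index on its join columns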

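Merge Join

Also known as a sort merge join. The algorithm is:

  1. Sort the rows in the first data set
  2. Sort the rows in the second data set
  3. For each row in the first data set, find a starting row in the second data set
  4. Read the second data set until you find a row whose value is greater than that of the current row from the first data set (the stopping row)
  5. Read the next row in the first data set and repeat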
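As a rough illustration of these steps, here is a minimal Python sketch for the equality case. It assumes both inputs are in-memory lists of dicts, and merge_join is a hypothetical helper name; it is a simplified model, not the database's implementation.

```python
def merge_join(first, second, key):
    """Equi-join two row lists with the sort-and-merge steps listed above."""
    def key_of(row):
        return tuple(row[col] for col in key)

    first = sorted(first, key=key_of)    # step 1: sort the first data set
    second = sorted(second, key=key_of)  # step 2: sort the second data set

    joined = []
    start = 0  # position of the starting row in the second data set
    for row in first:
        # Step 3: move the starting row forward past smaller values.
        while start < len(second) and key_of(second[start]) < key_of(row):
            start += 1
        # Step 4: read on from the starting row until a greater value appears.
        pos = start
        while pos < len(second) and key_of(second[pos]) == key_of(row):
            joined.append((row, second[pos]))
            pos += 1
        # Step 5: the for loop moves on to the next row of the first data set.
    return joined


left = [{"suit": s} for s in ("clubs", "hearts", "hearts")]
right = [{"suit": s} for s in ("hearts", "spades", "hearts")]
print(len(merge_join(left, right, ("suit",))))  # 4
```

Because the starting row never moves backwards, the second data set is walked forward only, apart from re-reading the current run of duplicates.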

For example, note the MERGE JOIN operation in the execution plan:

select /*+ gather_plan_statistics */count(*)
from   card_deck d1
join   card_deck d2
on     d1.suit < d2.suit
and    d1.val < d2.val;

   COUNT(*)
___________
        468

select * from   table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));

                                                                              PLAN_TABLE_OUTPUT
________________________________________________________________________________________________
SQL_ID  5b61gmfg4zb28, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from   card_deck d1 join
card_deck d2 on     d1.suit < d2.suit and    d1.val < d2.val

Plan hash value: 4075068009

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE       |           |      1 |      1 |      1 |00:00:00.01 |      30 |
|   2 |   MERGE JOIN          |           |      1 |    468 |    468 |00:00:00.01 |      30 |
|   3 |    SORT JOIN          |           |      1 |     52 |     52 |00:00:00.01 |      15 |
|   4 |     TABLE ACCESS FULL | CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
|*  5 |    FILTER             |           |     52 |        |    468 |00:00:00.01 |      15 |
|*  6 |     SORT JOIN         |           |     52 |     52 |   1014 |00:00:00.01 |      15 |
|   7 |      TABLE ACCESS FULL| CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - filter("D1"."VAL"<"D2"."VAL")
   6 - access("D1"."SUIT"<"D2"."SUIT")
       filter("D1"."SUIT"<"D2"."SUIT")


27 rows selected.

Note that the database reads the second table only once (Starts = 1 at line 7), but runs the SORT JOIN operation once for each row from the first table (Starts = 52 at line 6).
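The A-Rows figures in this plan (1014 at line 6, 468 at line 5) can be cross-checked by brute force. The sketch below rebuilds the 52 cards in Python and counts the pairs that pass each predicate; make_deck and merge_plan_row_counts are hypothetical helper names, not part of the lab.

```python
def make_deck():
    """Rebuild the 52 card_deck rows (pk, val, suit) the way the INSERT does."""
    names = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}
    suits = ["spades", "clubs", "hearts", "diamonds"]
    deck = []
    for n in range(1, 53):
        k = n % 13 + 1  # mod(rownum, 13) + 1
        deck.append({"pk": n, "val": names.get(k, str(k)), "suit": suits[(n - 1) // 13]})
    return deck


def merge_plan_row_counts(deck):
    """Count pairs passing the suit predicate, then both predicates."""
    # Access predicate at plan line 6: d1.suit < d2.suit
    suit_pairs = [(d1, d2) for d1 in deck for d2 in deck if d1["suit"] < d2["suit"]]
    # Filter predicate at plan line 5: d1.val < d2.val
    both = [(d1, d2) for d1, d2 in suit_pairs if d1["val"] < d2["val"]]
    return len(suit_pairs), len(both)


print(merge_plan_row_counts(make_deck()))  # (1014, 468)
```

The arithmetic behind it: 6 of the 16 ordered suit pairs satisfy the suit predicate, giving 6 × 13 × 13 = 1014 rows. Among 13 distinct values, 78 ordered pairs satisfy the val predicate, so 6 × 78 = 468 rows survive the filter. These counts do not depend on the exact sort order of the strings, only on the values being distinct.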

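The optimizer may choose a merge join when:

  • the join uses a range comparison (<, >=, between, and so on)
  • the join is an equality (=) and one of the data sets is already sorted, so the database can avoid a sort

Sorting is slow, so the optimizer rarely chooses a merge join.

Indexes are ordered data structures, so if one table has an index on its join columns, the optimizer can use it to avoid the sort. Oracle Database always sorts the second data set, even if there is an index on its join columns.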

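Nested Loops Join

The process is:

  1. Read rows from the first (outer) data set
  2. For each of these rows, query the second (inner) data set for matching rows
  3. Repeat until all rows have been read from the outer table

For example, note the NESTED LOOPS operation in the execution plan: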

select /*+ gather_plan_statistics */count(*)
from   card_deck d1
join   card_deck d2
on     d1.suit <> d2.suit
and    d1.val <> d2.val;

   COUNT(*)
___________
       1872

select * from   table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));

                                                                             PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID  021zt1s6h3gvw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from   card_deck d1 join
card_deck d2 on     d1.suit <> d2.suit and    d1.val <> d2.val

Plan hash value: 1266070232

-------------------------------------------------------------------------------------------
| Id  | Operation           | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |           |      1 |        |      1 |00:00:00.01 |     795 |
|   1 |  SORT AGGREGATE     |           |      1 |      1 |      1 |00:00:00.01 |     795 |
|   2 |   NESTED LOOPS      |           |      1 |   1872 |   1872 |00:00:00.01 |     795 |
|   3 |    TABLE ACCESS FULL| CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
|*  4 |    TABLE ACCESS FULL| CARD_DECK |     52 |     36 |   1872 |00:00:00.01 |     780 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("D1"."SUIT"<>"D2"."SUIT" AND "D1"."VAL"<>"D2"."VAL"))


22 rows selected.

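Note that the database runs one full table scan of the second table for each row read from the first table: 52 scans in total (Starts = 52 at line 4). At 15 buffers per scan, that accounts for 780 of the 795 buffers.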
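The nested loops process and the figures in this plan can be reproduced with a small Python model. It assumes in-memory lists of dicts; make_deck and nested_loops are hypothetical helper names, and the scan counter stands in for the Starts column of the inner table access.

```python
def make_deck():
    """Rebuild the 52 card_deck rows (pk, val, suit) the way the INSERT does."""
    names = {1: "Ace", 11: "Jack", 12: "Queen", 13: "King"}
    suits = ["spades", "clubs", "hearts", "diamonds"]
    deck = []
    for n in range(1, 53):
        k = n % 13 + 1  # mod(rownum, 13) + 1
        deck.append({"pk": n, "val": names.get(k, str(k)), "suit": suits[(n - 1) // 13]})
    return deck


def nested_loops(outer, inner, predicate):
    """Join by scanning the whole inner data set once per outer row."""
    joined = []
    inner_scans = 0
    for outer_row in outer:          # step 1: read a row from the outer set
        inner_scans += 1             # one full pass over the inner set starts
        for inner_row in inner:      # step 2: look for matching inner rows
            if predicate(outer_row, inner_row):
                joined.append((outer_row, inner_row))
    return joined, inner_scans       # step 3: done when the outer set is exhausted


deck = make_deck()
rows, scans = nested_loops(
    deck, deck,
    lambda d1, d2: d1["suit"] != d2["suit"] and d1["val"] != d2["val"],
)
print(len(rows), scans)  # 1872 52
```

Each card matches the 36 cards that differ from it in both suit and value, which is where 52 × 36 = 1872 comes from and why the plan estimates 36 rows per inner scan.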

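The optimizer may choose a nested loops join when:

  • joining small data sets
  • fetching a small fraction of the rows from a large data set
  • lookups on the inner table are efficient (for example, there is an index on its join columns)
  • all the join conditions are inequalities (!=)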

Merge Join Cartesian

This usually appears with a cross join (Cartesian product). For example, note the MERGE JOIN CARTESIAN operation in the execution plan:

select /*+ gather_plan_statistics */count(*)
from   card_deck d1
cross join card_deck d2;

   COUNT(*)
___________
       2704
-- 2704 is 52 squared

select * from   table(dbms_xplan.display_cursor (format => 'IOSTATS LAST'));

                                                                               PLAN_TABLE_OUTPUT
________________________________________________________________________________________________
SQL_ID  8jg5afrgunsx9, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from   card_deck d1 cross
join card_deck d2

Plan hash value: 1809085648

---------------------------------------------------------------------------------------------
| Id  | Operation             | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |      1 |        |      1 |00:00:00.01 |      30 |
|   1 |  SORT AGGREGATE       |           |      1 |      1 |      1 |00:00:00.01 |      30 |
|   2 |   MERGE JOIN CARTESIAN|           |      1 |   2704 |   2704 |00:00:00.01 |      30 |
|   3 |    TABLE ACCESS FULL  | CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
|   4 |    BUFFER SORT        |           |     52 |     52 |   2704 |00:00:00.01 |      15 |
|   5 |     TABLE ACCESS FULL | CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
---------------------------------------------------------------------------------------------


18 rows selected.

This join is relatively rare, and usually indicates a problem.
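One way to read this plan: the second table is scanned once (Starts = 1 at line 5), its rows are kept in a buffer, and the buffer is replayed for each of the 52 rows from the first table (Starts = 52 at line 4). The Python sketch below models that reading; cartesian_join is a hypothetical helper, and the buffering is an assumption about what the plan columns suggest, not a description of Oracle internals.

```python
def cartesian_join(first, second):
    """Pair every row of `first` with every row of `second`."""
    buffered = list(second)  # read the second data set once and keep a copy
    replays = 0
    joined = []
    for a in first:
        replays += 1         # the buffer is replayed once per row of `first`
        for b in buffered:
            joined.append((a, b))
    return joined, replays


# The second input is a one-shot iterator, so it can only be read once.
pairs, replays = cartesian_join(range(52), iter(range(52)))
print(len(pairs), replays)  # 2704 52
```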

Top-N Queries

For example, the optimizer chooses a hash join here because there are no indexes on the join columns:

select /*+ gather_plan_statistics */ d1.pk, d1.val, d1.suit, d1.damaged
from   card_deck d1
join   card_deck d2
on     d1.suit = d2.suit
and    d1.val = d2.val
order  by d1.val
fetch  first 5 rows only;

   PK    VAL        SUIT    DAMAGED
_____ ______ ___________ __________
    9 10     spades      N
   35 10     hearts      N
   22 10     clubs       N
   48 10     diamonds    N
   27 2      hearts      N
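The four 10s come back ahead of the 2 because val is a VARCHAR2 column, so ORDER BY compares strings character by character and '1' sorts before '2'. The Python sketch below shows the same ordering with a bounded top-N selection. It assumes a plain binary string comparison, and top_n_by_val is a hypothetical helper name. Which of the four 2s appears in fifth place is a tie, so it can differ from the query output above.

```python
import heapq


def top_n_by_val(rows, n):
    """Return the n rows with the smallest val, compared as strings."""
    # nsmallest keeps only n candidates at a time instead of sorting every row.
    return heapq.nsmallest(n, rows, key=lambda row: row["val"])


vals = ["Ace", "2", "3", "4", "5", "6", "7", "8", "9", "10", "Jack", "Queen", "King"]
suits = ["spades", "clubs", "hearts", "diamonds"]
rows = [{"val": v, "suit": s} for s in suits for v in vals]

print([row["val"] for row in top_n_by_val(rows, 5)])  # ['10', '10', '10', '10', '2']
```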

select * from   table(dbms_xplan.display_cursor (format => 'IOSTATS LAST'));

                                                                                  PLAN_TABLE_OUTPUT
___________________________________________________________________________________________________
SQL_ID  bd52xdx1fbpr5, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ d1.pk, d1.val, d1.suit, d1.damaged
from   card_deck d1 join   card_deck d2 on     d1.suit = d2.suit and
d1.val = d2.val order  by d1.val fetch  first 5 rows only

Plan hash value: 905561733

------------------------------------------------------------------------------------------------
| Id  | Operation                | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |           |      1 |        |      5 |00:00:00.01 |      30 |
|*  1 |  VIEW                    |           |      1 |      5 |      5 |00:00:00.01 |      30 |
|*  2 |   WINDOW SORT PUSHED RANK|           |      1 |     52 |      5 |00:00:00.01 |      30 |
|*  3 |    HASH JOIN             |           |      1 |     52 |     52 |00:00:00.01 |      30 |
|   4 |     TABLE ACCESS FULL    | CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
|   5 |     TABLE ACCESS FULL    | CARD_DECK |      1 |     52 |     52 |00:00:00.01 |      15 |
------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("from$_subquery$_004"."rowlimit_$$_rownumber"<=5)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "D1"."VAL")<=5)
   3 - access("D1"."SUIT"="D2"."SUIT" AND