Oracle Developer Performance Course, Lesson 4 (How to Create Indexes): Lab

Posted 2021-12-03 dingdingfish

Overview

This lab follows the lab guide on DevGym.

Setting Up the Environment

First create the bricks table, then gather statistics on it at the end:

create table bricks (
  brick_id integer not null,
  colour   varchar2(10) not null,
  shape    varchar2(10) not null,
  weight   integer not null,
  colour_mixed_case varchar2(10) not null,
  insert_datetime date not null,
  junk     varchar2(1000) not null
);
exec dbms_random.seed ( 0 );

insert into bricks
  with rws as (
    select level x, 
           case ceil ( level / 250 )
             when 4 then 'red'
             when 1 then 'blue'
             when 2 then 'green'
             when 3 then 'yellow'
           end colour
    from   dual
    connect by level <= 1000
  )
    select rownum,
           colour,
           case mod ( rownum, 4 )
             when 0 then 'cube'
             when 1 then 'cylinder'
             when 2 then 'pyramid'
             when 3 then 'prism'
           end shape,
           round ( dbms_random.value ( 1, 10 ) ),
           case mod ( rownum, 3 )
             when 0 then upper ( colour ) 
             when 1 then lower ( colour ) 
             when 2 then initcap ( colour ) 
           end mixed_case,
           date'2020-01-01' + ( rownum / 12 ),
           rpad ( chr ( mod ( rownum, 26 ) + 65 ), 1000, 'x' ) 
    from   rws;
    
commit;

exec dbms_stats.gather_table_stats ( null, 'bricks' ) ;

Check the data. The bricks table has 1000 rows:

SQL> select count(*) from bricks;

   COUNT(*)
___________
       1000

SQL>
SELECT
    brick_id,
    colour,
    shape,
    weight,
    colour_mixed_case,
    insert_datetime
FROM
    bricks where rownum <= 9;

   BRICK_ID    COLOUR       SHAPE    WEIGHT    COLOUR_MIXED_CASE    INSERT_DATETIME
___________ _________ ___________ _________ ____________________ __________________
         13 blue      cylinder            5 blue                 02-JAN-20
         14 blue      pyramid             8 Blue                 02-JAN-20
         15 blue      prism               8 BLUE                 02-JAN-20
         16 blue      cube                5 blue                 02-JAN-20
         17 blue      cylinder            4 Blue                 02-JAN-20
         18 blue      pyramid             3 BLUE                 02-JAN-20
         19 blue      prism               9 blue                 02-JAN-20
         20 blue      cube                2 Blue                 02-JAN-20
         21 blue      cylinder            8 BLUE                 02-JAN-20

9 rows selected.

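The junk column is 1000 characters long: a single letter derived from the row number, right-padded with 'x'. It is not shown in the result set.
colour takes one of 4 values, assigned in consecutive blocks of 250 rows; shape cycles through 4 shapes; weight is a random integer from 1 to 10. colour_mixed_case holds the colour in one of 3 case variants (upper, lower, initial capital).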

Introduction

The most common data access methods in a database are:

Full table scan
Index access

This article compares the two.

Full Table Scan

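Because the table has no indexes yet, every SQL statement against it uses a full table scan. In the plan this shows up as TABLE ACCESS FULL, with a filter entry in the Predicate Information section: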

SQL>
select /*+ gather_plan_statistics */count(*) from bricks where  colour = 'red';

   COUNT(*)
___________
        250

SQL>
select *  from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                         PLAN_TABLE_OUTPUT
__________________________________________________________________________________________
SQL_ID  d65kqrh09hm1g, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from bricks where  colour
= 'red'

Plan hash value: 1774413877

---------------------------------------------------------------------------------------
| Id  | Operation          | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |      1 |        |      1 |00:00:00.01 |     190 |
|   1 |  SORT AGGREGATE    |        |      1 |      1 |      1 |00:00:00.01 |     190 |
|*  2 |   TABLE ACCESS FULL| BRICKS |      1 |    250 |    250 |00:00:00.01 |     190 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("COLOUR"='red')


20 rows selected.

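This means the database reads every row and every data block in the table (up to the high-water mark), applies the where clause, and returns the rows for which the condition is true. For a search that returns only a few rows, inspecting every row is clearly a huge waste.
Creating an index on the columns used in the query's WHERE clause lets the database read only the rows that match the search condition. This reduces the work needed to run the query.

For how a full table scan works, see here.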
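
To relate the Buffers figure of a full scan to the size of the table, one option is to look at the block count recorded by the statistics gathered earlier. This is only a sketch: it assumes the statistics from dbms_stats.gather_table_stats are current, and the number of buffers a scan reports may differ somewhat from the block count shown.

```sql
-- BLOCKS and NUM_ROWS reflect the last statistics gathering,
-- not necessarily the live state of the table.
select table_name, num_rows, blocks, avg_row_len
from   user_tables
where  table_name = 'BRICKS';
```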

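Creating an Index

A database index stores the values of the indexed columns together with pointers to the table rows that hold those values. The standard index in Oracle Database is a B-tree index. With it, the database can search the index for entries that match the WHERE clause, which greatly reduces the work required.

Creating an index takes 3 elements:

The index name
The table to index
A comma-separated list of the columns to index

For example: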

SQL> create index brick_colour_i on bricks ( colour );

Index BRICK_COLOUR_I created.

Run the query again:

SQL> select /*+ gather_plan_statistics */count(*) from bricks where  colour = 'red';

   COUNT(*)
___________
        250

SQL> select *  from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                                         PLAN_TABLE_OUTPUT
__________________________________________________________________________________________________________
SQL_ID  d65kqrh09hm1g, child number 0
-------------------------------------
select /*+ gather_plan_statistics */count(*) from bricks where  colour
= 'red'

Plan hash value: 2801761771

-------------------------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |      1 |        |      1 |00:00:00.01 |       2 |      4 |
|   1 |  SORT AGGREGATE   |                |      1 |      1 |      1 |00:00:00.01 |       2 |      4 |
|*  2 |   INDEX RANGE SCAN| BRICK_COLOUR_I |      1 |    250 |    250 |00:00:00.01 |       2 |      4 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COLOUR"='red')


20 rows selected.

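The INDEX RANGE SCAN in the execution plan and the access entry in Predicate Information show that the index was used this time. Buffers drops from 190 to 2, a huge saving.

Viewing Indexes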

SQL>
select * from user_indexes where  table_name = 'BRICKS';

SQL>
select index_name, column_name, column_position 
from   user_ind_columns
where  table_name = 'BRICKS'
order  by index_name, column_position;

       INDEX_NAME    COLUMN_NAME    COLUMN_POSITION
_________________ ______________ __________________
BRICK_COLOUR_I    COLOUR                          1
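
Because select * on user_indexes returns many columns, a narrower query can be easier to read. The column selection below is just one reasonable choice, not part of the original lab; the statistics columns may be null if statistics have not been gathered for the index.

```sql
-- A compact view of the indexes on BRICKS:
-- type, uniqueness, B-tree height, size and clustering.
select index_name, index_type, uniqueness, blevel,
       leaf_blocks, distinct_keys, clustering_factor, status
from   user_indexes
where  table_name = 'BRICKS'
order  by index_name;
```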

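Composite Indexes

A composite index is an index on multiple columns.

So far we have only a single-column index, but it still helps the two-column query below (Buffers drops from 190 to 46). The database first uses the index to find the 250 red rows (the access predicate), then applies the remaining condition to the table rows (the filter predicate).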

select /*+ gather_plan_statistics */*
from   bricks
where  colour = 'red'
and    weight = 1;

select *  
from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                                                  PLAN_TABLE_OUTPUT
___________________________________________________________________________________________________________________
SQL_ID  5c4qmbrdjdnj9, child number 0
-------------------------------------
select /*+ gather_plan_statistics */* from   bricks where  colour =
'red' and    weight = 1

Plan hash value: 2278089145

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name           | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                |      1 |        |     14 |00:00:00.01 |      46 |
|*  1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BRICKS         |      1 |     25 |     14 |00:00:00.01 |      46 |
|*  2 |   INDEX RANGE SCAN                  | BRICK_COLOUR_I |      1 |    250 |    250 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("WEIGHT"=1)
   2 - access("COLOUR"='red')


21 rows selected.

Clearly, a composite index can improve efficiency further.

create index brick_colour_weight_i on bricks ( colour, weight ) ;

Run the query again. Buffers drops from 46 to 14:

select /*+ gather_plan_statistics indexed */ *
from   bricks
where  colour = 'red'
and    weight = 1;

select *  from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                                                         PLAN_TABLE_OUTPUT
__________________________________________________________________________________________________________________________
SQL_ID  2v44h4xyb20hn, child number 0
-------------------------------------
select /*+ gather_plan_statistics indexed */* from   bricks where
colour = 'red' and    weight = 1

Plan hash value: 338984659

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                           | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |                       |      1 |        |     14 |00:00:00.01 |      14 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| BRICKS                |      1 |     25 |     14 |00:00:00.01 |      14 |
|*  2 |   INDEX RANGE SCAN                  | BRICK_COLOUR_WEIGHT_I |      1 |     25 |     14 |00:00:00.01 |       3 |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COLOUR"='red' AND "WEIGHT"=1)


20 rows selected.
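
To compare the two indexes side by side while both exist, one possibility is to ask the optimizer for the single-column index with an INDEX hint. This is a sketch, not a step from the original lab; a hint is a request to the optimizer, so the plan should still be checked to confirm which index was actually used.

```sql
-- Request the single-column index for the same two-column query.
select /*+ gather_plan_statistics index(bricks brick_colour_i) */ *
from   bricks
where  colour = 'red'
and    weight = 1;

-- Compare the Buffers column with the plan that used BRICK_COLOUR_WEIGHT_I.
select *
from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));
```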

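Index-Only Scans

Every column referenced by the following SQL is in the composite index, so the database reads only the index and never touches the table: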

select /*+ gather_plan_statistics */weight, count(*) 
from   bricks
where  colour = 'red'
group  by weight;

select *  from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                                          PLAN_TABLE_OUTPUT
___________________________________________________________________________________________________________
SQL_ID  27z1m28bucygk, child number 0
-------------------------------------
select /*+ gather_plan_statistics */weight, count(*)  from   bricks
where  colour = 'red' group  by weight

Plan hash value: 2875908788

--------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                       |      1 |        |     10 |00:00:00.01 |       3 |
|   1 |  SORT GROUP BY NOSORT|                       |      1 |     10 |     10 |00:00:00.01 |       3 |
|*  2 |   INDEX RANGE SCAN   | BRICK_COLOUR_WEIGHT_I |      1 |    250 |    250 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("COLOUR"='red')


20 rows selected.
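
For contrast, a query that references a column outside the index cannot be answered from the index alone. The variant below is an illustrative assumption, not part of the original lab: shape is not in any index created so far, so the plan would be expected to include a table access step (either through an index or as a full scan).

```sql
-- shape is not in BRICK_COLOUR_WEIGHT_I, so the table must be read.
select /*+ gather_plan_statistics */ shape, count(*)
from   bricks
where  colour = 'red'
group  by shape;

select *
from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));
```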

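Indexes and NULLs

B-tree indexes do not store an entry for rows in which all indexed columns are NULL. For the optimizer to rely on a composite index alone, it must therefore know that at least one of the indexed columns is non-null.

For example, change the table definition so that these columns allow NULLs: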

alter table bricks
  modify ( colour null, weight null );

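A row with NULL in both colour and weight would have no entry in the index, so the optimizer can no longer answer a query from the index alone unless the query's predicates rule out such rows. (A predicate such as colour = 'red' does rule them out, because NULL never equals 'red'.)

That experiment is skipped here. The following restores the original table definition: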

alter table bricks  modify weight not null;
alter table bricks  modify colour not null;
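
For readers who do want to try the experiment, the following is one possible sketch. It assumes the indexes created earlier still exist, and it uses a count over the whole table as the test query because that query has no predicate to exclude all-NULL rows. No expected plan is shown since it was not run as part of this lab; the point is to compare the two plans yourself.

```sql
-- Step 1: with NOT NULL columns, the count may be answered from an index.
select /*+ gather_plan_statistics */ count(*) from bricks;
select * from table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

-- Step 2: allow NULLs, then check whether the plan changes.
alter table bricks modify ( colour null, weight null );

select /*+ gather_plan_statistics nullable */ count(*) from bricks;
select * from table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

-- Step 3: restore the original definition.
alter table bricks modify ( colour not null, weight not null );
```

The word nullable inside the second hint comment is not a real hint. It only changes the SQL text so that the statement gets its own cursor, in the same way the lab uses indexed elsewhere.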

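Composite Index Column Order

The order of the columns in an index has a large effect on how effective it is, and even on whether the optimizer can use it at all!

The database searches the columns of an index from left to right. To be most effective, the WHERE clause should use the leading columns of the index.
For an index on (A, B, C), the WHERE clause can search the index efficiently when it has conditions on:

A,B,C
A,B
A

The following example shows what happens when the SQL does not use the leading column. The index cannot be searched by key; it is read in full and filtered instead (note INDEX FAST FULL SCAN in the plan and filter in Predicate Information):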

select /*+ gather_plan_statistics */colour, count(*) 
from   bricks
where  weight = 1
group  by colour;

select *  
from   table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));

                                                                                           PLAN_TABLE_OUTPUT
____________________________________________________________________________________________________________
SQL_ID  60u3dax0arpc7, child number 0
-------------------------------------
select /*+ gather_plan_statistics */colour, count(*)  from   bricks
where  weight = 1 group  by colour

Plan hash value: 4237521759

---------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                       |      1 |        |      4 |00:00:00.01 |       7 |
|   1 |  HASH GROUP BY        |                       |      1 |      4 |      4 |00:00:00.01 |       7 |
|*  2 |   INDEX FAST FULL SCAN| BRICK_COLOUR_WEIGHT_I |      1 |    100 |     72 |00:00:00.01 |       7 |
---------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("WEIGHT"=1)


20 rows selected.
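
If searches on weight alone were a key query, an index with weight as the leading column would let the database search by key instead of filtering. The sketch below is an illustration rather than a step from the original lab, and the index name brick_weight_colour_i is a hypothetical choice.

```sql
-- Hypothetical index with weight leading.
create index brick_weight_colour_i on bricks ( weight, colour );

select /*+ gather_plan_statistics */colour, count(*)
from   bricks
where  weight = 1
group  by colour;

-- Check whether Predicate Information now shows access("WEIGHT"=1).
select *
from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

-- Remove the extra index afterwards if it is not needed.
drop index brick_weight_colour_i;
```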

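Although an index is often the fastest way to read data, creating an index for every query makes the number of indexes balloon. This increases storage requirements and makes it harder for the optimizer to choose the best index for each query.

Keep the number of indexes you create to a minimum. Reserve indexes for critical queries!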
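
One way to check whether an index is still needed before dropping it is to make it invisible, so the optimizer ignores it by default while the database keeps maintaining it. The sketch below assumes brick_colour_i is the candidate, on the reasoning that brick_colour_weight_i also has colour as its leading column; whether it is truly redundant depends on the workload.

```sql
-- Hide the single-column index from the optimizer.
alter index brick_colour_i invisible;

-- Re-run the key queries and compare their plans and Buffers.
select /*+ gather_plan_statistics */count(*) from bricks where  colour = 'red';
select * from table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

-- Make it visible again if the plans got worse.
alter index brick_colour_i visible;
```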

Unique Indexes

Creating a unique index prevents duplicate data:

create unique index brick_brick_id_u 
  on bricks ( brick_id );

insert into bricks 
  values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' );

Error starting at line : 1 in command -
insert into bricks
  values ( 1, 'red', 'cylinder', 1, 'RED', sysdate, 'stuff' )
Error report -
ORA-00001: unique constraint (SSB.BRICK_BRICK_ID_U) violated
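
A unique index can only be created if the existing data is already free of duplicates. On a table with unknown contents, a check such as the following can be run first. It is a general sketch rather than a step from the original lab; on the bricks table as loaded above it would be expected to return no rows, because brick_id comes from rownum.

```sql
-- List any brick_id values that appear more than once.
select brick_id, count(*)
from   bricks
group  by brick_id
having count(*) > 1;
```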

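When a query has equality conditions on all columns of the index, the optimizer can also use the unique index to perform an INDEX UNIQUE SCAN, for example: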

select /*+ gather_plan_statistics brick_id */* from bricks where  brick_id = 1;

select *  from   table(dbms_xplan.display_cursor(format => 'IOSTATS LAST'));

                                                                                            PLAN_TABLE_OUTPUT
_____________________________________________________________________________________________________________
SQL_ID  fabm3y8smx39f, child number 0
-------------------------------------
select /*+ gather_plan_statistics brick_id */* from bricks where
brick_id = 1

Plan hash value: 4143599005

----------------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |      1 |        |