Oracle Developer Performance Course, Lesson 5 (Why Isn't My Query Using an Index?) Lab

Posted dingdingfish

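Overview

This lab follows the lab guide in Oracle Dev Gym.

Setting up the environment

Create the bricks table and its indexes, plus the global temporary table bricks_temp, and gather statistics at the end: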

exec dbms_random.seed ( 0 );
create table bricks ( 
  brick_id not null constraint bricks_pk primary key,
  colour   not null,
  shape    not null,
  weight   not null,
  insert_date not null,
  junk     default lpad ( 'x', 50, 'x' ) not null 
) as
  with rws as (
    select level x from dual
    connect by level <= 10000
  )
    select rownum brick_id, 
           case ceil ( rownum / 2500 )
             when 4 then 'red'
             when 1 then 'blue'
             when 2 then 'green'
             when 3 then 'yellow'
           end colour, 
           case mod ( rownum, 4 )
             when 0 then 'cube'
             when 1 then 'cylinder'
             when 2 then 'pyramid'
             when 3 then 'prism'
           end shape,
           round ( dbms_random.value ( 1, 1000 ) ),
           date'2020-01-01' + ( rownum/24 ) + ( mod ( rownum, 24 ) / 36 ) insert_date,
           lpad ( ascii ( mod ( rownum, 26 ) + 65 ), 50, 'x' )
    from   rws;
    
create global temporary table bricks_temp as
  select * from bricks
  where  1 = 0;

create index brick_weight_i on 
  bricks ( weight );
  
create index brick_shape_i on 
  bricks ( shape );

create index brick_colour_i on 
  bricks ( colour );
  
create index brick_insert_date_i on 
  bricks ( insert_date );
  
exec dbms_stats.gather_table_stats ( null, 'bricks' ) ;

Check the data. The bricks table has 10,000 rows:

SQL> select count(*) from bricks;

   COUNT(*)
___________
      10000


SQL>
SELECT
    brick_id,
    colour,
    shape,
    weight,
    insert_date
FROM
    bricks
WHERE
    ROWNUM <= 9;

   BRICK_ID    COLOUR       SHAPE    WEIGHT    INSERT_DATE
___________ _________ ___________ _________ ______________
          1 blue      cylinder           64 01-JAN-20
          2 blue      pyramid           829 01-JAN-20
          3 blue      prism             233 01-JAN-20
          4 blue      cube              219 01-JAN-20
          5 blue      cylinder          371 01-JAN-20
          6 blue      pyramid            70 01-JAN-20
          7 blue      prism             461 01-JAN-20
          8 blue      cube              953 01-JAN-20
          9 blue      cylinder          944 01-JAN-20

9 rows selected.

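The junk column is 50 characters long (see the lpad call in the create table statement). Its purpose is to pad out each row, so it is not shown in the result set.

When is an index useful?

It is commonly said that an index is useful when it locates only a few rows in the table.

But how few is "few"? That is hard to pin down. Let's work through some examples.

The bricks table currently has 5 indexes: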

select ui.index_name, 
       listagg ( uic.column_name, ',' )
         within group ( order by column_position ) cols
from   user_indexes ui
join   user_ind_columns uic
on     ui.index_name = uic.index_name
where  ui.table_name = 'BRICKS'
group  by ui.index_name;

            INDEX_NAME           COLS
______________________ ______________
BRICKS_PK              BRICK_ID
BRICK_COLOUR_I         COLOUR
BRICK_INSERT_DATE_I    INSERT_DATE
BRICK_SHAPE_I          SHAPE
BRICK_WEIGHT_I         WEIGHT

The following query touches 90 rows, only 90/10000 (0.9%) of the table, yet it uses a full table scan:

SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
  2  from  bricks
  3  where  weight between 1 and 10;

   COUNT(DISTINCTJUNK)    COUNT(*)
______________________ ___________
                     4          90

SQL> select * from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                             PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID  bpjmaun36ksd1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from  bricks where  weight between 1 and 10

Plan hash value: 2750714649

-------------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |      1 |        |      1 |00:00:00.01 |     120 |
|   1 |  SORT AGGREGATE      |          |      1 |      1 |      1 |00:00:00.01 |     120 |
|   2 |   VIEW               | VW_DAG_0 |      1 |      4 |      4 |00:00:00.01 |     120 |
|   3 |    HASH GROUP BY     |          |      1 |      4 |      4 |00:00:00.01 |     120 |
|*  4 |     TABLE ACCESS FULL| BRICKS   |      1 |    100 |     90 |00:00:00.01 |     120 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("WEIGHT"<=10 AND "WEIGHT">=1))


22 rows selected.

The following query touches 1,000 rows, yet it does use an index:

SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
  2  from  bricks
  3  where  brick_id between 1 and 1000;

   COUNT(DISTINCTJUNK)    COUNT(*)
______________________ ___________
                     4        1000

SQL> select * from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                                                PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________
SQL_ID  8dd30a6d69jfv, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from  bricks where  brick_id between 1 and 1000

Plan hash value: 3219706874

--------------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |           |      1 |        |      1 |00:00:00.01 |      15 |
|   1 |  SORT AGGREGATE                        |           |      1 |      1 |      1 |00:00:00.01 |      15 |
|   2 |   VIEW                                 | VW_DAG_0  |      1 |      4 |      4 |00:00:00.01 |      15 |
|   3 |    HASH GROUP BY                       |           |      1 |      4 |      4 |00:00:00.01 |      15 |
|   4 |     TABLE ACCESS BY INDEX ROWID BATCHED| BRICKS    |      1 |   1000 |   1000 |00:00:00.01 |      15 |
|*  5 |      INDEX RANGE SCAN                  | BRICKS_PK |      1 |   1000 |   1000 |00:00:00.01 |       3 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("BRICK_ID">=1 AND "BRICK_ID"<=1000)


23 rows selected.

Physical row location

Oracle Database stores rows in data blocks. You can use DBMS_ROWID to find the block number of a row. For example:

SQL> select brick_id,
  2         dbms_rowid.rowid_block_number ( rowid ) blk#
  3  from   bricks
  4  where  mod ( brick_id, 1000 ) = 0;

   BRICK_ID      BLK#
___________ _________
       1000    280110
       2000    280123
       3000    280135
       4000    280148
       5000    275921
       6000    275933
       7000    275946
       8000    275958
       9000    278019
      10000    278030

10 rows selected.

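By default, tables in Oracle Database are heap tables. This means the database is free to place rows anywhere.

An index, however, is an ordered data structure. New entries must go in the correct position. For example, if you insert 42 into a numeric column, its index entry goes after 41 and before 43.

The closer the physical order of the rows matches the logical order of the index, the more effective the index is.

The smallest unit of I/O in Oracle Database is the data block. So the more consecutive index entries point to the same block, the more rows a single I/O fetches, and the more effective the index is.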
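
As a rough way to see this on the bricks table, the sketch below counts how many distinct table blocks hold the rows matched by each of the two predicates used earlier. It is not part of the lab guide. It assumes the table sits in a single data file, since a block number on its own is only unique within one file, and the aliases blocks_touched and row_count are my own.

```sql
-- Distinct table blocks holding the rows matched on weight
-- (assumes all of bricks is in one data file)
select count ( distinct dbms_rowid.rowid_block_number ( rowid ) ) blocks_touched,
       count (*) row_count
from   bricks
where  weight between 1 and 10;

-- Distinct table blocks holding the rows matched on brick_id
select count ( distinct dbms_rowid.rowid_block_number ( rowid ) ) blocks_touched,
       count (*) row_count
from   bricks
where  brick_id between 1 and 1000;
```

The fewer blocks per matched row, the more rows each table block visit returns when the rows are reached through the index.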

The following SQL is harder to follow, but it is also the highlight of this article. It combines an analytic function with a PIVOT:

with rws as ( 
  select ceil ( brick_id / 1000 ) id, 
         ceil ( 
           dense_rank () over (
             order by dbms_rowid.rowid_block_number ( rowid )
           ) / 10 
         ) rid
  from   bricks
)
  select * from rws
  pivot (
    count (*) for rid in (
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
    )
  )
  order by id;

First, look at the rws subquery on its own:

  select ceil ( brick_id / 1000 ) id, 
         ceil ( 
           dense_rank () over (
             order by dbms_rowid.rowid_block_number ( rowid )
           ) / 10 
         ) rid
  from   bricks

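Since the bricks table has 10,000 rows, this query also returns 10,000 rows. To keep the final display manageable, both the id and rid columns are bucketed. id falls into 10 buckets: 1-1000, 1001-2000, …, 9001-10000. rid falls into 12 buckets based on the dense_rank of the block number: ranks 1-10, 11-20, …, 111-120.

One thing left implicit here is that the table has only 128 data blocks, which is why the pivot lists only 12 buckets: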

SQL> select blocks from user_segments where segment_name = 'BRICKS';

   BLOCKS
_________
      128

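The pivot then applies count(*) to the subquery above to produce the result below. The columns are the block buckets and the rows are the row buckets. The grid shows how the data is distributed, that is, how clustered or scattered it is: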

   ID      1      2      3      4      5      6      7      8      9     10     11     12
_____ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______
    1      0      0      0      0      0     87    860     53      0      0      0      0
    2      0      0      0      0      0      0      0    807    193      0      0      0
    3      0      0      0      0      0      0      0      0    665    335      0      0
    4      0      0      0      0      0      0      0      0      0    515    485      0
    5     40      0      0      0      0      0      0      0      0      0    365    595
    6    800    200      0      0      0      0      0      0      0      0      0      0
    7      0    640    360      0      0      0      0      0      0      0      0      0
    8      0      0    480    520      0      0      0      0      0      0      0      0
    9      0      0      0    349    651      0      0      0      0      0      0      0
   10      0      0      0      0    219    781      0      0      0      0      0      0

10 rows selected.

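For example, in the ID 1 row (rows 1-1000 of the table), all rows are clustered in buckets 6-8. The three counts add up to exactly 1000 (87+860+53). The remaining rows read the same way.

That was the view by brick_id. Viewed by weight (its values range from 1 to 1000, so it is divided by 100 to get 10 buckets, whereas brick_id was divided by 1000), the distribution is as follows. It is far more scattered: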

SQL>
with rws as ( 
  select ceil ( weight / 100 ) wt, 
         ceil ( 
           dense_rank () over (
             order by dbms_rowid.rowid_block_number ( rowid )
           ) / 10 
         ) rid
  from   bricks
)
  select * from rws
  pivot (
    count (*) for rid in (
      1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
    )
  )
  order by wt;

   WT     1     2      3     4      5     6      7     8     9    10     11    12
_____ _____ _____ ______ _____ ______ _____ ______ _____ _____ _____ ______ _____
    1    75    79     87    88     96    84    107    75    87    93    100    70
    2    90    92     85    85     82    91    107    83    89    97     84    61
    3    74    78     87    83    101    83     78    87    85    87     88    50
    4    94    99    102    82     78    93     83    95    87    83     80    56
    5    89    72     79    86     90    89     68    86    86    92     87    63
    6    95    89     89    91     99    78     90    76    80    81     79    57
    7    74    81     79    90     87    88     69    93    84    81     85    61
    8    67    74     72    73     76    99     85    94    88    88     84    63
    9    90    88     81    94     85    77     92    84    89    69     87    54
   10    92    88     79    97     76    86     81    87    83    79     76    60

10 rows selected.

SQL>
select count(distinct weight) from bricks;
COUNT(DISTINCTWEIGHT)
---------------------
                 1000

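This means that, compared with brick_id, fetching the same number of rows by weight forces the database to do more I/O. So the index on weight is less effective than the index on brick_id.

Credit to the course author for designing such a distribution. It arises because brick_id is inserted in increasing order, while weight is generated from random numbers (dbms_random.value ( 1, 1000 )).

So, to be precise, what matters when judging the efficiency of an index is the number of I/O operations (how many blocks are accessed), not how many rows are accessed!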
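
One way to check this, offered here as a sketch and not as a step from the lab guide, is to force the weight index with an INDEX hint and compare the Buffers column against the 120 buffers reported for the full table scan earlier. The hint names the table and the brick_weight_i index created in the setup script.

```sql
-- Force the index on weight so its I/O can be compared with the full scan
select /*+ gather_plan_statistics index ( bricks brick_weight_i ) */
       count ( distinct junk ), count (*)
from   bricks
where  weight between 1 and 10;

-- Show the actual row counts and buffer gets of the statement just run
select *
from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));
```

Keep in mind that Buffers counts logical reads only. A full table scan can read several blocks per I/O request, so a lower Buffers figure for the index plan would not by itself prove the optimizer's choice wrong.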

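So how does the optimizer know how closely the logical order matches the physical order? It estimates this with the clustering factor.

Clustering factor

The clustering factor measures how closely the logical order of the index matches the physical order of the rows in the table. The database calculates it when gathering statistics. The calculation asks one question of each index entry:
Is the row for the current index entry in the same block as the row for the previous index entry, or in a different block?

Each time consecutive index entries point to different blocks, the database adds 1 to a counter. The lower the final value, the better clustered the rows are, and the more likely the database is to use the index.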
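
The counting rule above can be approximated by hand. The sketch below walks the rows in the order of the brick_weight_i index (by weight, then rowid) and counts every change of table block. It is an illustration under assumptions, not how DBMS_STATS is implemented: it compares block numbers only, so it assumes the table sits in a single data file, and the names idx_order, prev_blk# and approx_clustering_factor are my own.

```sql
with idx_order as (
  -- One row per table row, paired with the block of the previous
  -- entry when read in index order (weight, then rowid)
  select dbms_rowid.rowid_block_number ( rowid ) blk#,
         lag ( dbms_rowid.rowid_block_number ( rowid ) ) over (
           order by weight, rowid
         ) prev_blk#
  from   bricks
)
  -- Count the first entry plus every entry whose block differs
  -- from the block of the entry before it
  select count (*) approx_clustering_factor
  from   idx_order
  where  prev_blk# is null
  or     blk# <> prev_blk#;
```

I would expect the result to land close to the CLUSTERING_FACTOR that user_indexes reports for BRICK_WEIGHT_I. Swapping weight for another indexed column in the order by gives the corresponding estimate for that index.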

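You can view the clustering factor as follows. The clustering factor of BRICK_WEIGHT_I is far higher than that of BRICKS_PK. Put another way, the closer CLUSTERING_FACTOR is to BLOCKS, the better clustered the rows are; the closer it is to NUM_ROWS, the more scattered they are: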

select index_name, clustering_factor, ut.num_rows, ut.blocks
from   user_indexes ui
join   user_tables ut
on     ui.table_name = ut.table_name
where  ui.table_name = 'BRICKS';

            INDEX_NAME    CLUSTERING_FACTOR    NUM_ROWS    BLOCKS
______________________ ____________________ ___________ _________
BRICKS_PK                               117       10000       127
BRICK_WEIGHT_I                         9572       10000       127
BRICK_SHAPE_I                           468       10000       127
BRICK_COLOUR_I                          120       10000       127
BRICK_INSERT_DATE_I                     877       10000       127
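
To make the comparison with BLOCKS and NUM_ROWS easier to read, a variation on the query above can express each clustering factor as a ratio. This is a small sketch; the column aliases cf_per_block and cf_pct_of_rows are my own, and it assumes statistics have been gathered so that blocks and num_rows are populated.

```sql
select ui.index_name,
       ui.clustering_factor,
       -- Close to 1 means index order follows the table's block order
       round ( ui.clustering_factor / ut.blocks, 1 ) cf_per_block,
       -- Close to 100 means almost every index entry changes block
       round ( 100 * ui.clustering_factor / ut.num_rows, 1 ) cf_pct_of_rows
from   user_indexes ui
join   user_tables ut
on     ui.table_name = ut.table_name
where  ui.table_name = 'BRICKS'
order  by ui.clustering_factor;
```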

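Fetching partly clustered rows

Partly clustered means the clustering factor is middling, neither close to the number of blocks nor close to the number of rows. In this case the database tends toward a full table scan: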

SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
  2  from  bricks
  3  where  insert_date >= date'2020-02-01'
  4* and    insert_date < date'2020-02-21';
   COUNT(DISTINCTJUNK)    COUNT(*)
______________________ ___________
                     4         480

SQL> select * from   table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));

                                                                             PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID  0h7f4s1twvqkw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from  bricks where  insert_date >= date'2020-02-01' and    insert_date
< date'2020-02-21'

Plan hash value: 2750714649

-------------------------------------------------------------------------------------------