Oracle Developer Performance Course, Lesson 5 (Why Isn't My Query Using an Index?): Lab
Posted by dingdingfish
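Overview
This lab follows the lab guide in DevGym.
Setting up the environment
Create the bricks table and its indexes, plus the global temporary table bricks_temp, then gather statistics at the end: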
exec dbms_random.seed ( 0 );
create table bricks (
brick_id not null constraint bricks_pk primary key,
colour not null,
shape not null,
weight not null,
insert_date not null,
junk default lpad ( 'x', 50, 'x' ) not null
) as
with rws as (
select level x from dual
connect by level <= 10000
)
select rownum brick_id,
case ceil ( rownum / 2500 )
when 4 then 'red'
when 1 then 'blue'
when 2 then 'green'
when 3 then 'yellow'
end colour,
case mod ( rownum, 4 )
when 0 then 'cube'
when 1 then 'cylinder'
when 2 then 'pyramid'
when 3 then 'prism'
end shape,
round ( dbms_random.value ( 1, 1000 ) ),
date'2020-01-01' + ( rownum/24 ) + ( mod ( rownum, 24 ) / 36 ) insert_date,
lpad ( ascii ( mod ( rownum, 26 ) + 65 ), 50, 'x' )
from rws;
create global temporary table bricks_temp as
select * from bricks
where 1 = 0;
create index brick_weight_i on
bricks ( weight );
create index brick_shape_i on
bricks ( shape );
create index brick_colour_i on
bricks ( colour );
create index brick_insert_date_i on
bricks ( insert_date );
exec dbms_stats.gather_table_stats ( null, 'bricks' ) ;
Check the data. The bricks table has 10,000 rows:
SQL> select count(*) from bricks;
COUNT(*)
___________
10000
SQL>
SELECT
brick_id,
colour,
shape,
weight,
insert_date
FROM
bricks
WHERE
ROWNUM <= 9;
BRICK_ID COLOUR SHAPE WEIGHT INSERT_DATE
___________ _________ ___________ _________ ______________
1 blue cylinder 64 01-JAN-20
2 blue pyramid 829 01-JAN-20
3 blue prism 233 01-JAN-20
4 blue cube 219 01-JAN-20
5 blue cylinder 371 01-JAN-20
6 blue pyramid 70 01-JAN-20
7 blue prism 461 01-JAN-20
8 blue cube 953 01-JAN-20
9 blue cylinder 944 01-JAN-20
9 rows selected.
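The junk column is 50 characters long (see the lpad call in the create script). Its purpose is to pad out each row, and it is not shown in the result set.
When is an index useful?
An index is generally considered useful when it locates only a few rows in the table.
But how few is "few"? That is hard to pin down, so let's work through some examples.
The bricks table currently has five indexes: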
select ui.index_name,
listagg ( uic.column_name, ',' )
within group ( order by column_position ) cols
from user_indexes ui
join user_ind_columns uic
on ui.index_name = uic.index_name
where ui.table_name = 'BRICKS'
group by ui.index_name;
INDEX_NAME COLS
______________________ ______________
BRICKS_PK BRICK_ID
BRICK_COLOUR_I COLOUR
BRICK_INSERT_DATE_I INSERT_DATE
BRICK_SHAPE_I SHAPE
BRICK_WEIGHT_I WEIGHT
The following query touches only 90 rows, 0.9% of the table (90/10000), yet it uses a full table scan:
SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
2 from bricks
3 where weight between 1 and 10;
COUNT(DISTINCTJUNK) COUNT(*)
______________________ ___________
4 90
SQL> select * from table(dbms_xplan.display_cursor( format => 'iosTATS LAST'));
PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID bpjmaun36ksd1, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from bricks where weight between 1 and 10
Plan hash value: 2750714649
-------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 120 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 120 |
| 2 | VIEW | VW_DAG_0 | 1 | 4 | 4 |00:00:00.01 | 120 |
| 3 | HASH GROUP BY | | 1 | 4 | 4 |00:00:00.01 | 120 |
|* 4 | TABLE ACCESS FULL| BRICKS | 1 | 100 | 90 |00:00:00.01 | 120 |
-------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter(("WEIGHT"<=10 AND "WEIGHT">=1))
22 rows selected.
The following query touches 1,000 rows, yet it uses an index:
SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
2 from bricks
3 where brick_id between 1 and 1000;
COUNT(DISTINCTJUNK) COUNT(*)
______________________ ___________
4 1000
SQL> select * from table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
_________________________________________________________________________________________________________________
SQL_ID 8dd30a6d69jfv, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from bricks where brick_id between 1 and 1000
Plan hash value: 3219706874
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 1 |00:00:00.01 | 15 |
| 1 | SORT AGGREGATE | | 1 | 1 | 1 |00:00:00.01 | 15 |
| 2 | VIEW | VW_DAG_0 | 1 | 4 | 4 |00:00:00.01 | 15 |
| 3 | HASH GROUP BY | | 1 | 4 | 4 |00:00:00.01 | 15 |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| BRICKS | 1 | 1000 | 1000 |00:00:00.01 | 15 |
|* 5 | INDEX RANGE SCAN | BRICKS_PK | 1 | 1000 | 1000 |00:00:00.01 | 3 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
5 - access("BRICK_ID">=1 AND "BRICK_ID"<=1000)
23 rows selected.
Physical row location
Oracle Database stores rows in data blocks. You can use DBMS_ROWID to find the block number of a row. For example:
SQL> select brick_id,
2 dbms_rowid.rowid_block_number ( rowid ) blk#
3 from bricks
4 where mod ( brick_id, 1000 ) = 0;
BRICK_ID BLK#
___________ _________
1000 280110
2000 280123
3000 280135
4000 280148
5000 275921
6000 275933
7000 275946
8000 275958
9000 278019
10000 278030
10 rows selected.
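By default, tables in Oracle Database are heap tables. This means the database can place a row anywhere.
An index, however, is an ordered data structure, so each new entry must go in its correct position. For example, if you insert 42 into a number column, its index entry goes after 41 and before 43.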
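The contrast between the two structures can be sketched in a few lines of Python. This is only a toy model, not how Oracle implements heap tables or B-tree indexes: the lists `heap_rows` and `index_keys` are made up for illustration.

```python
import bisect

# Toy model (an assumption, not Oracle internals):
# a heap table stores rows in arrival order, while an index
# keeps its entries sorted by key.
heap_rows = []    # rows in the order they were inserted
index_keys = []   # index entries, always kept in key order

for value in [41, 43, 7, 42]:
    heap_rows.append(value)            # heap: no ordering requirement
    bisect.insort(index_keys, value)   # index: insert at the sorted position

print(heap_rows)    # [41, 43, 7, 42]
print(index_keys)   # [7, 41, 42, 43]
```

In the heap list 42 simply lands at the end, whereas in the sorted list it ends up between 41 and 43.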
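The closer the physical order of the rows matches the logical order of the index, the more effective that index is.
The smallest unit of I/O in Oracle Database is the data block. The more consecutive index entries that point to the same data block, the more rows the database fetches in a single I/O, and so the more effective the index is.
The next SQL statement is harder to follow, but it is also the highlight of this article. It combines an analytic function with a pivot: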
with rws as (
select ceil ( brick_id / 1000 ) id,
ceil (
dense_rank () over (
order by dbms_rowid.rowid_block_number ( rowid )
) / 10
) rid
from bricks
)
select * from rws
pivot (
count (*) for rid in (
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
)
)
order by id;
First, look at the rws subquery on its own:
select ceil ( brick_id / 1000 ) id,
ceil (
dense_rank () over (
order by dbms_rowid.rowid_block_number ( rowid )
) / 10
) rid
from bricks
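Because bricks has 10,000 rows, this query also returns 10,000 rows. To keep the final output readable, both the id and rid columns are bucketed. id falls into 10 groups: 1-1000, 1001-2000, ..., 9001-10000. rid falls into 12 groups of block ranks, as ordered by dense_rank: 1-10, 11-20, ..., 111-120.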
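The bucketing can be mimicked in Python to make the two ceil expressions easier to follow. This is a sketch on made-up data: the (brick_id, block number) pairs are invented, and the block bucket size is 2 instead of the query's 10 so that the tiny sample spreads over more than one group.

```python
import math
from collections import Counter

# Invented sample of (brick_id, block_number) pairs; real block numbers
# come from dbms_rowid.rowid_block_number and will differ.
rows = [(1, 500), (2, 500), (1000, 501), (1001, 120), (2000, 121)]

ID_BUCKET = 1000    # as in ceil ( brick_id / 1000 )
BLOCK_BUCKET = 2    # the query uses 10; 2 suits this tiny sample

# dense_rank: distinct block numbers ranked 1, 2, 3, ... with no gaps
distinct_blocks = sorted({blk for _, blk in rows})
rank = {blk: i for i, blk in enumerate(distinct_blocks, start=1)}

# Count rows per (id group, block group): the cells of the pivot output
cells = Counter(
    (math.ceil(brick_id / ID_BUCKET), math.ceil(rank[blk] / BLOCK_BUCKET))
    for brick_id, blk in rows
)

for (id_group, block_group), n in sorted(cells.items()):
    print(id_group, block_group, n)
```

Each (id group, block group) pair corresponds to one cell of the pivot grid, and its count is what count (*) reports for that cell.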
There is an unstated assumption here: the table segment has only 128 data blocks:
SQL> select blocks from user_segments where segment_name = 'BRICKS';
BLOCKS
_________
128
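Running that subquery through the pivot's count(*) gives the result below. The columns are the block groups and the rows are the id groups. The grid shows how the data is distributed, that is, how clustered or scattered it is: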
ID 1 2 3 4 5 6 7 8 9 10 11 12
_____ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______ ______
1 0 0 0 0 0 87 860 53 0 0 0 0
2 0 0 0 0 0 0 0 807 193 0 0 0
3 0 0 0 0 0 0 0 0 665 335 0 0
4 0 0 0 0 0 0 0 0 0 515 485 0
5 40 0 0 0 0 0 0 0 0 0 365 595
6 800 200 0 0 0 0 0 0 0 0 0 0
7 0 640 360 0 0 0 0 0 0 0 0 0
8 0 0 480 520 0 0 0 0 0 0 0 0
9 0 0 0 349 651 0 0 0 0 0 0 0
10 0 0 0 0 219 781 0 0 0 0 0 0
10 rows selected.
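For example, in the ID 1 row (rows 1-1000 of the table), all the rows are clustered in block groups 6 to 8, and the three counts add up to exactly 1,000 (87 + 860 + 53). The other rows read the same way.
That was the view by brick_id. Viewed by weight instead (weight has 1,000 distinct values, so the query divides by 100 rather than the earlier 1000), the distribution looks like this. It is far more scattered: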
SQL>
with rws as (
select ceil ( weight / 100 ) wt,
ceil (
dense_rank () over (
order by dbms_rowid.rowid_block_number ( rowid )
) / 10
) rid
from bricks
)
select * from rws
pivot (
count (*) for rid in (
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12
)
)
order by wt;
WT 1 2 3 4 5 6 7 8 9 10 11 12
_____ _____ _____ ______ _____ ______ _____ ______ _____ _____ _____ ______ _____
1 75 79 87 88 96 84 107 75 87 93 100 70
2 90 92 85 85 82 91 107 83 89 97 84 61
3 74 78 87 83 101 83 78 87 85 87 88 50
4 94 99 102 82 78 93 83 95 87 83 80 56
5 89 72 79 86 90 89 68 86 86 92 87 63
6 95 89 89 91 99 78 90 76 80 81 79 57
7 74 81 79 90 87 88 69 93 84 81 85 61
8 67 74 72 73 76 99 85 94 88 88 84 63
9 90 88 81 94 85 77 92 84 89 69 87 54
10 92 88 79 97 76 86 81 87 83 79 76 60
10 rows selected.
SQL>
select count(distinct weight) from bricks;
COUNT(DISTINCTWEIGHT)
---------------------
1000
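This means that, compared with brick_id, the database must do more I/O to fetch the same number of rows via weight. An index on weight is therefore less effective than the index on brick_id.
The course author designed this distribution neatly. It arises because brick_id values are inserted in ascending order, whereas weight is generated randomly (dbms_random.value ( 1, 1000 )).
Strictly speaking, then, what matters when judging an index's efficiency is the number of I/O operations (how many data blocks are accessed), not the number of rows accessed!
So how does the optimizer know how well the logical order and the physical order match? It estimates this with the clustering factor.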
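Clustering factor
The clustering factor measures how closely the logical index order matches the physical order of the rows in the table. The database calculates this value when gathering statistics. The calculation is based on one question:
Is the row for the current index entry in the same block as the row for the previous index entry, or in a different block?
Each time consecutive index entries point to different blocks, a counter is incremented by one. The lower the final value, the better clustered the rows are and the more likely the database is to use the index.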
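The counting rule described above can be sketched in Python. This is a simplified model under stated assumptions, not Oracle's actual statistics code: the function name and the block lists are hypothetical, and the first index entry is assumed to count as one block visit.

```python
def clustering_factor(blocks_in_index_order):
    """Count how often consecutive index entries point to a different block."""
    factor = 0
    previous = None
    for block in blocks_in_index_order:
        if block != previous:   # first entry, or a jump to another block
            factor += 1
        previous = block
    return factor

# Hypothetical block numbers, listed in index key order.
well_clustered = [1, 1, 1, 2, 2, 2, 3, 3, 3]
scattered = [1, 3, 2, 1, 3, 2, 1, 3, 2]

print(clustering_factor(well_clustered))  # 3, the number of blocks
print(clustering_factor(scattered))       # 9, the number of rows
```

The two extremes bracket the possible values: a perfectly clustered index scores about as many as there are table blocks, while a fully scattered one approaches the number of rows.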
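You can view the clustering factor as follows. The clustering factor of BRICK_WEIGHT_I is far higher than that of BRICKS_PK. Put another way, the closer CLUSTERING_FACTOR is to BLOCKS, the better clustered the rows are, and the closer it is to NUM_ROWS, the worse: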
select index_name, clustering_factor, ut.num_rows, ut.blocks
from user_indexes ui
join user_tables ut
on ui.table_name = ut.table_name
where ui.table_name = 'BRICKS';
INDEX_NAME CLUSTERING_FACTOR NUM_ROWS BLOCKS
______________________ ____________________ ___________ _________
BRICKS_PK 117 10000 127
BRICK_WEIGHT_I 9572 10000 127
BRICK_SHAPE_I 468 10000 127
BRICK_COLOUR_I 120 10000 127
BRICK_INSERT_DATE_I 877 10000 127
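As a rough worked example of why these numbers matter, a commonly cited rule of thumb estimates the table blocks visited by an index range scan as selectivity × clustering factor. This is a simplification and not the optimizer's full cost model. The function name below is made up for illustration, and the inputs are the E-Rows and CLUSTERING_FACTOR values shown in the outputs above.

```python
import math

NUM_ROWS = 10_000   # rows in bricks

def estimated_table_block_visits(rows_expected, clustering_factor):
    """Rule-of-thumb estimate: selectivity x clustering factor, rounded up."""
    return math.ceil(rows_expected * clustering_factor / NUM_ROWS)

# weight between 1 and 10: E-Rows 100, BRICK_WEIGHT_I clustering factor 9572
weight_visits = estimated_table_block_visits(100, 9572)

# brick_id between 1 and 1000: E-Rows 1000, BRICKS_PK clustering factor 117
pk_visits = estimated_table_block_visits(1000, 117)

print(weight_visits)  # 96
print(pk_visits)      # 12
```

Under this approximation the weight range needs about 96 table block visits, not far below the 127 blocks of the whole table, which a full scan can read with multiblock I/O. The brick_id range needs about 12, which agrees with the 15 - 3 = 12 table buffers in the index plan shown earlier.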
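Fetching partly clustered rows
Partly clustered means the clustering factor is "middling": well above BLOCKS but well below NUM_ROWS. In this case the database tends to choose a full table scan: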
SQL> select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
2 from bricks
3 where insert_date >= date'2020-02-01'
4* and insert_date < date'2020-02-21';
COUNT(DISTINCTJUNK) COUNT(*)
______________________ ___________
4 480
SQL> select * from table(dbms_xplan.display_cursor( format => 'IOSTATS LAST'));
PLAN_TABLE_OUTPUT
______________________________________________________________________________________________
SQL_ID 0h7f4s1twvqkw, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count ( distinct junk ), count (*)
from bricks where insert_date >= date'2020-02-01' and insert_date
< date'2020-02-21'
Plan hash value: 2750714649
-------------------------------------------------------------------------------------------