Index Principles and Slow Query Optimization
Posted 孟庆健
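I. MySQL Index Management
1. Purpose
(1). The purpose of an index is to speed up lookups.
(2). In MySQL, primary key, unique, and composite unique keys are all indexes too. Besides speeding up lookups, these indexes also enforce constraints.
Ordinary index (INDEX): speeds up lookups
Unique indexes:
- Primary key index (PRIMARY KEY): speeds up lookups + constraint (not null, no duplicates)
- Unique index (UNIQUE): speeds up lookups + constraint (no duplicates)
Composite indexes:
- PRIMARY KEY(id,name): composite primary key index
- UNIQUE(id,name): composite unique index
- INDEX(id,name): composite ordinary index
II. Index Data Structure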
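1. Keep indexed fields as small as possible. The number of disk I/Os per lookup depends on the height h of the B+ tree.
Suppose the table holds N records and each disk block holds m data items; then h = log_(m+1) N, that is, the logarithm of N to the base m+1.
For a fixed N, the larger m is, the smaller h is, and m = disk block size / data item size.
The disk block size is the size of one data page and is fixed, so the less space each data item takes,
the more items fit in a block and the lower the tree. This is why each data item, that is, the indexed field, should be as small as possible:
an int takes 4 bytes, half of the 8 bytes a bigint takes. It is also why a B+ tree keeps the actual row data in leaf nodes rather than in inner nodes:
if row data were stored in inner nodes, the number of data items per block would drop sharply and the tree would grow taller. In the extreme case of one data item per block, the fan-out falls to two and, by the formula above, the tree is no better than a binary search tree.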
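As a rough illustration of the height formula, the sketch below estimates h for a given number of records. It is a simplified model and not a description of how InnoDB lays out pages: the function name `estimated_height`, the 16 KB page size, and the assumption that a data item consists of the key bytes only (no pointers, no page headers) are all illustrative assumptions.

```python
def estimated_height(n_records: int, page_bytes: int, item_bytes: int) -> int:
    """Smallest h such that (m + 1) ** h >= n_records, where m = page_bytes // item_bytes.

    Simplified model of h = log_(m+1) N; real pages also store pointers and headers.
    """
    m = page_bytes // item_bytes          # data items per disk block
    if m < 1:
        raise ValueError("a data item must fit in one page")
    fanout = m + 1                        # children per node
    height, reachable = 1, fanout
    while reachable < n_records:          # integer arithmetic avoids float log errors
        reachable *= fanout
        height += 1
    return height


PAGE_BYTES = 16 * 1024                    # assumed page size for this sketch
ROWS = 10_000_000                         # hypothetical table size

height_int = estimated_height(ROWS, PAGE_BYTES, 4)      # 4-byte int key
height_bigint = estimated_height(ROWS, PAGE_BYTES, 8)   # 8-byte bigint key
print(height_int, height_bigint)
```

Under these assumptions the wider key needs one more level for the same table, which means one more block read per lookup.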
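2. The leftmost-match property of indexes: when the data items of a B+ tree are composite structures, such as (name,age,sex),
the B+ tree builds its search tree by comparing columns from left to right. For example, when a lookup arrives with (Zhang San,20,F),
the B+ tree first compares name to decide which direction to search next; if name is equal, it compares age and then sex in turn,
and finally reaches the data being looked up. But when a lookup such as (20,F) arrives without a name, the B+ tree cannot tell which node to visit next,
because name was the first comparison key when the search tree was built, so it must search by name first to know where to go.
When a lookup such as (Zhang San,F) arrives, the B+ tree can use name to choose the search direction, but the next field, age, is missing,
so it can only find every record whose name is Zhang San and then filter those for sex F. This is a very important property: the leftmost-match property of indexes.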
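The three lookups described above can be imitated with a sorted list of tuples. This is only a sketch: a sorted Python list with `bisect` stands in for the B+ tree, and the sample rows and the helper `seek_prefix` are made up for illustration.

```python
import bisect

# Entries kept sorted by (name, age, sex), the way a composite index orders them.
INDEX = sorted([
    ("Li Si", 25, "M"),
    ("Wang Wu", 20, "F"),
    ("Zhang San", 20, "F"),
    ("Zhang San", 20, "M"),
    ("Zhang San", 31, "F"),
])


def seek_prefix(index, prefix):
    """Binary-search to the first entry matching the leading columns, then scan forward."""
    n = len(prefix)
    pos = bisect.bisect_left(index, prefix)
    matches = []
    while pos < len(index) and index[pos][:n] == prefix:
        matches.append(index[pos])
        pos += 1
    return matches


# (name, age, sex): every column narrows the search.
full_key = seek_prefix(INDEX, ("Zhang San", 20, "F"))

# (name, sex): only name can steer the search; sex is filtered afterwards.
name_only = seek_prefix(INDEX, ("Zhang San",))
name_and_sex = [row for row in name_only if row[2] == "F"]

# (age, sex): no leading column, so every entry has to be examined.
no_name = [row for row in INDEX if row[1] == 20 and row[2] == "F"]

print(full_key, name_and_sex, no_name, sep="\n")
```

The last lookup still returns the right rows, but only by visiting the whole list, which is what losing the leftmost column costs.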
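III. Syntax for Creating and Dropping Indexes
# Method 1: when creating the table
CREATE TABLE table_name (
    column_name1 data_type [integrity constraints…],
    column_name2 data_type [integrity constraints…],
    [UNIQUE | FULLTEXT | SPATIAL] {INDEX | KEY}
    [index_name] (column_name[(length)] [ASC | DESC])
);
# Method 2: CREATE INDEX on an existing table
CREATE [UNIQUE | FULLTEXT | SPATIAL] INDEX index_name
    ON table_name (column_name[(length)] [ASC | DESC]);
# Method 3: ALTER TABLE on an existing table
ALTER TABLE table_name ADD [UNIQUE | FULLTEXT | SPATIAL] INDEX
    index_name (column_name[(length)] [ASC | DESC]);
# Dropping an index
DROP INDEX index_name ON table_name;
1. Creating indexes
- At table creation time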
create table s1(
id int,
name char(6),
age int,
email varchar(30),
index(id)
);
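- After the table has been created
create index name on s1(name); # add an ordinary index
create unique index age on s1(age); # add a unique index
alter table s1 add primary key(id); # add a primary key index
create index id_name on s1(id,name); # add a composite ordinary index (it needs its own name, since `name` is already used above)
2. Dropping indexes
drop index id on s1;
drop index name on s1;
alter table s1 drop primary key; # drop the primary key index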
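IV. Testing Indexes
1. Preparation
# 1. Prepare the table
create table s1(
id int,
name varchar(20),
gender char(6),
email varchar(50)
);
# 2. Create a stored procedure that bulk-inserts records
# declare $$ as the statement terminator while the procedure is defined
delimiter $$
create procedure auto_insert1()
BEGIN
    declare i int default 1;
    while(i<300000)do
        insert into s1 values(i,concat('egon',i),'male',concat('egon',i,'@oldboy'));
        set i=i+1;
    end while;
END$$
# restore the semicolon as the statement terminator
delimiter ;
# 3. Inspect the stored procedure
show create procedure auto_insert1\G
# 4. Call the stored procedure
call auto_insert1();
2. Testing query speed without an index
# Without an index, MySQL scans the table from beginning to end, so the query is slow
After adding an index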
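To give a feel for the difference between the two cases, the sketch below counts how many values are examined when looking up one id among 299,999 (the number of rows auto_insert1 inserts). It is an analogy, not a MySQL measurement: a plain list stands in for the unindexed table, a binary search over a sorted list stands in for descending an index, and the names `full_scan` and `index_lookup` are made up for illustration.

```python
def full_scan(values, target):
    """Examine every value, as an unindexed query must. Returns (matches, examined)."""
    matches = examined = 0
    for value in values:
        examined += 1
        if value == target:
            matches += 1
    return matches, examined


def index_lookup(sorted_values, target):
    """Binary search over sorted values. Returns (matches, probes)."""
    lo, hi, probes = 0, len(sorted_values), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_values[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_values) and sorted_values[lo] == target
    return int(found), probes


ids = list(range(1, 300000))               # ids 1..299999, already sorted
scan_matches, scan_examined = full_scan(ids, 1000)
index_matches, index_probes = index_lookup(ids, 1000)
print(scan_examined, index_probes)
```

Both functions find the same single row; the scan touches every value, while the sorted lookup needs at most 19 probes for a list of this size.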
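V. Principles for Adding Indexes
1. For an index to deliver the expected speed-up, the following principles must be followed when adding it
# 1. The leftmost-prefix matching principle, a very important one
create index ix_name_email on s1(name,email);
- Leftmost-prefix matching: conditions must match the index columns in order from left to right
select * from s1 where name='egon'; # can use the index
select * from s1 where name='egon' and email='asdf'; # can use the index
select * from s1 where email='alex@oldboy.com'; # cannot use the index
MySQL keeps matching columns to the right until it meets a range condition (>, <, between, like) and then stops. For example, with a = 1 and b = 2 and c > 3 and d = 4 and an index built in the order (a,b,c,d), d cannot use the index; with an index on (a,b,d,c) all four columns can, and the order of a, b, d can be changed freely.
# 2. = and in conditions may appear in any order. For a = 1 and b = 2 and c = 3, an (a,b,c) index can be built with the columns in any order, because MySQL's query optimizer rearranges the conditions into a form the index can use.
# 3. Prefer columns with high selectivity for indexes. Selectivity is count(distinct col)/count(*), the proportion of distinct values in the column. The larger it is, the fewer records are scanned. A unique key has a selectivity of 1, while status or gender columns may have a selectivity close to 0 on large data sets. Is there a rule-of-thumb value? It depends on the use case and is hard to pin down; for columns used in joins we generally require 0.1 or higher, that is, on average 10 records scanned per matching record.
# 4. Index columns must not take part in computations; keep the column "clean". For example, from_unixtime(create_time) = '2014-05-29' cannot use the index. The reason is simple: the B+ tree stores the column values as they are in the table, so the function would have to be applied to every stored value before comparing, which is clearly too expensive. The condition should be written as create_time = unix_timestamp('2014-05-29') instead.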
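The "match to the right until a range condition" rule can be written down as a few lines of code. This is a sketch of the rule as stated in the text, not of MySQL's actual optimizer; the function `usable_columns` and the "eq"/"range" labels are invented for illustration.

```python
def usable_columns(index_columns, conditions):
    """Return the leading index columns that the conditions can use.

    conditions maps a column name to "eq" (=, in) or "range" (>, <, between, like).
    Matching walks the index from left to right, stops at the first column
    without a condition, and stops after the first range condition.
    """
    used = []
    for column in index_columns:
        kind = conditions.get(column)
        if kind is None:          # gap in the prefix: nothing further can match
            break
        used.append(column)
        if kind == "range":       # a range is usable but ends the matching
            break
    return used


# a = 1 and b = 2 and c > 3 and d = 4
conditions = {"a": "eq", "b": "eq", "c": "range", "d": "eq"}
print(usable_columns(["a", "b", "c", "d"], conditions))   # d is left out
print(usable_columns(["a", "b", "d", "c"], conditions))   # all four columns

# index (name, email) with a condition on email only
print(usable_columns(["name", "email"], {"email": "eq"}))
```

Because the conditions are passed as a mapping, the order in which they are written does not matter, which mirrors principle 2 for = and in.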
2. Leftmost-prefix demonstration
1. Speed-up from adding an index: range conditions
mysql> select count(*) from s1 where id=1000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from s1 where id>1000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.12 sec)

mysql> create index a on s1(id)
    -> ;
Query OK, 0 rows affected (3.21 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where id=1000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where id>1000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.12 sec)

mysql> select count(*) from s1 where id>1000 and id < 2000;
+----------+
| count(*) |
+----------+
|      999 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where id>1000 and id < 300000;
+----------+
| count(*) |
+----------+
|   298999 |
+----------+
1 row in set (0.13 sec)

3. Do not index columns with low selectivity
mysql> select count(*) from s1 where name='xxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.19 sec)

mysql> select count(*) from s1 where name='egon' and age=123123123123123;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.45 sec)

mysql> create index c on s1(age);
Query OK, 0 rows affected (3.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=123123123123123;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon' and age=10;
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.35 sec)

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and id < 4000;
+----------+
| count(*) |
+----------+
|      999 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.47 sec)

mysql> create index d on s1(email);
Query OK, 0 rows affected (4.83 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> drop index a on s1;
Query OK, 0 rows affected (0.10 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> drop index b on s1;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> drop index c on s1;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc s1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   |     | NULL    |       |
| name  | char(20)    | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
| email | varchar(30) | YES  | MUL | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

5. Add a composite index, placing the columns used in range conditions last
select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
index(name,email,age,id)
select count(*) from s1 where name='egon' and age> 10 and id=3000 and email='xxxx';
index(name,email,id,age)
select count(*) from s1 where name like 'egon' and age= 10 and id=3000 and email='xxxx';
index(email,id,age,name)

mysql> desc s1;
+-------+-------------+------+-----+---------+-------+
| Field | Type        | Null | Key | Default | Extra |
+-------+-------------+------+-----+---------+-------+
| id    | int(11)     | NO   |     | NULL    |       |
| name  | char(20)    | YES  |     | NULL    |       |
| age   | int(11)     | YES  |     | NULL    |       |
| email | varchar(30) | YES  |     | NULL    |       |
+-------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)

mysql> create index xxx on s1(age,email,name,id);
Query OK, 0 rows affected (6.89 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where name='egon' and age=10 and id>3000 and email='xxxx';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

6. Leftmost-prefix matching
index(id,age,email,name)
# the condition must include id
id
id age
id email
id name

email # does not work

mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.11 sec)

mysql> create index xxx on s1(id,name,age,email);
Query OK, 0 rows affected (6.44 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> select count(*) from s1 where id=3000;
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where name='egon';
+----------+
| count(*) |
+----------+
|   299999 |
+----------+
1 row in set (0.16 sec)

mysql> select count(*) from s1 where email='egon3333@oldboy.com';
+----------+
| count(*) |
+----------+
|        1 |
+----------+
1 row in set (0.15 sec)

mysql> select count(*) from s1 where id=1000 and email='egon3333@oldboy.com';
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from s1 where email='egon3333@oldboy.com' and id=3000;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (0.00 sec)

7. Index columns must not take part in computations; keep the column "clean"
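The effect of computing on an indexed column can be seen in a query plan. The sketch below is an assumption-laden stand-in: it uses SQLite through Python's built-in sqlite3 module only so that it runs without a MySQL server, the small s1 table and the `id * 3 = 3000` condition are made up for illustration, and SQLite's plan text differs from MySQL's EXPLAIN output.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail text of the first EXPLAIN QUERY PLAN row for sql."""
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s1 (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO s1 VALUES (?, ?)",
    ((i, f"egon{i}") for i in range(1, 5000)),
)
conn.execute("CREATE INDEX a ON s1(id)")

# Clean column: the stored value is compared directly, so the index can be searched.
clean_plan = plan(conn, "SELECT count(*) FROM s1 WHERE id = 1000")

# Computed column: id * 3 must be evaluated for every entry before comparing.
computed_plan = plan(conn, "SELECT count(*) FROM s1 WHERE id * 3 = 3000")

print(clean_plan)
print(computed_plan)
```

In SQLite the first plan is a SEARCH and the second a SCAN. Moving the arithmetic to the constant side (id = 3000 / 3) keeps the column clean, the same idea as rewriting from_unixtime(create_time) into unix_timestamp on the constant.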
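Other notes
- Avoid select *
- Use count(1) or count(column) instead of count(*) (note that InnoDB handles count(*) and count(1) the same way, and count(column) skips NULL values, so the result can differ)
- When creating tables, prefer char over varchar where possible
- Put fixed-length columns first in the column order
- Use a composite index instead of several single-column indexes (when queries often combine several conditions)
- Keep indexes as short as possible
- Use joins (JOIN) instead of subqueries (Sub-Queries)
- When joining tables, make sure the columns in the join condition have matching types
- Columns with low selectivity (many repeated values) are poor candidates for an index, for example gender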
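The last note can be checked with the selectivity formula count(distinct col)/count(*) given earlier. The sketch below applies it to two columns shaped like the test data from auto_insert1 (unique ids, and a gender column where every row is 'male'); the helper name `selectivity` is made up for illustration.

```python
def selectivity(values):
    """count(distinct col) / count(*): the share of distinct values in a column."""
    values = list(values)
    if not values:
        raise ValueError("selectivity is undefined for an empty column")
    return len(set(values)) / len(values)


id_column = range(1, 300000)              # every id is distinct
gender_column = ["male"] * 299999         # every row holds the same value

id_selectivity = selectivity(id_column)
gender_selectivity = selectivity(gender_column)
print(id_selectivity, gender_selectivity)
```

The id column reaches the maximum of 1, while the gender column falls far below the 0.1 rule of thumb quoted above, so an index on it would not narrow the search.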