5. How to Build Suitable Indexes


1. Queried frequently

2. High selectivity (dispersion)

3. Short length

4. Covers the commonly queried fields as far as possible

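High selectivity (dispersion): with 1,000,000 users split roughly 500,000 male and 500,000 female, the gender column has very low selectivity.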
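The gender example can be put into numbers. The sketch below is only an illustration: it treats selectivity as the number of distinct values divided by the number of rows, and the two columns (`gender`, `user_id`) are hypothetical in-memory lists rather than real table data.

```python
def selectivity(values):
    """Return distinct-count / row-count; values closer to 1.0 are more selective."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)


# Hypothetical data: 1,000,000 users, half male and half female.
row_count = 1_000_000
gender = ["M", "F"] * (row_count // 2)
user_id = range(row_count)

gender_selectivity = selectivity(gender)    # 2 distinct values over 1,000,000 rows
user_id_selectivity = selectivity(user_id)  # every value is distinct

print(gender_selectivity, user_id_selectivity)
```

A value near 0 means an index on that column narrows the search very little; a value near 1 means almost every lookup lands on a single row.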

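Index length directly affects the size of the index file and the speed of inserts, updates, and deletes, and it indirectly affects query speed because a larger index takes up more memory. In a composite index, put the more selective columns first.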

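For columns that are long but still need to be queried often, you can index only a leading portion of the value.

Example: in an idiom table, the idiom column is roughly 4 to 20 characters long.

Take a prefix of each value in that column, counting from the left, and build the index on it.

1: The shorter the prefix, the more duplicates there are, the lower the selectivity, and the less effective the index.

2: The longer the prefix, the fewer duplicates there are, the higher the selectivity, and the more effective the index. The index also grows, though, and so does its cost: slower inserts, updates, and deletes, and indirectly slower queries.

So you need to strike a balance between selectivity and length.

Method: try prefixes of different lengths, measure the selectivity of each, and pick a suitable length.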

SELECT COUNT(DISTINCT LEFT(`word`, 1)) / COUNT(*) AS selectivity FROM dict;

SELECT COUNT(DISTINCT LEFT(`word`, 2)) / COUNT(*) AS selectivity FROM dict;

SELECT COUNT(DISTINCT LEFT(`word`, 3)) / COUNT(*) AS selectivity FROM dict;
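The same measurement can be sketched outside the database. The function below mirrors the ratio computed by the queries above (distinct prefixes divided by row count); the word list is a made-up English sample standing in for the `word` column of `dict`, not data from the original table.

```python
def prefix_selectivity(words, length):
    """Mirror of COUNT(DISTINCT LEFT(word, length)) / COUNT(*)."""
    if not words:
        return 0.0
    return len({w[:length] for w in words}) / len(words)


# Hypothetical sample standing in for the `word` column.
words = ["apple", "apply", "april", "banana", "band", "bandit", "candle", "candy"]

# Selectivity for prefix lengths 1 through 5.
results = {n: prefix_selectivity(words, n) for n in range(1, 6)}
for n, value in results.items():
    print(n, value)
```

Once the curve flattens, a longer prefix buys little extra selectivity. In MySQL the chosen length is then given in the index definition, for example `ALTER TABLE dict ADD INDEX idx_word (word(4));`, where the index name `idx_word` and the length 4 are placeholders.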

Because of the leftmost-prefix rule of InnoDB indexes, a pattern of the form xxx% can use the index, while %xxx cannot.
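One way to see why the rule holds is to model the index as a sorted list of keys. This is a simplified stand-in for a B+tree, not InnoDB's actual implementation, and the keys are made up. A known prefix pins down one contiguous slice that binary search can find; a known suffix does not.

```python
import bisect


def prefix_range(sorted_keys, prefix):
    """Find keys starting with `prefix` using two binary searches (the 'xxx%' case)."""
    lo = bisect.bisect_left(sorted_keys, prefix)
    # Appending the largest code point gives an upper bound for the prefix range.
    hi = bisect.bisect_left(sorted_keys, prefix + "\U0010ffff")
    return sorted_keys[lo:hi]


def suffix_scan(sorted_keys, suffix):
    """The '%xxx' case: ordering does not help, so every key is checked."""
    return [k for k in sorted_keys if k.endswith(suffix)]


# Hypothetical index keys, kept in sorted order like index entries.
keys = sorted(["apple", "apply", "april", "banana", "band", "candle", "handle"])

print(prefix_range(keys, "ap"))  # matches sit next to each other
print(suffix_scan(keys, "dle"))  # matches are scattered across the list
```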

How should you index a column whose left prefix has low selectivity?

Take URLs, for example, which all start with http://www:

http://www.baidu.com

http://www.php.cn

http://www.w3school.com

Technique 1:

Store the data reversed, for example moc.udiab.www//:ptth.
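A quick sketch of the effect, using the three URLs listed above. The helper `prefix_selectivity` and the prefix length of 10 are illustrative choices, not part of the original technique.

```python
def prefix_selectivity(values, length):
    """Distinct prefixes of the given length divided by the number of values."""
    if not values:
        return 0.0
    return len({v[:length] for v in values}) / len(values)


urls = ["http://www.baidu.com", "http://www.php.cn", "http://www.w3school.com"]

# Reverse each string so the distinctive part (the end of the URL) comes first.
reversed_urls = [u[::-1] for u in urls]

forward = prefix_selectivity(urls, 10)             # every prefix is "http://www"
backward = prefix_selectivity(reversed_urls, 10)   # prefixes now differ

print(reversed_urls[0])
print(forward, backward)
```

The trade-off is that the application has to reverse the search value as well before querying the reversed column.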

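Technique 2:

Use a pseudo-hash.

Add a url_crc32 column.

Use the CRC32 algorithm to convert each URL to an integer and store it; at query time, look up the CRC32 value of the URL.

CRC32 is a hash algorithm that maps a string to a 32-bit integer.

CRC32 results can collide, but the probability is low, so filter the matched rows again after the lookup.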
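The lookup-then-filter flow can be sketched as follows. This is an in-memory model only: the class `UrlTable` and its methods are hypothetical, a dictionary keyed by the CRC32 value stands in for an index on the `url_crc32` column, and Python's `zlib.crc32` stands in for the hash computation.

```python
import zlib
from collections import defaultdict


def url_crc32(url):
    """Unsigned 32-bit CRC of the URL's UTF-8 bytes."""
    return zlib.crc32(url.encode("utf-8"))


class UrlTable:
    """Hypothetical table with a url column and an indexed url_crc32 column."""

    def __init__(self):
        # Maps crc value -> list of URLs; plays the role of the integer index.
        self._by_crc = defaultdict(list)

    def insert(self, url):
        self._by_crc[url_crc32(url)].append(url)

    def find(self, url):
        # Step 1: narrow down by the short integer key.
        candidates = self._by_crc.get(url_crc32(url), [])
        # Step 2: filter by the full URL to discard any hash collisions.
        return [u for u in candidates if u == url]


table = UrlTable()
for u in ["http://www.baidu.com", "http://www.php.cn", "http://www.w3school.com"]:
    table.insert(u)

print(table.find("http://www.php.cn"))
```

In SQL the same idea would usually be a condition on both columns, matching the integer first and the full URL second; MySQL provides a `CRC32()` function for computing the value.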

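Multi-column indexes

Factors to consider: the query efficiency of each column, its selectivity, and the specific business requirements.

How MySQL chooses a suitable index

Let's start with an example.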

EXPLAIN select * from employees where name > 'a';


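If MySQL used the name index here, it would have to traverse the composite index tree that starts with name and then, for each primary key value it finds, go back to the primary key index tree to fetch the full row. That costs more than a full table scan.
A covering index avoids this: when the query selects only indexed columns, traversing the composite index tree alone yields every result.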

EXPLAIN select name,age,position from employees where name > 'a';


As the plan shows, the selected columns are all covered by the index, so MySQL uses the index for this query.
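The covering-index effect can be reproduced in a runnable form with SQLite, which ships with Python. Note the assumptions: this is SQLite, not MySQL, so the planner and its output format differ; the column types are guesses; only the table name, column names, and index name are taken from the example in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        age INTEGER NOT NULL,
        position TEXT NOT NULL,
        hire_time TEXT NOT NULL
    );
    CREATE INDEX idx_name_age_position ON employees (name, age, position);
    """
)


def plan(sql):
    """Return SQLite's query plan details for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


# All selected columns are in the index, so no lookup into the table is needed.
covered_plan = plan("SELECT name, age, position FROM employees WHERE name > 'a'")
print(covered_plan)
```

SQLite labels such a plan with the words COVERING INDEX; in MySQL the equivalent signal is `Using index` in the Extra column of EXPLAIN.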
Now look at another case:

EXPLAIN select * from employees where name > 'zzz';


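For the two queries above, name > 'a' and name > 'zzz', you can use the optimizer trace to find out whether MySQL ends up using an index and, when a table has several indexes, how it picks one. Enabling the trace affects MySQL performance, so use it only temporarily while analyzing a statement and turn it off as soon as you are done.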

SET SESSION optimizer_trace="enabled=on",end_markers_in_json=on;  -- enable the trace
SELECT * FROM employees WHERE name > 'a' ORDER BY position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;

Inspect the TRACE column:

{
  "steps": [
    {
      "join_preparation": {  -- Phase 1: SQL preparation
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'a') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {  -- Phase 2: SQL optimization
        "select#": 1,
        "steps": [
          {
            "condition_processing": {  -- condition processing
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'a')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'a')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "table_dependencies": [  -- table dependency details
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [  -- estimated cost of accessing the table
              {
                "table": "`employees`",
                "range_analysis": {
                  "table_scan": {  -- full table scan
                    "rows": 3,  -- rows scanned
                    "cost": 3.7  -- query cost
                  } /* table_scan */,
                  "potential_range_indices": [  -- indexes the query might use
                    {
                      "index": "PRIMARY",  -- primary key index
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name_age_position",  -- secondary index
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    },
                    {
                      "index": "idx_age",
                      "usable": false,
                      "cause": "not_applicable"
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {  -- cost analysis for each index
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        "ranges": [
                          "a < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,  -- whether a covering index is used
                        "rows": 3,  -- rows scanned via the index
                        "cost": 4.61,  -- cost of using the index
                        "chosen": false,  -- whether this index is chosen
                        "cause": "cost"
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "scan",
                      "rows": 3,
                      "cost": 1.6,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 1.6,
                "rows_for_plan": 3,
                "sort_cost": 3,
                "new_cost_for_plan": 4.6,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'a')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'a')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`",
                "access_type": "table_scan"
              }
            ] /* refine_plan */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "unknown",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {  -- Phase 3: SQL execution
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 3,
              "examined_rows": 3,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 200704,
              "sort_mode": "<sort_key, additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

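The full table scan costs less than the index scan (3.7 versus 4.61 in the range analysis), so MySQL ends up choosing the full table scan.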

SELECT * FROM employees WHERE name > 'zzz' ORDER BY position;
SELECT * FROM information_schema.OPTIMIZER_TRACE;


{
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `employees`.`id` AS `id`,`employees`.`name` AS `name`,`employees`.`age` AS `age`,`employees`.`position` AS `position`,`employees`.`hire_time` AS `hire_time` from `employees` where (`employees`.`name` > 'zzz') order by `employees`.`position`"
          }
        ] /* steps */
      } /* join_preparation */
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`employees`.`name` > 'zzz')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`employees`.`name` > 'zzz')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`employees`.`name` > 'zzz')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`employees`.`name` > 'zzz')"
                }
              ] /* steps */
            } /* condition_processing */
          },
          {
            "table_dependencies": [
              {
                "table": "`employees`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ] /* depends_on_map_bits */
              }
            ] /* table_dependencies */
          },
          {
            "ref_optimizer_key_uses": [
            ] /* ref_optimizer_key_uses */
          },
          {
            "rows_estimation": [
              {
                "table": "`employees`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 3,
                    "cost": 3.7
                  } /* table_scan */,
                  "potential_range_indices": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "idx_name_age_position",
                      "usable": true,
                      "key_parts": [
                        "name",
                        "age",
                        "position",
                        "id"
                      ] /* key_parts */
                    },
                    {
                      "index": "idx_age",
                      "usable": false,
                      "cause": "not_applicable"
                    }
                  ] /* potential_range_indices */,
                  "setup_range_conditions": [
                  ] /* setup_range_conditions */,
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  } /* group_index_range */,
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "idx_name_age_position",
                        "ranges": [
                          "zzz < name"
                        ] /* ranges */,
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 1,
                        "cost": 2.21,
                        "chosen": true
                      }
                    ] /* range_scan_alternatives */,
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    } /* analyzing_roworder_intersect */
                  } /* analyzing_range_alternatives */,
                  "chosen_range_access_summary": {
                    "range_access_plan": {
                      "type": "range_scan",
                      "index": "idx_name_age_position",
                      "rows": 1,
                      "ranges": [
                        "zzz < name"
                      ] /* ranges */
                    } /* range_access_plan */,
                    "rows_for_plan": 1,
                    "cost_for_plan": 2.21,
                    "chosen": true
                  } /* chosen_range_access_summary */
                } /* range_analysis */
              }
            ] /* rows_estimation */
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ] /* plan_prefix */,
                "table": "`employees`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "access_type": "range",
                      "rows": 1,
                      "cost": 2.41,
                      "chosen": true,
                      "use_tmp_table": true
                    }
                  ] /* considered_access_paths */
                } /* best_access_path */,
                "cost_for_plan": 2.41,
                "rows_for_plan": 1,
                "sort_cost": 1,
                "new_cost_for_plan": 3.41,
                "chosen": true
              }
            ] /* considered_execution_plans */
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`employees`.`name` > 'zzz')",
              "attached_conditions_computation": [
              ] /* attached_conditions_computation */,
              "attached_conditions_summary": [
                {
                  "table": "`employees`",
                  "attached": "(`employees`.`name` > 'zzz')"
                }
              ] /* attached_conditions_summary */
            } /* attaching_conditions_to_tables */
          },
          {
            "clause_processing": {
              "clause": "ORDER BY",
              "original_clause": "`employees`.`position`",
              "items": [
                {
                  "item": "`employees`.`position`"
                }
              ] /* items */,
              "resulting_clause_is_simple": true,
              "resulting_clause": "`employees`.`position`"
            } /* clause_processing */
          },
          {
            "refine_plan": [
              {
                "table": "`employees`",
                "pushed_index_condition": "(`employees`.`name` > 'zzz')",
                "table_condition_attached": null,
                "access_type": "range"
              }
            ] /* refine_plan */
          },
          {
            "reconsidering_access_paths_for_index_ordering": {
              "clause": "ORDER BY",
              "index_order_summary": {
                "table": "`employees`",
                "index_provides_order": false,
                "order_direction": "undefined",
                "index": "idx_name_age_position",
                "plan_changed": false
              } /* index_order_summary */
            } /* reconsidering_access_paths_for_index_ordering */
          }
        ] /* steps */
      } /* join_optimization */
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
          {
            "filesort_information": [
              {
                "direction": "asc",
                "table": "`employees`",
                "field": "position"
              }
            ] /* filesort_information */,
            "filesort_priority_queue_optimization": {
              "usable": false,
              "cause": "not applicable (no LIMIT)"
            } /* filesort_priority_queue_optimization */,
            "filesort_execution": [
            ] /* filesort_execution */,
            "filesort_summary": {
              "rows": 0,
              "examined_rows": 0,
              "number_of_tmp_files": 0,
              "sort_buffer_size": 200704,
              "sort_mode": "<sort_key, additional_fields>"
            } /* filesort_summary */
          }
        ] /* steps */
      } /* join_execution */
    }
  ] /* steps */
}

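The trace shows that the index scan costs less than the full table scan (2.21 versus 3.7 in the range analysis), so MySQL ends up choosing the index scan.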
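The decision visible in both traces comes down to comparing estimated costs. The sketch below is a deliberately simplified model, not MySQL's optimizer: the function `choose_access_path` is hypothetical, and it only compares the `table_scan` cost with the `range_scan_alternatives` costs copied from the two traces in this article.

```python
def choose_access_path(table_scan_cost, range_scan_costs):
    """Return (name, cost) of the cheapest option.

    range_scan_costs maps an index name to its estimated range scan cost.
    """
    best_name, best_cost = "table_scan", table_scan_cost
    for index_name, cost in range_scan_costs.items():
        if cost < best_cost:
            best_name, best_cost = index_name, cost
    return best_name, best_cost


# Costs taken from the trace for name > 'a'.
choice_a = choose_access_path(3.7, {"idx_name_age_position": 4.61})

# Costs taken from the trace for name > 'zzz'.
choice_zzz = choose_access_path(3.7, {"idx_name_age_position": 2.21})

print(choice_a)
print(choice_zzz)
```

The same index is rejected for one predicate and chosen for the other purely because the estimated number of matching rows, and therefore the cost, differs.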

SET SESSION optimizer_trace="enabled=off";  -- disable the trace
