Why MySQL Does Not Use an Index

Posted 2021-11-06 章怀柔

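1. Basic conclusion

The execution cost of a SQL statement is an important factor when the MySQL optimizer chooses an execution plan. When the optimizer estimates that using an index costs more than a full table scan, it chooses the full table scan instead of the index.

The experiment below illustrates this.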

2. The symptom

The table has the following structure and holds about 1.04 million rows:

CREATE TABLE `test03` (
  `id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'auto-increment primary key',
  `dept` tinyint(4) NOT NULL COMMENT 'department id',
  `name` varchar(30) COLLATE utf8mb4_bin DEFAULT NULL COMMENT 'user name',
  `create_time` datetime NOT NULL COMMENT 'registration time',
  `last_login_time` datetime DEFAULT NULL COMMENT 'last login time',
  PRIMARY KEY (`id`),
  KEY `ct_index` (`create_time`)
) ENGINE=InnoDB AUTO_INCREMENT=1048577 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT='test table';

Query 1 does not use the ct_index(create_time) index:

type is ALL rather than range
rows is close to the total number of rows in the table

# Query 1
mysql> explain select * from test03 where create_time > '2021-10-01 02:04:36';
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
|  1 | SIMPLE      | test03 | NULL       | ALL  | ct_index      | NULL | NULL    | NULL | 1045955 |    50.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+---------+----------+-------------+
1 row in set, 1 warning (0.00 sec)

Query 2, by contrast, does use the ct_index(create_time) index:

# Query 2
mysql> explain select * from test03 where create_time < '2021-01-01 02:04:36';
+----+-------------+--------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
| id | select_type | table  | partitions | type  | possible_keys | key      | key_len | ref  | rows | filtered | Extra                 |
+----+-------------+--------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+
|  1 | SIMPLE      | test03 | NULL       | range | ct_index      | ct_index | 5       | NULL |  169 |   100.00 | Using index condition |
+----+-------------+--------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+

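3. Getting the optimizer's processing details

Here the optimizer trace tool is used to observe how MySQL optimizes the statement:

# Raise the trace memory limit so the trace is not truncated
# (set at session level; SET GLOBAL would only affect new connections)
set optimizer_trace_max_mem_size = 1048576;

# Enable optimizer_trace
set optimizer_trace="enabled=on";

# Run the statement
select * from test03 where create_time > '2021-10-01 02:04:36';

# After the statement finishes, view the trace
select TRACE from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G

This yields the optimizer's detailed processing information for the statement: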

mysql> select TRACE from INFORMATION_SCHEMA.OPTIMIZER_TRACE\G
*************************** 1. row ***************************
TRACE: {
  "steps": [
    {
      "join_preparation": {
        "select#": 1,
        "steps": [
          {
            "expanded_query": "/* select#1 */ select `test03`.`id` AS `id`,`test03`.`dept` AS `dept`,`test03`.`name` AS `name`,`test03`.`create_time` AS `create_time`,`test03`.`last_login_time` AS `last_login_time` from `test03` where (`test03`.`create_time` > '2021-10-01 02:04:36')"
          }
        ]
      }
    },
    {
      "join_optimization": {
        "select#": 1,
        "steps": [
          {
            "condition_processing": {
              "condition": "WHERE",
              "original_condition": "(`test03`.`create_time` > '2021-10-01 02:04:36')",
              "steps": [
                {
                  "transformation": "equality_propagation",
                  "resulting_condition": "(`test03`.`create_time` > '2021-10-01 02:04:36')"
                },
                {
                  "transformation": "constant_propagation",
                  "resulting_condition": "(`test03`.`create_time` > '2021-10-01 02:04:36')"
                },
                {
                  "transformation": "trivial_condition_removal",
                  "resulting_condition": "(`test03`.`create_time` > '2021-10-01 02:04:36')"
                }
              ]
            }
          },
          {
            "substitute_generated_columns": {
            }
          },
          {
            "table_dependencies": [
              {
                "table": "`test03`",
                "row_may_be_null": false,
                "map_bit": 0,
                "depends_on_map_bits": [
                ]
              }
            ]
          },
          {
            "ref_optimizer_key_uses": [
            ]
          },
          {
            "rows_estimation": [
              {
                "table": "`test03`",
                "range_analysis": {
                  "table_scan": {
                    "rows": 1045955,
                    "cost": 212430
                  },
                  "potential_range_indexes": [
                    {
                      "index": "PRIMARY",
                      "usable": false,
                      "cause": "not_applicable"
                    },
                    {
                      "index": "ct_index",
                      "usable": true,
                      "key_parts": [
                        "create_time",
                        "id"
                      ]
                    }
                  ],
                  "setup_range_conditions": [
                  ],
                  "group_index_range": {
                    "chosen": false,
                    "cause": "not_group_by_or_distinct"
                  },
                  "analyzing_range_alternatives": {
                    "range_scan_alternatives": [
                      {
                        "index": "ct_index",
                        "ranges": [
                          "0x99aac22124 < create_time"
                        ],
                        "index_dives_for_eq_ranges": true,
                        "rowid_ordered": false,
                        "using_mrr": false,
                        "index_only": false,
                        "rows": 522977,
                        "cost": 627573,
                        "chosen": false,
                        "cause": "cost"
                      }
                    ],
                    "analyzing_roworder_intersect": {
                      "usable": false,
                      "cause": "too_few_roworder_scans"
                    }
                  }
                }
              }
            ]
          },
          {
            "considered_execution_plans": [
              {
                "plan_prefix": [
                ],
                "table": "`test03`",
                "best_access_path": {
                  "considered_access_paths": [
                    {
                      "rows_to_scan": 1045955,
                      "access_type": "scan",
                      "resulting_rows": 1.05e6,
                      "cost": 212428,
                      "chosen": true
                    }
                  ]
                },
                "condition_filtering_pct": 100,
                "rows_for_plan": 1.05e6,
                "cost_for_plan": 212428,
                "chosen": true
              }
            ]
          },
          {
            "attaching_conditions_to_tables": {
              "original_condition": "(`test03`.`create_time` > '2021-10-01 02:04:36')",
              "attached_conditions_computation": [
              ],
              "attached_conditions_summary": [
                {
                  "table": "`test03`",
                  "attached": "(`test03`.`create_time` > '2021-10-01 02:04:36')"
                }
              ]
            }
          },
          {
            "refine_plan": [
              {
                "table": "`test03`"
              }
            ]
          }
        ]
      }
    },
    {
      "join_execution": {
        "select#": 1,
        "steps": [
        ]
      }
    }
  ]
}
1 row in set (0.00 sec)

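Reading the trace line by line, the rows_estimation part of join_optimization (the SQL optimization phase):

explicitly lists the cost of using the index ct_index(create_time) and the cost of a full table scan (627573 versus 212430)
also states the reason the index was not chosen: cost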
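
The comparison can also be pulled out of the trace programmatically. The sketch below works on a trimmed, hand-copied excerpt of the trace above rather than on a live server; `summarize_range_analysis` is a hypothetical helper name, and only the fields needed for the comparison are kept.

```python
import json

# Trimmed copy of the rows_estimation part of the trace above; only the
# fields needed for the comparison are kept.
TRACE_EXCERPT = """
{
  "range_analysis": {
    "table_scan": {"rows": 1045955, "cost": 212430},
    "analyzing_range_alternatives": {
      "range_scan_alternatives": [
        {"index": "ct_index", "rows": 522977, "cost": 627573,
         "chosen": false, "cause": "cost"}
      ]
    }
  }
}
"""


def summarize_range_analysis(range_analysis):
    """Return one summary dict per range-scan alternative (hypothetical helper)."""
    scan_cost = range_analysis["table_scan"]["cost"]
    alternatives = range_analysis["analyzing_range_alternatives"]["range_scan_alternatives"]
    return [
        {
            "index": alt["index"],
            "index_cost": alt["cost"],
            "table_scan_cost": scan_cost,
            # How many times more expensive the index path is estimated to be.
            "cost_ratio": round(alt["cost"] / scan_cost, 2),
            "chosen": alt["chosen"],
            "cause": alt.get("cause"),
        }
        for alt in alternatives
    ]


summary = summarize_range_analysis(json.loads(TRACE_EXCERPT)["range_analysis"])
for row in summary:
    print(row)
```

For this trace the index path is estimated at roughly three times the cost of the table scan, which is why it is rejected with cause "cost".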

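4. Why does using the index cost more than a full table scan?

The optimizer output shows that the index scan would touch about 520,000 rows, while the full table scan touches about 1.04 million. So why does the optimizer consider the index more expensive?

The ordinary (secondary) index ct_index(create_time) does not contain all the columns the query needs. MySQL therefore has to find the matching primary key id in the ct_index tree and then look up the row in the primary key index tree. This extra lookup back into the table (find the primary key through the index, then fetch the data row) inevitably adds cost. When a large number of rows must be looked up this way, the optimizer often decides the lookups are too expensive and does not choose the index.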
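
The numbers in the trace can be roughly reproduced with a very simple model. The sketch below is an assumption-laden illustration, not the optimizer's actual code: it assumes the cost constants are 1.0 per block read and 0.2 per row evaluation (believed to be the MySQL 5.7 defaults, but not shown anywhere in the trace), and that every row reached through ct_index costs one block read for the lookup plus one row evaluation.

```python
# Values copied from the optimizer trace above.
TABLE_ROWS = 1_045_955
TABLE_SCAN_COST = 212_430
RANGE_ROWS = 522_977
RANGE_COST_IN_TRACE = 627_573

# Assumed cost constants (believed to be the MySQL 5.7 defaults; not in the trace).
ROW_EVALUATE_COST = 0.2
IO_BLOCK_READ_COST = 1.0


def estimated_range_cost(rows):
    """Simplified model: one block read plus one row evaluation per matching row."""
    return rows * (IO_BLOCK_READ_COST + ROW_EVALUATE_COST)


def break_even_rows(table_scan_cost):
    """Row count at which the modelled range scan costs as much as the table scan."""
    return table_scan_cost / (IO_BLOCK_READ_COST + ROW_EVALUATE_COST)


range_cost = estimated_range_cost(RANGE_ROWS)
threshold = break_even_rows(TABLE_SCAN_COST)

print(f"modelled range cost: {range_cost:.0f} (trace: {RANGE_COST_IN_TRACE})")
print(f"break-even rows: {threshold:.0f} ({threshold / TABLE_ROWS:.1%} of the table)")
```

Under these assumptions the modelled range cost lands within 1 of the figure in the trace, and the break-even point for this particular table is somewhere around 17% of its rows. That figure depends on the table's size and row width, so it should not be read as a general threshold.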

Look back at the share of the table that Query 1 and Query 2 each return:

Query 1 returns about 60% of the table, so the lookups are expensive and the optimizer chooses a full table scan
Query 2 returns about 0.02% of the table, so the optimizer chooses the index

mysql> select (select count(*) from test03 where create_time > '2021-10-01 02:04:36')/(select count(*) from test03) as '>20211001', (select count(*) from test03 where create_time < '2021-01-01 02:04:36')/(select count(*) from test03) as '<20210101';
+-----------+-----------+
| >20211001 | <20210101 |
+-----------+-----------+
|    0.5997 |    0.0002 |
+-----------+-----------+
1 row in set (0.44 sec)

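The official MySQL documentation also describes this briefly:

The index is not used when the optimizer believes a full table scan is cheaper
No fixed percentage of the table decides whether the optimizer uses a full table scan (it used to be 30%)
The optimizer takes more factors into account when choosing, such as table size, number of rows, and I/O block size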

https://dev.mysql.com/doc/refman/5.7/en/where-optimization.html

Reposted from the 爱可生 technical documentation.


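A Case Where MySQL Did Not Use the Index on a Date Column

Background

In one table, the dataTime column was defined as varchar, held date-formatted values, and had an index. The logs nevertheless recorded a slow query on this table:
select * from digitaltwin_meteorological where dataTime > '2021-10-15';
Running explain on the statement showed a full table scan. Why did the query scan the whole table even though it filters on the indexed dataTime column?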

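Investigation

One:
The first guess was that, because dataTime was a varchar, MySQL ordered the index by string order rather than by date order, so a range query could not split the index by date. The column was therefore changed to datetime and the statement analyzed again, but it was still a full table scan.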
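
The first guess can be sanity-checked outside MySQL. For zero-padded 'YYYY-MM-DD' strings, plain string order and chronological order coincide, so string ordering on its own would not explain the full scan. The sketch below uses hypothetical sample values (the real contents of dataTime are not shown in the article) and ignores MySQL collations.

```python
from datetime import date

# Hypothetical sample values; the real contents of dataTime are not shown.
iso_strings = ["2021-10-16", "2021-09-30", "2021-10-02", "2020-12-31"]
unpadded_strings = ["2021-10-16", "2021-9-30", "2021-10-2", "2020-12-31"]


def parse(text):
    """Parse 'Y-M-D' with or without zero padding into a date."""
    year, month, day = (int(part) for part in text.split("-"))
    return date(year, month, day)


def string_order_matches_date_order(values):
    """True when plain string sorting gives the same order as chronological sorting."""
    return sorted(values) == sorted(values, key=parse)


print(string_order_matches_date_order(iso_strings))       # zero-padded values
print(string_order_matches_date_order(unpadded_strings))  # values without padding
```

Only the values without zero padding sort differently as strings, which is consistent with the observation that changing the column type alone did not change the plan.
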
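Two:
Next, change the value in the query condition.

 select count(*) from digitaltwin_meteorological where dataTime > '2021-10-15';

The result is 3910.

EXPLAIN select * from digitaltwin_meteorological where dataTime > '2021-10-15';

The analysis shows a full table scan.

Change the condition to the 16th and see how many rows match:

 select count(*) from digitaltwin_meteorological where dataTime > '2021-10-16';

The result is 2525. Now analyze the query for the 16th:

EXPLAIN select * from digitaltwin_meteorological where dataTime > '2021-10-16';

The result is a range access that uses the index.

So when the query matches many rows, MySQL uses a full table scan because it estimates the scan to be cheaper; when it matches few rows, MySQL uses the index.
The table holds 19714 rows in total. At 2525/19714 = 13% MySQL uses the index; at 3910/19714 = 20% it uses a full table scan.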

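Three:
Now that dataTime is a datetime column, does the value in the condition still need quotes? Remove the quotes and check:

EXPLAIN select * from digitaltwin_meteorological where dataTime > 2021-10-16;

Without the quotes, the query is a full table scan again. Whether the column is varchar or datetime, the value in the condition must be quoted. Without quotes MySQL evaluates the value as an arithmetic expression (2021 minus 10 minus 16, which is 1995), so 2021-10-16 is no longer the date of the 16th. Consider this statement:

 select count(*) from digitaltwin_meteorological where dataTime > 2021-10-16;

The count is 19714, the whole table, so a datetime condition needs quotes too.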
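
How a SQL parser reads the unquoted value can be shown in a few lines. The sketch below uses SQLite from the Python standard library purely as a convenient stand-in for a SQL expression parser; it is not MySQL, and it says nothing about how MySQL then compares a datetime column with the resulting number.

```python
import sqlite3

# SQLite is used here only as a stand-in to show how a SQL expression
# parser reads the literal; it is not MySQL.
conn = sqlite3.connect(":memory:")
unquoted, quoted = conn.execute("SELECT 2021-10-16, '2021-10-16'").fetchone()
conn.close()

print(unquoted)  # the unquoted form is evaluated as integer subtraction
print(quoted)    # the quoted form stays a string that can be read as a date
```

The unquoted form is two subtractions on integers, so the condition no longer refers to a date at all.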

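Four:
Everything above was observed with dataTime as datetime. The column was originally varchar, so do the conclusions still hold for varchar? Change the type back and run the statement again:

EXPLAIN select * from digitaltwin_meteorological where dataTime > '2021-10-16';

With varchar, the query for the 16th becomes a full table scan instead of a range scan.
Change the condition to the 17th:

EXPLAIN select * from digitaltwin_meteorological where dataTime > '2021-10-17';

The query for the 17th uses the index. It matches 1749 rows.
So with varchar, the index is used at 1749/19714 = 9%, while at 2525/19714 = 13% the query is a full table scan.
In other words, in this test a result of 13% of the table uses the index when the column is datetime but falls back to a full table scan when it is varchar. This is one reason to store dates as datetime rather than varchar.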
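
The four observations only bracket the switch point for each column type; they do not pin it down. The sketch below recomputes the shares from the row counts quoted above. `switch_interval` is a hypothetical helper, and the access types are the ones the article reports from EXPLAIN.

```python
TOTAL_ROWS = 19_714

# (column type, cutoff date, matching rows, access type reported by EXPLAIN)
observations = [
    ("datetime", "2021-10-15", 3910, "ALL"),
    ("datetime", "2021-10-16", 2525, "range"),
    ("varchar", "2021-10-16", 2525, "ALL"),
    ("varchar", "2021-10-17", 1749, "range"),
]


def switch_interval(column_type):
    """Return (largest share seen with range, smallest share seen with ALL)."""
    shares = {"range": [], "ALL": []}
    for col_type, _cutoff, rows, access in observations:
        if col_type == column_type:
            shares[access].append(rows / TOTAL_ROWS)
    return max(shares["range"]), min(shares["ALL"])


for column_type in ("datetime", "varchar"):
    low, high = switch_interval(column_type)
    print(f"{column_type}: switch happens between {low:.1%} and {high:.1%}")
```

With only two data points per type, all that can be said is that the switch lies somewhere inside each interval for this table; more cutoff dates would be needed to narrow it.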

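Summary

The analysis leads to these conclusions:
1. In a range query, once the matching rows exceed a certain share of the table, MySQL estimates a full table scan to be cheaper and uses it instead of the index.
2. Values compared against a datetime column must be quoted too; otherwise MySQL does not treat them as dates.
3. For date-formatted data, the share of the table at which a range query switches from the index to a full table scan was lower for varchar than for datetime in this test, so storing dates as datetime makes index use more likely.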
