order by limit 造成优化器选择索引错误

Posted 2022-06-22 渔夫数据库笔记

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了order by limit 造成优化器选择索引错误相关的知识，希望对你有一定的参考价值。

1.order by limit 选错索引示例

在日常工作中，经常发现一些简单的查询语句因为加了 order by limit 造成优化器选择索引错误。例如如下sql(此处就不造数据了，只是列出一个sql示例)

select * from test_table where name='xxxxx' order by id limit 1;

#id 列为表的主键，name列有索引。有可能该查询最高效的路径应该是使用 name 索引(name 列为前缀的组合索引)访问，但是优化器最后却选择通过主键id进行扫描。这样就可能导致查询很低效(当然查询也可能很高效，这个跟表中数据的具体分布有关)

2.order by limit 为什么会导致选错索引

在 optimize trace中我们看到有这样一块


            "reconsidering_access_paths_for_index_ordering": 
              "clause": "ORDER BY",
              "steps": [
              ],
              "index_order_summary": 
                "table": "`t_test_part12`",
                "index_provides_order": true,
                "order_direction": "desc",
                "index": "idx_create_time",
                "plan_changed": true,       ##在 reconsider之前选择的是条件列上的索引，只需要扫描部分数据，这里的true表示访问路径发生了改变(通过排序索引顺序扫描)
                "access_type": "index"
              
            
          

##这里的重的选择索引在源码中对应到sql/sql_select.cc中如下

test_if_cheaper_ordering(const JOIN_TAB *tab, ORDER *order, TABLE *table,
                         key_map usable_keys,  int ref_key,
                         ha_rows select_limit,
                         int *new_key, int *new_key_direction,
                         ha_rows *new_select_limit, uint *new_used_key_parts,
                         uint *saved_best_key_parts)
sql_select.cc 在5.7时就有4400多行，我们不需要刻意去撸通它，但是关于 test_if_cheaper_ordering 有一段注释值得我们去关注，这段注释如下：
 /*
           We assume that each of the tested indexes is not correlated
           with ref_key. Thus, to select first N records we have to scan
           N/selectivity(ref_key) index entries.
           selectivity(ref_key) = #scanned_records/#table_records =
           refkey_rows_estimate/table_records.
           In any case we can't select more than #table_records.
           N/(refkey_rows_estimate/table_records) > table_records
           <=> N > refkey_rows_estimate.
          */

##首先假设被检查的索引同ref_key没有关系。其次假设数据是均匀分布的。如果数据是均匀分布的,那么通过排序索引访问前N行数据(order by limit N)需要扫描的行数为 N*(table_records/distinct(ref_key))
这两个假设是估算通过排序索引来访问cost 的前提(但是现实生产中这两个假设在绝大多数场景中都是不成立的，所以就造成多数场景下索引选择错误)

3.问题处理

第二部分中我们知道，因为 test_if_cheaper_ordering() 假设的前提在生产环境中通常不会成立，所以就造成该类sql很多时候生成的都不是最优执行计划

##生产上遇到过通过条件索引过滤执行时间为几十毫秒，但是通过索引排序扫描耗时1小时的案例

3.1 处理方案1

主动关闭该优化器开关 prefer_ordering_index

## 该问题已经被确认为bug，但是mysql目前为止没有给出具体的优化方案，但是他们提供了一个优化器开关(mysql 8.0.21及之后版本)，让用户自己选择是否需要该类型的优化，如果用户认为该类优化会导致错误的执行计划，可以主动关闭该类优化。优化器开关名为 (默认打开，可以根据需要进行关闭)

3.2 处理方案2

重写 sql ，比如上面提到的 sql ，select * from test_table where name='xxxxx' order by id limit 1;

我们业务逻辑可以分两步处理

1)select min(id) from test_table where name='xxxxx'; ##取出符合条件的主键id

2)通过第一步中得到的主键id 去获取相关信息

select * from test_table where id=11;

3.3 处理方案3

通过 hint 固化执行计划。比如通过 force index 指定使用的索引

MySQL Bugs: #97001: Dangerous optimization reconsidering_access_paths_for_index_ordering

MySQL :: WL#13929: Introduce new optimizer switch to disable limit optimization

MySQL · 捉虫动态 · order by limit 造成优化器选择索引错误

mysql回表致索引失效_祈雨v的博客-CSDN博客_回表走索引吗

以上是关于order by limit 造成优化器选择索引错误的主要内容，如果未能解决你的问题，请参考以下文章