MySQL optimization: order by and limit

Posted 2023-03-18

1. Use a composite index for order by

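When order by and limit are used together, it is important to avoid a full table scan and an explicit sort, so a suitable index should be used to speed up the query.

Using a composite index

A composite index (also called a multi-column index) is built from several columns of a table. It follows the leftmost-prefix rule: for an index on columns a, b, c, conditions on a,b,c, on a,b, or on a alone can use it, while conditions on b,c, on b alone, or on c alone cannot. A condition on a,c can use the index only for its a part. What matters is which columns appear in the where clause, not the order in which they are written.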

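Whether the index is used does not depend on the order of the conditions in the where clause.

Whether the index is used does depend on which columns appear in the where clause.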
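The two statements above can be checked with EXPLAIN. The following is a minimal sketch, not the author's original test: the table abc_demo, its columns, and the index name idx_abc are assumptions made up for illustration.

```sql
-- Hypothetical table used only to illustrate the leftmost-prefix rule.
CREATE TABLE abc_demo (
  id INT PRIMARY KEY AUTO_INCREMENT,
  a  INT NOT NULL,
  b  INT NOT NULL,
  c  INT NOT NULL,
  d  VARCHAR(32),
  KEY idx_abc (a, b, c)
);

-- Same conditions in different order: the optimizer treats both alike,
-- and both can use all three columns of idx_abc.
EXPLAIN SELECT * FROM abc_demo WHERE a = 1 AND b = 2 AND c = 3;
EXPLAIN SELECT * FROM abc_demo WHERE c = 3 AND b = 2 AND a = 1;

-- Only a and c are present: idx_abc can be used for the a part only,
-- because c is not adjacent to a in the index.
EXPLAIN SELECT * FROM abc_demo WHERE a = 1 AND c = 3;

-- The leading column a is missing: idx_abc cannot be used for a lookup.
EXPLAIN SELECT * FROM abc_demo WHERE b = 2 AND c = 3;
```

In the EXPLAIN output, the key and key_len columns indicate whether idx_abc was chosen and how many of its columns were used. On an empty or very small table the optimizer may pick a different plan, so the check is more meaningful with realistic data.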

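When an IN condition defeats the composite index

In the author's test, the composite index was effective when IN had a single value, but with more than one value it was no longer effective for the order by. FORCE INDEX was therefore used to make the query use the composite index.

Analysis:

First, it follows from the B-tree structure. A single-value IN selects one subtree for that model, and within it code is already sorted, so the index can supply the order and the query is fast. A multi-value IN selects several subtrees, one per model; code is sorted within each subtree but not necessarily across them, so the index can no longer supply the order and the query is slow.

Second, the author expected that a forced index might not preserve the order by order, and tried several ways to make it return wrongly ordered results, but every result was correct. That is the expected behaviour: FORCE INDEX only changes the access path, and MySQL still sorts the rows whenever the index cannot supply the requested order. Whether forcing the index actually helps should be checked with EXPLAIN on real data.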
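A sketch of the three cases discussed above. The source names only the columns model and code; the table name device, the index name idx_model_code, and the sample values are assumptions.

```sql
-- Hypothetical table; only the column names model and code come from the text.
CREATE TABLE device (
  id    INT PRIMARY KEY AUTO_INCREMENT,
  model VARCHAR(32) NOT NULL,
  code  VARCHAR(32) NOT NULL,
  KEY idx_model_code (model, code)
);

-- One value in IN: the rows for model 'A' are already ordered by code
-- inside the index, so no extra sort should be needed.
EXPLAIN SELECT * FROM device
WHERE model IN ('A')
ORDER BY code LIMIT 10;

-- Several values in IN: code is ordered within each model but not across
-- models, so the plan is expected to show "Using filesort" or a full scan.
EXPLAIN SELECT * FROM device
WHERE model IN ('A', 'B', 'C')
ORDER BY code LIMIT 10;

-- The workaround described above: force the composite index.
-- The result is still correctly ordered; only the access path changes.
EXPLAIN SELECT * FROM device FORCE INDEX (idx_model_code)
WHERE model IN ('A', 'B', 'C')
ORDER BY code LIMIT 10;
```

Comparing the type, key, and Extra columns of the three plans shows whether the forced index removes the full scan and whether a filesort remains.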

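2. Use limit with care on large tables

limit is commonly used for pagination; it has two usages (with and without an offset) and three ways of writing them.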
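The three spellings can be sketched as follows. The table paging_demo and its columns are assumptions for illustration.

```sql
-- Hypothetical table used only to show the LIMIT syntax.
CREATE TABLE IF NOT EXISTS paging_demo (id INT PRIMARY KEY, name VARCHAR(32));

-- Usage 1, row count only: return the first 10 rows.
SELECT * FROM paging_demo ORDER BY id LIMIT 10;

-- Usage 2, offset and row count, comma form: skip 20 rows, return 10.
SELECT * FROM paging_demo ORDER BY id LIMIT 20, 10;

-- Usage 2, offset and row count, OFFSET form: same result as the previous query.
SELECT * FROM paging_demo ORDER BY id LIMIT 10 OFFSET 20;
```

Note that the two offset forms put the arguments in opposite order: LIMIT offset, count versus LIMIT count OFFSET offset.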

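Optimizing large offsets

limit performs well when the offset is small, but as pagination goes deeper and the offset grows, performance degrades steadily, because MySQL must still read and discard every skipped row.

In that case limit can be optimized with a subquery.

The author's measurements came from a single MySQL table of more than 20 million rows and are for reference only; they showed that the subquery clearly improves performance. The author then tried removing the order by from the subquery and found the query became about twice as fast again, but the returned data was wrong. EXPLAIN showed that the optimizer had chosen an index other than id for the subquery (the table has several indexes; id is the primary key and the most selective), which the author found puzzling. The wrong data is consistent with that plan: without order by, rows come back in the order of whichever index the optimizer picks, so the offset selects a different slice.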
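The author's original statements are not preserved, so the following is only a sketch of one common form of the subquery optimization. The table name big_table, the offset, and the page size are assumptions; id is the primary key as described above.

```sql
-- Plain deep pagination: MySQL reads and discards 1,000,000 full rows
-- before returning 10.
SELECT * FROM big_table ORDER BY id LIMIT 1000000, 10;

-- Subquery form: the subquery walks only the primary key to find the id
-- at which the page starts, then the outer query reads 10 rows from there.
-- The ORDER BY inside the subquery is kept on purpose; as noted above,
-- dropping it can return the wrong rows.
SELECT *
FROM big_table
WHERE id >= (SELECT id FROM big_table ORDER BY id LIMIT 1000000, 1)
ORDER BY id
LIMIT 10;
```

Both queries return the same page because they sort on the same unique column; the second one avoids carrying full rows through the skipped range.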

PS: in a SQL statement the limit keyword is applied last. The clauses generally appear in this order: where -> group by -> having -> order by -> limit

Querying a range of rows in MySQL

1. Query the first n rows

Query the first row
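A minimal sketch for this case; the table paging_demo and the value n = 5 are assumptions for illustration.

```sql
-- Hypothetical table with an integer primary key.
CREATE TABLE IF NOT EXISTS paging_demo (id INT PRIMARY KEY, name VARCHAR(32));

-- First n rows (here n = 5).
SELECT * FROM paging_demo ORDER BY id LIMIT 5;

-- First row.
SELECT * FROM paging_demo ORDER BY id LIMIT 1;
```

The ORDER BY makes "first" well defined; without it MySQL is free to return rows in any order.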

2. Query rows n through m

Query rows 4 through 6
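For rows n through m, the offset is n - 1 and the row count is m - n + 1. A sketch on a hypothetical table paging_demo (an assumption for illustration):

```sql
-- Hypothetical table with an integer primary key.
CREATE TABLE IF NOT EXISTS paging_demo (id INT PRIMARY KEY, name VARCHAR(32));

-- Rows 4 through 6: skip 4 - 1 = 3 rows, return 6 - 4 + 1 = 3 rows.
SELECT * FROM paging_demo ORDER BY id LIMIT 3, 3;
```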

3. Query the last n rows

Query the last row
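The last rows can be fetched by reversing the sort order. A sketch on a hypothetical table paging_demo (an assumption for illustration), with n = 5:

```sql
-- Hypothetical table with an integer primary key.
CREATE TABLE IF NOT EXISTS paging_demo (id INT PRIMARY KEY, name VARCHAR(32));

-- Last n rows (here n = 5), returned from newest to oldest.
SELECT * FROM paging_demo ORDER BY id DESC LIMIT 5;

-- The same rows in ascending order: wrap the query in a derived table.
SELECT *
FROM (SELECT * FROM paging_demo ORDER BY id DESC LIMIT 5) AS last_rows
ORDER BY id;

-- Last row.
SELECT * FROM paging_demo ORDER BY id DESC LIMIT 1;
```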

4. Query the record after a given record

Query the record before a given record
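Both directions can be sketched with a range condition on the primary key. The table paging_demo and the current id value 100 are assumptions for illustration.

```sql
-- Hypothetical table with an integer primary key.
CREATE TABLE IF NOT EXISTS paging_demo (id INT PRIMARY KEY, name VARCHAR(32));

-- Next record after id = 100: the smallest id greater than 100.
SELECT * FROM paging_demo WHERE id > 100 ORDER BY id ASC LIMIT 1;

-- Previous record before id = 100: the largest id smaller than 100.
SELECT * FROM paging_demo WHERE id < 100 ORDER BY id DESC LIMIT 1;
```

Because the condition is on the primary key, each query is a short index range read regardless of how deep the record sits in the table.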

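MySQL experiment: optimizing order by + limit with an inner join, then improving it further with an index

While optimizing two-argument limit with a subquery, I got the idea of testing order by + limit, which is closer to real production needs; perhaps order by + limit can be optimized in a similar way.

Preparation

This experiment uses employees, the large sample database provided by MySQL; it must be imported first.

The employees table is used; first look at its structure and row count.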

mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

mysql> select count(*) from employeed;
ERROR 1146 (42S02): Table 'employees.employeed' doesn't exist
mysql> select count(*) from employees;
+----------+
| count(*) |
+----------+
|   300024 |
+----------+
1 row in set (0.05 sec)

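As we can see, only the primary key emp_no is indexed.

Experiment

MySQL 5.7 official documentation on the EXPLAIN output columns

Official documentation on how ORDER BY is optimized

Recommended blog post on EXPLAIN output in 5.7

Recommended blog post on EXPLAIN in older versions (newer versions behave like EXPLAIN EXTENDED by default)

Further reading on EXPLAIN output

Explanation of the key column in MySQL EXPLAIN

order by + limit without optimization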

mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
| 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
|  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)

mysql> explain select * from employees order by birth_date limit 200000,10;
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | employees | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 299468 |   100.00 | Using filesort |
+----+-------------+-----------+------------+------+---------------+------+---------+------+--------+----------+----------------+
1 row in set, 1 warning (0.00 sec)

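As we can see, the unoptimized query does a full table scan and takes 0.2 s.

Inner join optimization

Idea: first find only the primary keys (emp_no) of the rows that satisfy order by birth_date limit 200000,10, then use an inner join to fetch all columns for those 10 emp_no values.

(An inner join returns only rows that match the join condition. Since select emp_no from employees order by birth_date limit 200000,10 returns just 10 emp_no values, the joined result also contains only the full rows for those 10 values.)

(The subquery selects only emp_no, so it carries much less data through the sort; once birth_date is indexed it can also be served as a covering-index query, which reduces disk I/O.)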

mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+-----------+--------+------------+
| emp_no | birth_date | first_name | last_name | gender | hire_date  |
+--------+------------+------------+-----------+--------+------------+
| 427365 | 1960-09-24 | Yuping     | Sethi     | M      | 1990-06-21 |
| 424219 | 1960-09-25 | Woody      | Bernini   | M      | 1989-03-10 |
| 469218 | 1960-09-25 | George     | Plotkin   | M      | 1992-02-19 |
| 404121 | 1960-09-25 | Domenico   | Birnbaum  | M      | 1993-08-01 |
| 404266 | 1960-09-25 | Quingbo    | Jervis    | F      | 1985-03-15 |
| 409133 | 1960-09-25 | Nitsan     | Kleiser   | F      | 1985-05-18 |
| 409558 | 1960-09-25 | Shunichi   | Hofting   | F      | 1992-07-06 |
| 412045 | 1960-09-25 | Kristin    | Bolotov   | F      | 1985-06-28 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic  | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik  | F      | 1997-07-23 |
+--------+------------+------------+-----------+--------+------------+
10 rows in set (0.10 sec)

mysql> explain select * from employees inner join (select emp_no from employees order by birth_date limit 100000,10) as table_temp using (emp_no);
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
| id | select_type | table      | partitions | type   | possible_keys | key     | key_len | ref               | rows   | filtered | Extra          |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 100010 |   100.00 | NULL           |
|  1 | PRIMARY     | employees  | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | table_temp.emp_no |      1 |   100.00 | NULL           |
|  2 | DERIVED     | employees  | NULL       | ALL    | NULL          | NULL    | NULL    | NULL              | 299468 |   100.00 | Using filesort |
+----+-------------+------------+------------+--------+---------------+---------+---------+-------------------+--------+----------+----------------+
3 rows in set, 1 warning (0.00 sec)

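The query is about twice as fast (0.10 s versus 0.20 s). The ten rows differ from the previous result because birth_date is not unique and MySQL does not guarantee any order among rows with the same birth_date. Note also that the EXPLAIN above was run with an offset of 100000 rather than 200000, which is why the derived table shows rows = 100010. In the EXPLAIN output:

The third row has select_type DERIVED, meaning it is a subquery in the from clause; it does not use an index either (type ALL with Using filesort).
<derived2> in the first row means that this row reads from the derived table produced by the query with id = N, here N = 2. Now look at the second row:

The second row has type eq_ref, meaning that a primary key or unique index is used for the join, so each key from the derived table matches exactly one row. Here it means that after the subquery has produced the emp_no values, the temporary table it generates is joined to employees through the primary key.
(This only explains the EXPLAIN columns; it does not seem to verify the optimization idea directly. Pointers from more experienced readers are welcome.)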
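The plain query and the inner join query above return different sets of ten rows because many employees share the same birth_date, and MySQL does not fix the order among such ties. The following sketch is an addition, not part of the original experiment: it adds emp_no as a tiebreaker so that both forms return the same page in the same order. The alias names e and page are assumptions.

```sql
-- Plain form with a deterministic sort order.
SELECT *
FROM employees
ORDER BY birth_date, emp_no
LIMIT 200000, 10;

-- Inner join form with the same deterministic order.
-- The outer ORDER BY is needed as well: a join does not promise to
-- keep the order produced by the derived table.
SELECT e.*
FROM employees AS e
INNER JOIN (
    SELECT emp_no
    FROM employees
    ORDER BY birth_date, emp_no
    LIMIT 200000, 10
) AS page USING (emp_no)
ORDER BY e.birth_date, e.emp_no;
```

With a stable order, the timings of the two forms can be compared on identical result sets.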

Adding an index on the sort column

Since the inner join sorts on birth_date to find emp_no, adding an index on the sort column may improve performance further.

mysql> alter table employees add index birthdate_index (birth_date);
Query OK, 0 rows affected (0.75 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> desc employees;
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   | MUL | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)

Then run both the unoptimized query and the inner join query again.

mysql> select * from employees order by birth_date limit 200000,10;
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
| 494212 | 1960-09-25 | Susuma     | Baranowski | M      | 1989-05-15 |
| 496888 | 1960-09-25 | Rosalyn    | Rebaine    | M      | 1985-11-27 |
| 497766 | 1960-09-25 | Matt       | Atrawala   | F      | 1987-02-11 |
| 481404 | 1960-09-25 | Sanjeeva   | Eterovic   | F      | 1986-06-05 |
| 483269 | 1960-09-25 | Mitchel    | Pramanik   | F      | 1997-07-23 |
| 483270 | 1960-09-25 | Geoff      | Gulik      | F      | 1993-11-25 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
| 101264 | 1960-09-25 | Mansur     | Atchley    | F      | 1990-05-22 |
|  92453 | 1960-09-25 | Khalid     | Trystram   | M      | 1993-11-10 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.20 sec)

mysql> select * from employees inner join (select emp_no from employees order by birth_date limit 200000,10) as temp_table using (emp_no);
+--------+------------+------------+------------+--------+------------+
| emp_no | birth_date | first_name | last_name  | gender | hire_date  |
+--------+------------+------------+------------+--------+------------+
| 498507 | 1960-09-24 | Perla      | Delgrange  | M      | 1989-12-08 |
|  23102 | 1960-09-25 | Hsiangchu  | Harbusch   | M      | 1986-03-14 |
|  29961 | 1960-09-25 | Susumu     | Munoz      | F      | 1989-12-31 |
|  32061 | 1960-09-25 | Dipankar   | Buescher   | M      | 1992-10-24 |
|  36216 | 1960-09-25 | Xianlong   | Rassart    | F      | 1987-09-05 |
|  37058 | 1960-09-25 | Khue       | Osgood     | M      | 1991-11-04 |
|  38365 | 1960-09-25 | Sariel     | Ramsak     | M      | 1993-02-26 |
|  39901 | 1960-09-25 | Jianhui    | Ushiama    | M      | 1985-12-03 |
|  59683 | 1960-09-25 | Supot      | Millington | F      | 1991-06-03 |
|  63784 | 1960-09-25 | Rosita     | Zyda       | M      | 1988-08-12 |
+--------+------------+------------+------------+--------+------------+
10 rows in set (0.03 sec)

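As we can see, the plain query did not get faster, but the inner join query improved considerably, from 0.1 s to 0.03 s. With this further optimization the inner join should be able to handle tables with millions of rows (possibly tens of millions, since production hardware is usually far better than the entry-level ECS instance used here).

Why the plain query did not benefit from birthdate_index:

The query is select *, so even with indexes on emp_no and birth_date, fetching the other columns still requires going back to the table for every row. With an offset of 200000 that would mean a row lookup for each of the skipped rows, so the optimizer still chooses a full table scan for the plain query.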
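This explanation can be checked with EXPLAIN. The sketch below assumes the birthdate_index created earlier and an InnoDB table, where a secondary index also stores the primary key; the exact plans depend on the MySQL version and table statistics, so no output is shown.

```sql
-- Only emp_no and birth_date are requested. Both are available from
-- birthdate_index, so the Extra column is expected to show "Using index"
-- (a covering index, no lookups back to the table).
EXPLAIN SELECT emp_no, birth_date
FROM employees
ORDER BY birth_date
LIMIT 200000, 10;

-- All columns are requested. The index alone is not enough, and the
-- optimizer may prefer a full scan plus filesort.
EXPLAIN SELECT *
FROM employees
ORDER BY birth_date
LIMIT 200000, 10;

-- Forcing the index shows the alternative plan: an ordered index scan
-- with a lookup back to the table for every row it passes over.
EXPLAIN SELECT *
FROM employees FORCE INDEX (birthdate_index)
ORDER BY birth_date
LIMIT 200000, 10;
```

Comparing the type, key, and Extra columns of the three plans makes the difference between a covered query and one that must return to the table visible.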