MySQL Execution Plan--DISTINCT语句优化

Posted 笑东风

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了MySQL Execution Plan--DISTINCT语句优化相关的知识,希望对你有一定的参考价值。

问题描述

在很多业务场景中业务需要过滤掉重复数据,对于MySQL数据库可以有多种SQL写法能实现这种需求,如:

  • 使用DISTINCT,如:

    SELECT DISTINCT username 
    FROM hotel_owner 
    WHERE username IN (\'user001\',\'user002\');
    
  • 使用GROUP BY,如:

    SELECT username 
    FROM hotel_owner 
    WHERE username IN (\'user001\',\'user002\')
    GROUP BY username;
    
  • 使用LIMIT 1,如:

    SELECT username 
    FROM(
        SELECT username
        FROM hotel_owner 
        WHERE username = \'user001\'
        LIMIT 1
    ) AS T1
    UNION ALL
    SELECT username 
    FROM(
        SELECT username
        FROM hotel_owner 
        WHERE username = \'user002\'
        LIMIT 1
    ) AS T2
    
  • 使用EXIST,如:

    SELECT username 
    FROM (
        SELECT \'user001\' AS username
        UNION ALL
        SELECT \'user002\' AS username
    ) AS T1
    WHERE EXISTS(
        SELECT username 
        FROM hotel_owner AS T2
        WHERE T1.username = T1.username
    )
    
  • 使用临时变量、使用公共表达式+rownumber(MYSQL 8.0)等其他

当前hotel_owner表上有索引idx_username(username),针对上面两个用户的数据量为:

mysql> SELECT username,count(1) AS usercount
    -> FROM hotel_owner
    -> WHERE username IN (\'user001\',\'user002\')
    -> GROUP BY username;
+-------------+-----------+
| username    | usercount |
+-------------+-----------+
| user002 |     16455 |
| user001 |     18718 |
+-------------+-----------+
2 rows in set (0.02 sec)

上面4种SQL都能得到相同的执行结果,但查询性能相差50倍以上。

问题原因

MySQL Server架构可分为MySQL网络连接层、MySQL服务层、MySQL存储引擎层三层:

  • MySQL网络连接层,负责处理客户端请求连接。
  • MySQL服务层,负责解析SQL语句生成直接计划,由查询执行引擎与存储引擎层进行交互处理,将处理结果返回给客户端。
  • MySQL存储引擎层,负责MySQL中数据的存储与提取,与底层系统文件进行交互。MySQL存储引擎是插件式的,服务器中的查询执行引擎通过接口与存储引擎进行通信,接口屏蔽了不同存储引擎之间的差异 。

MySQL查询处理流程如下:

由于MySQL架构的分层设计和不同存储引擎内部实现的差异性,MySQL服务层的查询优化器无法针对某个存储引擎进行定制开发,导致MySQL查询优化器在某些场景下无法生成"相对更优"的执行计划,需要研发人员"使用查询提示"或"改写SQL语句"来改变SQL语句的执行计划和提示SQL语句的执行效率。

通过MySQL内部工具profiling能清楚得到上面四种SQL语句的实际执行耗时:

*************************** 1. row ***************************
Query_ID: 1
Duration: 0.02456375
   Query: SELECT DISTINCT username
FROM hotel_owner
WHERE username IN (\'user001\',\'user002\')
*************************** 2. row ***************************
Query_ID: 2
Duration: 0.02770700
   Query: SELECT username
FROM hotel_owner
WHERE username IN (\'user001\',\'user002\')
GROUP BY username
*************************** 3. row ***************************
Query_ID: 3
Duration: 0.00054050
   Query: SELECT username
FROM(
    SELECT username
    FROM hotel_owner
    WHERE username = \'user001\'
    LIMIT 1
) AS T1
UNION ALL
SELECT username
FROM(
    SELECT username
    FROM hotel_owner
    WHERE username = \'user002\'
    LIMIT 1
) AS T2
*************************** 4. row ***************************
Query_ID: 4
Duration: 0.00083600
   Query: SELECT username
FROM (
    SELECT \'user001\' AS username
    UNION ALL
    SELECT \'user002\' AS username
) AS T1
WHERE EXISTS(
    SELECT username
    FROM hotel_owner AS T2
    WHERE T1.username = T1.username
)

DISTINCT方式和GROUP BY方式耗时接近,耗时分别为24ms和27ms。

LIMIT 1方式和EXISTS方式耗时接近,耗时分别为0.5毫秒和0.8ms。

其中DISTINCT方式的执行计划和执行成本明细为:

mysql> DESC SELECT DISTINCT username
    -> FROM hotel_owner
    -> WHERE username IN (\'user001\',\'user002\') \\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: hotel_owner
   partitions: NULL
         type: range
possible_keys: idx_seq_usr,idx_username
          key: idx_username
      key_len: 152
          ref: NULL
         rows: 66282
     filtered: 100.00
        Extra: Using where; Using index
1 row in set, 1 warning (0.00 sec)

mysql> SHOW PROFILE CPU,BLOCK IO,SWAPS FOR QUERY 1;
+----------------------+----------+----------+------------+--------------+---------------+-------+
| Status               | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | Swaps |
+----------------------+----------+----------+------------+--------------+---------------+-------+
| starting             | 0.000065 |     NULL |       NULL |         NULL |          NULL |  NULL |
| checking permissions | 0.000006 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Opening tables       | 0.000017 |     NULL |       NULL |         NULL |          NULL |  NULL |
| init                 | 0.000024 |     NULL |       NULL |         NULL |          NULL |  NULL |
| System lock          | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |
| optimizing           | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |
| statistics           | 0.000158 |     NULL |       NULL |         NULL |          NULL |  NULL |
| preparing            | 0.000018 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sorting result       | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.024214 |     NULL |       NULL |         NULL |          NULL |  NULL |
| end                  | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL |
| query end            | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |
| closing tables       | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| freeing items        | 0.000021 |     NULL |       NULL |         NULL |          NULL |  NULL |
| cleaning up          | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |
+----------------------+----------+----------+------------+--------------+---------------+-------+
16 rows in set, 1 warning (0.00 sec)

LIMIT 1方式的执行成本明细为:

mysql> DESC SELECT username
    -> FROM(
    ->     SELECT username
    ->     FROM hotel_owner
    ->     WHERE username = \'user001\'
    ->     LIMIT 1
    -> ) AS T1
    -> UNION ALL
    -> SELECT username
    -> FROM(
    ->     SELECT username
    ->     FROM hotel_owner
    ->     WHERE username = \'user002\'
    ->     LIMIT 1
    -> ) AS T2 \\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
   partitions: NULL
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 2
  select_type: DERIVED
        table: hotel_owner
   partitions: NULL
         type: ref
possible_keys: idx_username
          key: idx_username
      key_len: 152
          ref: const
         rows: 34788
     filtered: 100.00
        Extra: Using index
*************************** 3. row ***************************
           id: 3
  select_type: UNION
        table: <derived4>
   partitions: NULL
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
     filtered: 100.00
        Extra: NULL
*************************** 4. row ***************************
           id: 4
  select_type: DERIVED
        table: hotel_owner
   partitions: NULL
         type: ref
possible_keys: idx_username
          key: idx_username
      key_len: 152
          ref: const
         rows: 31494
     filtered: 100.00
        Extra: Using index
4 rows in set, 1 warning (0.00 sec)

mysql> SHOW PROFILE CPU,BLOCK IO,SWAPS FOR QUERY 3;
+----------------------+----------+----------+------------+--------------+---------------+-------+
| Status               | Duration | CPU_user | CPU_system | Block_ops_in | Block_ops_out | Swaps |
+----------------------+----------+----------+------------+--------------+---------------+-------+
| starting             | 0.000099 |     NULL |       NULL |         NULL |          NULL |  NULL |
| checking permissions | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL |
| checking permissions | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Opening tables       | 0.000060 |     NULL |       NULL |         NULL |          NULL |  NULL |
| init                 | 0.000066 |     NULL |       NULL |         NULL |          NULL |  NULL |
| System lock          | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |
| optimizing           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| optimizing           | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |
| statistics           | 0.000083 |     NULL |       NULL |         NULL |          NULL |  NULL |
| preparing            | 0.000018 |     NULL |       NULL |         NULL |          NULL |  NULL |
| executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000027 |     NULL |       NULL |         NULL |          NULL |  NULL |
| statistics           | 0.000005 |     NULL |       NULL |         NULL |          NULL |  NULL |
| preparing            | 0.000005 |     NULL |       NULL |         NULL |          NULL |  NULL |
| optimizing           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| optimizing           | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL |
| statistics           | 0.000044 |     NULL |       NULL |         NULL |          NULL |  NULL |
| preparing            | 0.000006 |     NULL |       NULL |         NULL |          NULL |  NULL |
| executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000020 |     NULL |       NULL |         NULL |          NULL |  NULL |
| statistics           | 0.000004 |     NULL |       NULL |         NULL |          NULL |  NULL |
| preparing            | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| executing            | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |
| executing            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| end                  | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| query end            | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |
| removing tmp table   | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| query end            | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| closing tables       | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| removing tmp table   | 0.000003 |     NULL |       NULL |         NULL |          NULL |  NULL |
| closing tables       | 0.000001 |     NULL |       NULL |         NULL |          NULL |  NULL |
| removing tmp table   | 0.000002 |     NULL |       NULL |         NULL |          NULL |  NULL |
| closing tables       | 0.000009 |     NULL |       NULL |         NULL |          NULL |  NULL |
| freeing items        | 0.000021 |     NULL |       NULL |         NULL |          NULL |  NULL |
| cleaning up          | 0.000008 |     NULL |       NULL |         NULL |          NULL |  NULL |
+----------------------+----------+----------+------------+--------------+---------------+-------+
37 rows in set, 1 warning (0.00 sec)

DISTINCT方式和LIMIT 1方式都使用索引,其中最大耗时差异在Sending data部分。

DISTINCT方式的Sending data部分耗时:

| Sending data         | 0.024214 |     NULL |       NULL |         NULL |          NULL |  NULL |

LIMIT 1方式的Sending data部分耗时:

| Sending data         | 0.000027 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000020 |     NULL |       NULL |         NULL |          NULL |  NULL |
| Sending data         | 0.000007 |     NULL |       NULL |         NULL |          NULL |  NULL |

差异原因:

  • DISTINCT方式需要扫描所有满足WHERE条件的16455+18718条记录并将这些记录返回到MySQL Server层,由MySQL Server层负责数据去重处理(DISTINCT)并返回给客户端。
  • LIMIT 1方式针对每个子查询仅需要扫描到第1条满足WHERE条件的记录并将这些记录返回到MySQL Server层,由MySQL Server层负责数据合并(UNION ALL)并返回给客户端。

当满足WHERE条件的记录较少时,无论使用上述4种SQL种的任意1种方式都能快速返回结果,但随着满足WHERE条件的记录增多时,需要结合实际的业务需求和数据分布来编写"高效SQL"。

优化建议

在编写SQL语句时,不仅需要根据业务需求编写"正确SQL",还需要根据"实际数据分布"编写"高效SQL"。

MySQL Execution Plan--使用Query Rewrite

安装Query Rewrite Plugin

在MySQL的安装目录的share文件夹下,有两个文件用来安装和卸载Query Rewrite Plugin:

install_rewriter.sql: 安装脚本
uninstall_rewriter.sql: 卸载脚本
install_rewriter.sql文件中脚本为:
/* Copyright (c) 2015, Oracle and/or its affiliates. All rights reserved.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; version 2 of the License.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software Foundation,
   51 Franklin Street, Suite 500, Boston, MA 02110-1335 USA */

CREATE DATABASE IF NOT EXISTS query_rewrite;

CREATE TABLE IF NOT EXISTS query_rewrite.rewrite_rules (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  pattern VARCHAR(10000) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  pattern_database VARCHAR(20) CHARACTER SET utf8 COLLATE utf8_bin,
  replacement VARCHAR(10000) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  enabled ENUM(YES, NO) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
    DEFAULT YES,
  message VARCHAR(1000) CHARACTER SET utf8 COLLATE utf8_bin,
  pattern_digest VARCHAR(32),
  normalized_pattern VARCHAR(100)
) DEFAULT CHARSET = utf8 ENGINE = INNODB;

INSTALL PLUGIN rewriter SONAME rewriter.so;
CREATE FUNCTION load_rewrite_rules RETURNS STRING
SONAME rewriter.so;

DELIMITER //

CREATE PROCEDURE query_rewrite.flush_rewrite_rules()
BEGIN
  DECLARE message_text VARCHAR(100);
  COMMIT;
  SELECT load_rewrite_rules() INTO message_text;
  RESET QUERY CACHE;
  IF NOT message_text IS NULL THEN
    SIGNAL SQLSTATE 45000 SET MESSAGE_TEXT = message_text;
  END IF;
END //

DELIMITER ;

RESET QUERY CACHE;

安装完成后,可以使用下面脚本查看功能是否启用:

SHOW GLOBAL VARIABLES LIKE rewriter_enabled;

+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| rewriter_enabled | ON    |
+------------------+-------+

 

演示Demo

1、插入重写规则

INSERT INTO query_rewrite.rewrite_rules
(pattern, replacement, pattern_database) 
VALUES 
(
"SELECT * FROM TB001 WHERE C1=?",
"SELECT *,NEW AS C11 FROM TB001 WHERE C1=?",
"demodb"
);

2、加载规则

## 加载重写规则
CALL query_rewrite.flush_rewrite_rules();


## 查看当前重写规则
SELECT * FROM query_rewrite.rewrite_rules G

*************************** 1. row ***************************
                id: 7
           pattern: SELECT * FROM TB001 WHERE C1=?
  pattern_database: demodb
       replacement: SELECT *,NEW AS C11 FROM TB001 WHERE C1=?
           enabled: YES
           message: NULL
    pattern_digest: cf177a9a728143a27502f890698316e5
normalized_pattern: select `*` from `demodb`.`tb001` where (`C1` = ?)
1 row in set (0.00 sec)

3、测试重写:

SELECT  *  FROM     TB001    
WHERE      C1=2;

+--------+------+-----+
| ID     | C1   | C11 |
+--------+------+-----+
| AAA102 |    2 | NEW |
| AAA112 |    2 | NEW |
| AAA12  |    2 | NEW |
| AAA122 |    2 | NEW |
| AAA132 |    2 | NEW |
+--------+------+-----+


SELECT  *  FROM     TB001    
WHERE      C1=2 LIMIT 1;

+--------+------+
| ID     | C1   |
+--------+------+
| AAA102 |    2 |
+--------+------+

可以发现SQL语句中的换行或空格对重写规则无影响,但不能在重写模板基础上增加其他内容如LIMIT 字句。

以上是关于MySQL Execution Plan--DISTINCT语句优化的主要内容,如果未能解决你的问题,请参考以下文章

MySQL Execution Plan--使用Query Rewrite

DataGrip/PhpStorm 未连接到 MySQL DB,错误:com.intellij.execution.ExecutionException

MySQL Execution Plan--COUNT相关测试

mysql官方文档之Optimization(8.8 Understanding the Query Execution Plan)

MySQL server has gone away和Maximum execution time of 120 seconds exceeded

Fatal error encountered during command execution