MySQL select query gets quite slow with BOTH where and descending order

Posted 2023-04-15


【Title】: MySQL select query gets quite slow with BOTH where and descending order 【Posted】: 2017-09-15 13:04:49 【Question】:

I have this select query, where ItemType is a varchar column and ItemComments is an int column:

select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1

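You can see that this query has 3 conditions:

(1) where "ItemType" equals a specific value; (2) order by "ItemComments"; (3) in descending order

The interesting thing is that when I select rows with all three conditions, the query gets very slow. But if I drop any one of the three (except condition 2), the query runs quite fast. See: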

select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 16.318 sec. */

select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 0.140 sec. */

select * from ItemInfo order by ItemComments desc limit 1;
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 0.015 sec. */


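I have searched a lot for possible explanations, such as whether MySQL supports descending indexes, composite indexes, and so on. But these still cannot explain why query #1 runs slowly while queries #2 and #3 run well.

It would be much appreciated if anyone could help me.

Update: create table and explain information

Create code: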

CREATE TABLE `ItemInfo` (
`ItemID` VARCHAR(255) NOT NULL,
`ItemType` VARCHAR(255) NOT NULL,
`ItemPics` VARCHAR(255) NULL DEFAULT '0',
`ItemName` VARCHAR(255) NULL DEFAULT '0',
`ItemComments` INT(50) NULL DEFAULT '0',
`ItemScore` DECIMAL(10,1) NULL DEFAULT '0.0',
`ItemPrice` DECIMAL(20,2) NULL DEFAULT '0.00',
`ItemDate` DATETIME NULL DEFAULT '1971-01-01 00:00:00',
PRIMARY KEY (`ItemID`, `ItemType`),
INDEX `ItemDate` (`ItemDate`),
INDEX `ItemComments` (`ItemComments`),
INDEX `ItemType` (`ItemType`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB;

Explain results:

mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | i     | NULL       | index | ItemType      | ItemComments | 5       | NULL |   83 |     1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+

mysql> explain select * from ItemInfo where ItemType="item_type" order by ItemComments limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra       |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | i     | NULL       | index | ItemType      | ItemComments | 5       | NULL |   83 |     1.20 | Using where |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------------+

mysql> explain select * from ItemInfo order by ItemComments desc limit 1;
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
| id | select_type | table | partitions | type  | possible_keys | key          | key_len | ref  | rows | filtered | Extra |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+
|  1 | SIMPLE      | i     | NULL       | index | NULL          | ItemComments | 5       | NULL |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+-------+---------------+--------------+---------+------+------+----------+-------+

The query from O. Jones:

mysql> explain
    ->  SELECT a.*
    ->      FROM ItemInfo a
    ->      JOIN (
    ->             SELECT MAX(ItemComments) ItemComments, ItemType
    ->               FROM ItemInfo
    ->              GROUP BY ItemType
    ->           ) maxcomm ON a.ItemType = maxcomm.ItemType
    ->                    AND a.ItemComments = maxcomm.ItemComments
    ->     WHERE a.ItemType = 'item_type';
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
| id | select_type | table      | partitions | type  | possible_keys                          | key         | key_len | ref                       | rows    | filtered | Extra                    |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+
|  1 | PRIMARY     | a          | NULL       | ref   | ItemComments,ItemType                  | ItemType    | 767     | const                     |   27378 |   100.00 | Using where              |
|  1 | PRIMARY     | <derived2> | NULL       | ref   | <auto_key0>                            | <auto_key0> | 772     | mydb.a.ItemComments,const |      10 |   100.00 | Using where; Using index |
|  2 | DERIVED     | ItemInfo   | NULL       | index | PRIMARY,ItemDate,ItemComments,ItemType | ItemType    | 767     | NULL                      | 2289466 |   100.00 | NULL                     |
+----+-------------+------------+------------+-------+----------------------------------------+-------------+---------+---------------------------+---------+----------+--------------------------+

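I'm not sure whether I ran this query correctly, but it did not return the records even after a long wait.

The query from Vijay. I added the ItemType join condition, because joining on max_comnt alone also returns items from other ItemTypes: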

SELECT ifo.* FROM ItemInfo ifo 
JOIN (SELECT ItemType, MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo 
ON ifo.ItemComments = inn_ifo.max_comnt and ifo.ItemType = inn_ifo.ItemType
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 7.441 sec. */

explain result:
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
| id | select_type | table      | partitions | type        | possible_keys         | key                   | key_len | ref   | rows  | filtered | Extra                                               |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | system      | NULL                  | NULL                  | NULL    | NULL  |     1 |   100.00 | NULL                                                |
|  1 | PRIMARY     | ifo        | NULL       | index_merge | ItemComments,ItemType | ItemComments,ItemType | 5,767   | NULL  |    88 |   100.00 | Using intersect(ItemComments,ItemType); Using where |
|  2 | DERIVED     | ItemInfo   | NULL       | ref         | ItemType              | ItemType              | 767     | const | 27378 |   100.00 | NULL                                                |
+----+-------------+------------+------------+-------------+-----------------------+-----------------------+---------+-------+-------+----------+-----------------------------------------------------+

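I want to explain why I used order by with limit in the first place: I planned to fetch a record from the table at random with a specific probability. The random index was generated in Python and sent to MySQL as a variable. But then I found it took a lot of time, so I decided to just use the first record I got.

Inspired by O. Jones and Vijay, I tried using the max function, but it does not perform well either: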

select max(ItemComments) from ItemInfo where ItemType='item_type'
/* Affected rows: 0  Found rows: 1  Warnings: 0  Duration for 1 query: 6.225 sec. */

explain result:
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
| id | select_type | table      | partitions | type | possible_keys | key      | key_len | ref   | rows  | filtered | Extra |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+
|  1 | SIMPLE      | ItemInfo   | NULL       | ref  | ItemType      | ItemType | 767     | const | 27378 |   100.00 | NULL  |
+----+-------------+------------+------------+------+---------------+----------+---------+-------+-------+----------+-------+

Thanks to everyone who contributed to this question. I hope you can suggest more solutions based on the information above.

【Comments】:

Can you post the EXPLAIN? 【Answer 1】:

Please provide the current SHOW CREATE TABLE ItemInfo.

For most of these queries, you need the composite index

INDEX(ItemType, ItemComments)

For the last one, you need

INDEX(ItemComments)

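For any query that is still particularly slow, please provide EXPLAIN SELECT ....

Discussion - Why does INDEX(ItemType, ItemComments) help with where ItemType="item_type" order by ItemComments desc limit 1?

An index is structured as a BTree (see ***), so it is very fast to search for an individual item and very fast to scan in a particular order.

where ItemType="item_type" says to filter on ItemType, but there are many such entries in the index. Within this index, they are ordered by ItemComments (for a given ItemType). The direction desc suggests starting with the highest value of ItemComments; that is the "end" of the index entries for that ItemType. Finally, limit 1 says to stop after one item is found. (Somewhat like finding the last "S" in your address book.)

So the query "drills down" the BTree of the composite INDEX(ItemType, ItemComments) to the end of the entries for that ItemType and grabs one entry, which is a very efficient task.

Actually, SELECT * means there is one more step, namely fetching all the columns for that row. That information is not in the index but in the BTree for ItemInfo, which contains all the columns of all the rows, ordered by the PRIMARY KEY.

The "secondary index" (INDEX(ItemType, ItemComments)) implicitly contains a copy of the relevant PRIMARY KEY columns, so we now have the values of ItemID and ItemType. With those, we can drill down the other BTree to find the desired row and fetch all (*) columns.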
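The lookup described above can be imitated in a few lines of Python. This is only a sketch under stated assumptions, not how InnoDB is implemented: a sorted list stands in for the BTree of the secondary index, a dict stands in for the primary-key BTree, and the sample rows and the function name top_commented are made up for illustration.

```python
import bisect

# Hypothetical miniature of ItemInfo: primary key (ItemID, ItemType) -> other columns.
rows = {
    ("a1", "book"): {"ItemName": "A", "ItemComments": 5},
    ("a2", "book"): {"ItemName": "B", "ItemComments": 42},
    ("a3", "toy"): {"ItemName": "C", "ItemComments": 99},
    ("a4", "book"): {"ItemName": "D", "ItemComments": 17},
    ("a5", "toy"): {"ItemName": "E", "ItemComments": 3},
}

# Stand-in for INDEX(ItemType, ItemComments): entries sorted by
# (ItemType, ItemComments), each carrying a copy of the primary key column ItemID.
index = sorted(
    (item_type, row["ItemComments"], item_id)
    for (item_id, item_type), row in rows.items()
)


def top_commented(item_type):
    """Imitate: WHERE ItemType = ? ORDER BY ItemComments DESC LIMIT 1."""
    # Binary search to the position just past the last entry for item_type.
    end = bisect.bisect_right(index, (item_type, float("inf")))
    if end == 0 or index[end - 1][0] != item_type:
        return None  # no entry for this ItemType
    _, _, item_id = index[end - 1]
    # Second lookup: use the primary key to fetch the remaining columns.
    pk = (item_id, item_type)
    return dict(rows[pk], ItemID=item_id, ItemType=item_type)


print(top_commented("book"))
```

The binary search touches only a handful of entries no matter how many rows share the ItemType, which is the reason the composite index makes the descending query fast.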

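【Comments】:

I added the composite index you suggested, and it really works! The first query now executes very fast. But could you explain how it works?

Read mysql.rjweb.org/doc.php/index_cookbook_mysql. It may not give you the explanation you want, but it should give you more clues.

@BarryZhai - Added a long, detailed discussion. 【Answer 2】: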

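Your query with the ascending sort can exploit your index on ItemComments.

SELECT * ... ORDER BY ... LIMIT 1 is a notorious performance antipattern. Why? Unless an index can deliver the rows in the requested order, the server must sort a whole mess of rows just to discard all but the first one.

You can try this (for your descending variant). It is a little more verbose, but more efficient.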

   SELECT a.* 
     FROM ItemInfo a
     JOIN (
            SELECT MAX(ItemComments) ItemComments, ItemType
              FROM ItemInfo
             GROUP BY ItemType
          ) maxcomm ON a.ItemType = maxcomm.ItemType
                   AND a.ItemComments = maxcomm.ItemComments
    WHERE a.ItemType = 'item_type'

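Why does this work? It uses GROUP BY / MAX() rather than ORDER BY ... DESC LIMIT 1 to find the maximum value. The subquery does your search.

To make this work as efficiently as possible, you need a composite (multi-column) index on (ItemType, ItemComments). Create it with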

ALTER TABLE ItemInfo ADD INDEX ItemTypeCommentIndex (ItemType, ItemComments);

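When you create the new index, drop your existing index on ItemType, because it is redundant with the new one.

Depending on the version, MySQL's query planner can apply the outer WHERE clause before it runs the inner GROUP BY query, so it doesn't have to aggregate the whole table. This derived condition pushdown exists in MySQL 8.0.22 and later; the EXPLAIN posted in the question shows the DERIVED step reading 2289466 rows, so on that server the subquery still aggregated every ItemType. On such versions, put the WHERE ItemType = ... filter inside the subquery as well.

With this composite index in place, MySQL can use a loose index scan to satisfy the subquery. These are almost miraculously fast. You should read up on that topic.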
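To give a feel for why a loose index scan is fast, here is a rough Python sketch. It is an illustration of the idea only, not MySQL's implementation: a sorted list of (ItemType, ItemComments) pairs stands in for the composite index, and the data and the name max_per_type are hypothetical.

```python
import bisect

# Hypothetical index entries (ItemType, ItemComments), already in index order.
index = [("book", 5), ("book", 17), ("book", 42), ("toy", 3), ("toy", 99)]


def max_per_type(entries):
    """Imitate: SELECT ItemType, MAX(ItemComments) ... GROUP BY ItemType."""
    result = {}
    pos = 0
    while pos < len(entries):
        item_type = entries[pos][0]
        # Jump straight past the last entry of this group instead of
        # reading every entry in it.
        end = bisect.bisect_right(entries, (item_type, float("inf")), lo=pos)
        # Within a group the entries are sorted by ItemComments,
        # so the last one holds the maximum.
        result[item_type] = entries[end - 1][1]
        pos = end
    return result


print(max_per_type(index))
```

The work done is proportional to the number of distinct ItemType values times the cost of one seek, not to the number of rows in the table.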

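【Comments】:

【Answer 3】:

Your query selects all the rows that match the where condition. After that it sorts the rows according to the order by clause, and then it picks the first row. A better query would be something like this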

SELECT ifo.* FROM ItemInfo ifo 
JOIN (SELECT MAX(ItemComments) AS max_comnt FROM ItemInfo WHERE ItemType="item_type") inn_ifo 
ON ifo.ItemComments = inn_ifo.max_comnt

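This query only finds the maximum value of the column. Finding MAX() is only O(n), whereas the fastest comparison sort is O(n log n). So if you avoid the order by statement, the query will execute faster. Hope this helps.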
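The complexity argument can be illustrated outside the database with a small Python sketch. The data is randomly generated and the function names are made up for illustration. Both approaches still have to read every matching value, which may be why the plain MAX() query in the question still took several seconds without the composite index.

```python
import random

random.seed(0)
# Hypothetical ItemComments values for the rows of one ItemType.
comments = [random.randrange(1_000_000) for _ in range(100_000)]


def max_by_scan(values):
    """One pass, O(n): remember the largest value seen so far."""
    it = iter(values)
    best = next(it)
    for v in it:
        if v > best:
            best = v
    return best


def max_by_sort(values):
    """Sort everything, O(n log n), then keep only the first element."""
    return sorted(values, reverse=True)[0]


# Both strategies agree on the answer; only the amount of work differs.
print(max_by_scan(comments) == max_by_sort(comments))
```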

【Comments】:
