MySQL index query taking long time for specific column value

Posted 2023-03-23

【Title】: MySQL index query taking long time for specific column value 【Posted】: 2016-11-28 12:24:10 【Question】:

I have 2 MySQL (Ver 14.14 Distrib 5.5.49) tables that look like this:

CREATE TABLE `Document` (
    `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `CompanyCode` int(10) unsigned NOT NULL,
    `B` int(10) unsigned NOT NULL,
    `C` int(10) unsigned NOT NULL,
    `DocumentCode` int(10) unsigned NOT NULL,
    `E` int(11) DEFAULT '0',
    `EpochSeconds` int(11) DEFAULT '0',
    `G` int(10) unsigned NOT NULL,
    `H` int(10) unsigned NOT NULL,
    `I` int(11) DEFAULT '0',
    `J` int(11) DEFAULT '0',
    `K` varchar(48) DEFAULT '',
  PRIMARY KEY (`Id`),
    KEY `Idx1` (`CompanyCode`),
    KEY `Idx2` (`B`,`C`),
    KEY `Idx3` (`CompanyCode`,`DocumentCode`),
    KEY `Idx4` (`CompanyCode`,`B`,`C`),
    KEY `Idx5` (`H`),
    KEY `Idx6` (`CompanyCode`,`K`),
    KEY `Idx7` (`K`),
    KEY `Idx8` (`K`,`E`),
    KEY `NEWIDX` (`DocumentCode`,`EpochSeconds`)
) ENGINE=MyISAM AUTO_INCREMENT=397783215 DEFAULT CHARSET=latin1;

CREATE TABLE `Company` (
    `Id` int(10) unsigned NOT NULL AUTO_INCREMENT,
    `CompanyCode` int(10) unsigned NOT NULL,
    `CompanyName` varchar(150) NOT NULL,
    `C` varchar(2) NOT NULL,
    `D` varchar(10) NOT NULL,
    `E` varchar(150) NOT NULL,
  PRIMARY KEY (`Id`),
    KEY `Idx1` (`CompanyCode`),
    KEY `Idx2` (`CompanyName`),
    KEY `Idx3` (`C`),
    KEY `Idx4` (`D`,`C`),
    KEY `Idx5` (`E`)
) ENGINE=MyISAM AUTO_INCREMENT=9218804 DEFAULT CHARSET=latin1;

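I have omitted most of the column definitions in Company because I don't want to complicate the question unnecessarily, but none of the missing columns are involved in any KEY definition.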

Document has about 12.5 million rows and Company has about 600,000 rows. I added KEY NEWIDX to Document to support the following query:

SELECT Document.*, Company.CompanyName FROM Document, Company where Document.DocumentCode = ? and Document.CompanyCode = Company.CompanyCode ORDER BY Document.EpochSeconds desc LIMIT 0, 30;

Execution plan:

+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table        | type | possible_keys                     | key         | key_len | ref                          | rows   | Extra                           |
+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | Company      | ALL  | Idx1                              | NULL        | NULL    | NULL                         | 593729 | Using temporary; Using filesort |
|  1 | SIMPLE      | Document     | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3        | Idx3        | 8       | db.Company.CompanyCode,const |      3 |                                 |
+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+

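If the value of Document.DocumentCode above is anything other than 8, the query returns instantly (0.00 sec). If the value is 8, the query takes anywhere between 38 and 45 seconds. If I remove Company from the query, e.g.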

SELECT * FROM Document where DocumentCode = 8 ORDER BY EpochSeconds desc LIMIT 0, 30;

Execution plan:

+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+
| id | select_type | table     | type | possible_keys | key        | key_len | ref   | rows    | Extra       |
+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+
|  1 | SIMPLE      | Documents | ref  | NEWIDX        | NEWIDX     | 4       | const | 3654177 | Using where |
+----+-------------+-----------+------+---------------+------------+---------+-------+---------+-------------+

...then the query returns instantly (0.00 sec).

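The range of possible values for Document.DocumentCode is 369, and the rows are spread widely across those values. About 3.15 million rows in Document have DocumentCode = 8. For comparison, about 1.5 million rows in Document have DocumentCode = 9, and that query returns instantly.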
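The per-value row counts quoted above can be checked with a grouped count. This is a minimal sketch that assumes only the Document table as defined in the question; the alias NumRows is introduced here for illustration.

```sql
-- Count rows per DocumentCode to see how skewed the distribution is.
-- NumRows is an illustrative alias, not a column from the question.
SELECT DocumentCode, COUNT(*) AS NumRows
FROM Document
GROUP BY DocumentCode
ORDER BY NumRows DESC
LIMIT 10;
```

If one or two values dominate the list, the optimizer's row estimate for those values will differ sharply from the rest, which is consistent with a single value getting a different plan.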

I have also run the mysqlcheck utility on the Document table, and it reported no problems.

Why does the query for DocumentCode = 8 take so long when Company is joined in, while any other value of DocumentCode returns so quickly?

Here is a comparison of the execution plans. For DocumentCode = 8:

+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+
| id | select_type | table        | type | possible_keys                     | key         | key_len | ref                          | rows   | Extra                           |
+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+
|  1 | SIMPLE      | Company      | ALL  | Idx1                              | NULL        | NULL    | NULL                         | 593729 | Using temporary; Using filesort |
|  1 | SIMPLE      | Document     | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3        | Idx3        | 8       | db.Company.CompanyCode,const |      3 |                                 |
+----+-------------+--------------+------+-----------------------------------+-------------+---------+------------------------------+--------+---------------------------------+

And for DocumentCode = 9:

+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+
| id | select_type | table    | type | possible_keys              | key    | key_len | ref                      | rows    | Extra       |
+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+
|  1 | SIMPLE      | Document | ref  | Idx1,Idx4,Idx6,NEWIDX,Idx3 | NEWIDX | 4       | const                    | 1953090 | Using where |
|  1 | SIMPLE      | Company  | ref  | Idx1                       | Idx1   | 4       | db.Document.CompanyCode  |       1 |             |
+----+-------------+----------+------+----------------------------+--------+---------+--------------------------+---------+-------------+

They are clearly different, but I don't understand them well enough to explain what is happening. Also, running ANALYZE TABLE Document and ANALYZE TABLE Company both report OK.
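Plans like the two above come from prefixing the same statement with EXPLAIN and changing only the literal. The sketch below assumes the schema from the question and substitutes the literals 8 and 9 for the ? placeholder; the first row of each EXPLAIN output is the table MySQL reads first.

```sql
-- Slow case: in the question this plan starts with a full scan of Company.
EXPLAIN
SELECT Document.*, Company.CompanyName
FROM Document, Company
WHERE Document.DocumentCode = 8
  AND Document.CompanyCode = Company.CompanyCode
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;

-- Fast case: in the question this plan starts with Document via NEWIDX.
EXPLAIN
SELECT Document.*, Company.CompanyName
FROM Document, Company
WHERE Document.DocumentCode = 9
  AND Document.CompanyCode = Company.CompanyCode
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```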

【Comments】:

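Could you please add the execution plan for the query?

Shouldn't the join condition come before the filter, i.e. ...where Document.CompanyCode = Company.CompanyCode and Document.DocumentCode = ? I think this would optimize the query slightly.

If you EXPLAIN the query using 8 and another using 9, are the plans the same? I suspect MySQL may have reversed the join order between the 2 queries (possibly because of out-of-date statistics - try running ANALYZE TABLE Document and ANALYZE TABLE Company).

@Pramod Reversing the clause order has no effect. I'll compare the EXPLAIN output for 8 and 9 and post the results back now...

I have added a comparison of the execution plans for DocumentCode 8 and 9. Note that the total row counts I originally posted in the bullet points were incorrect. They have now been updated.

【Answer 1】: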

Use STRAIGHT_JOIN to force the order in which MySQL performs the join:

SELECT Document.*, 
Company.CompanyName 
FROM Document
STRAIGHT_JOIN Company 
ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = ? 
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
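To check that the hint really changes the join order, the same statement can be run under EXPLAIN. This is a sketch that assumes the schema from the question and uses the literal 8 in place of the placeholder; with STRAIGHT_JOIN the left table (Document) should appear as the first row of the plan.

```sql
-- With STRAIGHT_JOIN the left-hand table is always read before the right-hand one,
-- so Document is expected as the first row of this plan.
EXPLAIN
SELECT Document.*,
       Company.CompanyName
FROM Document
STRAIGHT_JOIN Company
ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```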

【Comments】:

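This makes the query for 8 return instantly. I understand the downside, since this limits MySQL's ability to choose what it believes is the most efficient way to execute the query (which was wrong in this case), but given that my tables will (slowly) grow over time, is there anything I should be concerned about?

In this case probably not. A bigger concern would be specifying a particular index to use, in case someone drops or renames that index in the future.

【Answer 2】: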

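The reason for this behaviour lies in the way MySQL optimizes your query, or at least tries to. You can see this in the EXPLAIN output. MySQL changes the table it uses as the base of the query: for DocumentCode = 8 it starts from Company, for DocumentCode = 9 it starts from Document. MySQL estimates that for DocumentCode = 8 it will be faster not to use the index and to use the other table as the base instead. Why, I don't know.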

I suggest you use an explicit join that tells MySQL which tables to use and in which order:

SELECT Document.*, Company.CompanyName 
FROM Document 
JOIN Company ON Document.CompanyCode = Company.CompanyCode 
WHERE Document.DocumentCode = ?
ORDER BY Document.EpochSeconds desc LIMIT 0, 30;

MySQL even lets you tell it which index it should use:

SELECT Document.*, Company.CompanyName 
FROM Document 
JOIN Company USE INDEX (Idx1) ON Document.CompanyCode = Company.CompanyCode 
WHERE Document.DocumentCode = ?
ORDER BY Document.EpochSeconds desc LIMIT 0, 30;

You can also try FORCE INDEX instead of USE INDEX. It is the stronger hint. But I would guess MySQL uses Idx1 by default anyway.
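The FORCE INDEX variant mentioned above uses the same position and the same parenthesised index list as USE INDEX. A sketch, assuming the schema from the question and the literal 8 in place of the placeholder; whether the hint alone changes the join order is not established in this thread.

```sql
-- FORCE INDEX makes a table scan of Company a last resort:
-- MySQL scans the table only if Idx1 cannot be used to find the rows.
SELECT Document.*, Company.CompanyName
FROM Document
JOIN Company FORCE INDEX (Idx1) ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```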

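But note that your new index NEWIDX will not be used for this query, because MySQL first has to join the tables and then filter a result set that has no index. That makes the ORDER BY on the result a very expensive operation.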
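One way to keep the sort on the Document side is to select the 30 newest rows in a derived table and join Company afterwards. This technique is not proposed anywhere in the thread; it is a sketch that assumes every Document row matches exactly one Company row on CompanyCode (otherwise the result can contain fewer or more than 30 rows), and the aliases d and c are introduced here for illustration.

```sql
-- Step 1 (inner query): pick the 30 newest documents from Document alone.
-- The single-table form of this query returned instantly in the question.
-- Step 2 (outer query): look up the company name for those 30 rows only.
SELECT d.*, c.CompanyName
FROM (
    SELECT *
    FROM Document
    WHERE DocumentCode = 8
    ORDER BY EpochSeconds DESC
    LIMIT 30
) AS d
JOIN Company AS c ON d.CompanyCode = c.CompanyCode
ORDER BY d.EpochSeconds DESC;
```

The outer ORDER BY is repeated because the row order of a derived table is not guaranteed to survive the join.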

【Comments】:

Note that the order in which you write explicit joins does not force the order in which MySQL executes them.
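If the explicit JOIN ... ON form is preferred, MySQL also accepts STRAIGHT_JOIN as a modifier directly after SELECT, which makes the optimizer join the tables in the order they are listed in the FROM clause. A sketch, assuming the schema from the question and the literal 8 in place of the placeholder:

```sql
-- SELECT STRAIGHT_JOIN fixes the join order to the order in FROM:
-- Document first, then Company.
SELECT STRAIGHT_JOIN Document.*, Company.CompanyName
FROM Document
JOIN Company ON Document.CompanyCode = Company.CompanyCode
WHERE Document.DocumentCode = 8
ORDER BY Document.EpochSeconds DESC
LIMIT 0, 30;
```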
