MySQL large table performance optimization

Posted 2023-04-15

【Title】MySQL large table performance optimization 【Posted】2020-12-09 15:49:39 【Question】:

I am trying to solve a performance problem with this table:

+--------------+------------------+------+-----+---------+----------------+
| Field        | Type             | Null | Key | Default | Extra          |
+--------------+------------------+------+-----+---------+----------------+
| id           | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| direction_id | int(10) unsigned | NO   | MUL | NULL    |                |
| created_at   | datetime         | NO   |     | NULL    |                |
| rate         | decimal(16,6)    | NO   |     | NULL    |                |
+--------------+------------------+------+-----+---------+----------------+

It contains about 100 million rows.

Only one query selects data from this table:

SELECT AVG(rate) AS rate, created_at 
FROM statistics 
WHERE direction_id = ? 
AND created_at BETWEEN ? AND ? 
GROUP BY created_at

direction_id is a foreign key, but its selectivity is poor:

+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+
| id | select_type | table      | partitions | type | possible_keys                   | key                             | key_len | ref   | rows  | filtered | Extra                                                               |
+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+
|  1 | SIMPLE      | statistics | NULL       | ref  | statistics_direction_id_foreign | statistics_direction_id_foreign | 4       | const | 26254 |    11.11 | Using index condition; Using where; Using temporary; Using filesort |
+----+-------------+------------+------------+------+---------------------------------+---------------------------------+---------+-------+-------+----------+---------------------------------------------------------------------+

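So I am looking for a way to solve this and would appreciate advice. Would partitioning by HASH(direction_id) help? If so, what is the best way to do it?

Or maybe there is another way to solve it.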

【Question comments】:

YEAR(created_at), MONTH(created_at), DAY(created_at), huh?

@Strawberry It doesn't matter; I think that part can be omitted.

【Answer 1】:

For the average daily rate, do you mean this?

SELECT AVG(rate) AS rate, 
       DATE(created_at) 
    FROM statistics 
    WHERE direction_id = ? 
      AND created_at BETWEEN ? AND ? 
    GROUP BY DATE(created_at)

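Also add INDEX(direction_id, created_at, rate). It is both "covering" and "composite". EXPLAIN will say "Using index" to indicate "covering", which means the whole query can be performed by looking only at the index's BTree. So "covering" gives an extra performance boost.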
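One way to check whether the index is actually covering is to create it and run EXPLAIN on the query. The following is only a sketch: the index name idx_direction_created_rate and the literal values are made up for illustration, while the column names come from the table in the question.

```sql
-- Hypothetical index name; columns follow the table definition in the question.
ALTER TABLE statistics
    ADD INDEX idx_direction_created_rate (direction_id, created_at, rate);

-- Placeholder values, for illustration only.
EXPLAIN
SELECT AVG(rate) AS rate,
       DATE(created_at) AS created_day
    FROM statistics
    WHERE direction_id = 1
      AND created_at BETWEEN '2020-12-01 00:00:00' AND '2020-12-08 23:59:59'
    GROUP BY DATE(created_at);
-- If the index is covering, the Extra column should include "Using index"
-- (which is not the same thing as "Using index condition").
```

Building an index on a table of about 100 million rows can take a long time, so it is worth trying on a copy first.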

Changing to a fancy index involving DATE(created_at) probably would not help this query.

PARTITIONing is not called for here.

A "summary table" may be called for: http://mysql.rjweb.org/doc.php/summarytables
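As a rough idea of what a summary table could look like here, the sketch below keeps one row per direction_id per day, storing the sum and the count so that the average can be derived at read time. The table name statistics_daily, its column names, and the example date are assumptions for illustration and do not come from the linked article.

```sql
-- Hypothetical summary table: one row per direction per day.
CREATE TABLE statistics_daily (
    direction_id INT UNSIGNED   NOT NULL,
    stat_day     DATE           NOT NULL,
    rate_sum     DECIMAL(24,6)  NOT NULL,
    rate_count   INT UNSIGNED   NOT NULL,
    PRIMARY KEY (direction_id, stat_day)
);

-- Roll up one completed day (example date); run once per day.
-- This scan filters on created_at only, so it would likely benefit
-- from an index that starts with created_at.
INSERT INTO statistics_daily (direction_id, stat_day, rate_sum, rate_count)
SELECT direction_id, DATE(created_at), SUM(rate), COUNT(*)
    FROM statistics
    WHERE created_at >= '2020-12-08'
      AND created_at <  '2020-12-09'
    GROUP BY direction_id, DATE(created_at);

-- The daily average then reads from the much smaller table.
SELECT stat_day, rate_sum / rate_count AS rate
    FROM statistics_daily
    WHERE direction_id = ?
      AND stat_day BETWEEN ? AND ?;
```

Storing the sum and count instead of the average keeps the table usable for coarser rollups later (for example, weekly averages as SUM(rate_sum) / SUM(rate_count)).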

【Comments】:

I decided to reduce the amount of data per direction_id. I think that is the only way. Thanks.

【Answer 2】:

First, let's fix your query so that it is a valid aggregation query. Presumably, you want the daily average of rate, so:

SELECT AVG(rate) AS rate, DATE(created_at) as created_day
FROM statistics 
WHERE direction_id = ? AND created_at BETWEEN ? AND ? 
GROUP BY DATE(created_at)

Then, I would suggest creating the following index:

create index idx_statistics on statistics (direction_id, created_at, rate);

In recent versions of MySQL, we can also consider an index on date(created_at). If you can live with the following where clause:

WHERE direction_id = ? AND DATE(created_at) BETWEEN ? AND ?

then the following index would come in handy:

create index idx_statistics on statistics (direction_id, (date(created_at)), rate);
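On MySQL versions that lack expression indexes, a similar effect can probably be had with an indexed generated column, which is available from MySQL 5.7. This is only a sketch and has not been measured against the table in the question; created_day and idx_direction_day_rate are hypothetical names.

```sql
-- Hypothetical generated column holding the date part of created_at.
ALTER TABLE statistics
    ADD COLUMN created_day DATE
        GENERATED ALWAYS AS (DATE(created_at)) VIRTUAL;

-- Hypothetical index name; same column order as the expression index above.
ALTER TABLE statistics
    ADD INDEX idx_direction_day_rate (direction_id, created_day, rate);

-- The query then filters and groups on the generated column directly.
SELECT AVG(rate) AS rate, created_day
FROM statistics
WHERE direction_id = ? AND created_day BETWEEN ? AND ?
GROUP BY created_day;
```

Running EXPLAIN on the final query shows whether the new index is chosen and whether "Using index" appears in the Extra column.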

【Comments】:

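Does the index need to include rate?

To be specific, MySQL 8.0.13 was the first version to support expression indexes.

It seems to have no effect. The index is used, but there are still 26200 rows.

@ArtemIlchenko: I am not sure about your comment. Indexes are meant to improve performance, not to change the number of rows the query returns.

The index is called "covering" because it contains all the columns needed anywhere in the query. EXPLAIN will indicate this by saying "Using index" (which is different from "Using index condition").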
