MySQL: Subquery optimization problem where the subquery checks over 14000 rows
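Posted: 2020-12-22 23:41:07
Question: I need help optimizing the subquery below. In short, the following query joins the tree table to the branch table on s_id and keeps the branch row with the maximum timestamp that satisfies the subquery's condition (at or before the tree row's timestamp).
I'm happy with the results this query returns, but it is very slow. The bottleneck is the dependent subquery (branch2), which examines over 14,000 rows. How can I optimize the subquery to speed up the query?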
SELECT *
FROM dept.tree tree
LEFT JOIN dept.branch branch ON tree.s_id = branch.s_id
AND branch.timestamp =
(
SELECT MAX(timestamp)
FROM dept.branch branch2
WHERE branch2.s_id = tree.s_id
AND branch2.timestamp <= tree.timestamp
)
WHERE tree.timestamp BETWEEN CONVERT_TZ('2020-05-16 00:00:00', 'America/Toronto', 'UTC')
AND CONVERT_TZ('2020-05-16 23:59:59', 'America/Toronto', 'UTC')
AND tree.s_id IN ('459','460')
ORDER BY tree.timestamp ASC;
Table tree:
id box_id timestamp
373001645 1 2020-05-07 06:00:20
373001695 1 2020-05-07 06:02:26
373001762 1 2020-05-07 06:05:17
373001794 1 2020-05-07 06:06:38
373001810 2 2020-05-07 06:07:21
Table branch:
id box_id timestamp data
373001345 1 2020-05-07 06:00:20 "R": 0.114, "H": 20.808
373001395 1 2020-05-07 06:02:26 "R": 0.12, "H": 15.544
373001462 1 2020-05-07 06:03:01 "R": 0.006, "H": 55.469
373001494 1 2020-05-07 06:04:38 "R": 0.004, "H": 51.85
373001496 1 2020-05-07 06:05:18 "R": 0.02, "H": 5.8965
373001497 1 2020-05-07 06:06:39 "R": 0.12, "H": 54.32
373001510 2 2020-05-07 06:07:09 "R": 0.34, "H": 1.32
373001511 2 2020-05-07 06:07:29 "R": 0.56, "H": 32.7
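To make the query's intent concrete, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database (Python's built-in sqlite3 module standing in for MySQL 5.7) and runs the original correlated-subquery logic. Assumptions: the sample tables' box_id column is the s_id the query joins on, data is stored as plain text, and the CONVERT_TZ date window and the IN ('459','460') filter are dropped because the sample rows use a different date and different ids.

```python
import sqlite3

# Sample rows from the question; box_id is assumed to be the s_id column.
TREE = [
    (373001645, 1, "2020-05-07 06:00:20"),
    (373001695, 1, "2020-05-07 06:02:26"),
    (373001762, 1, "2020-05-07 06:05:17"),
    (373001794, 1, "2020-05-07 06:06:38"),
    (373001810, 2, "2020-05-07 06:07:21"),
]
BRANCH = [
    (373001345, 1, "2020-05-07 06:00:20", '"R": 0.114, "H": 20.808'),
    (373001395, 1, "2020-05-07 06:02:26", '"R": 0.12, "H": 15.544'),
    (373001462, 1, "2020-05-07 06:03:01", '"R": 0.006, "H": 55.469'),
    (373001494, 1, "2020-05-07 06:04:38", '"R": 0.004, "H": 51.85'),
    (373001496, 1, "2020-05-07 06:05:18", '"R": 0.02, "H": 5.8965'),
    (373001497, 1, "2020-05-07 06:06:39", '"R": 0.12, "H": 54.32'),
    (373001510, 2, "2020-05-07 06:07:09", '"R": 0.34, "H": 1.32'),
    (373001511, 2, "2020-05-07 06:07:29", '"R": 0.56, "H": 32.7'),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.execute(
    "CREATE TABLE branch (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT, data TEXT)"
)
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", TREE)
conn.executemany("INSERT INTO branch VALUES (?, ?, ?, ?)", BRANCH)

# For each tree row, pick the branch row (same s_id) with the latest
# timestamp that is not after the tree row's timestamp.
AS_OF = """
SELECT tree.id, tree.timestamp, branch.id, branch.timestamp, branch.data
FROM tree
LEFT JOIN branch ON tree.s_id = branch.s_id
  AND branch.timestamp = (
    SELECT MAX(branch2.timestamp)
    FROM branch AS branch2
    WHERE branch2.s_id = tree.s_id
      AND branch2.timestamp <= tree.timestamp
  )
ORDER BY tree.timestamp ASC
"""

rows = conn.execute(AS_OF).fetchall()
for row in rows:
    print(row)
```

The tree row at 06:05:17 matches the branch row at 06:04:38, not the one at 06:05:18, because the subquery only accepts branch timestamps at or before the tree timestamp. Timestamps are compared as text here, which orders correctly only because every value uses the same fixed "YYYY-MM-DD HH:MM:SS" format.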
branch has indexes on s_id and timestamp.
I'm using version 5.7.25-google-log.
EXPLAIN gives the following:
id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra
1 | PRIMARY | tree | | range | unique_timestamp_s_id,idx_s_id_timestamp,idx_timestamp | idx_s_id_timestamp | 10 | | 2629 | 100.00 | Using index condition; Using filesort
1 | PRIMARY | branch | | ref | unique_timestamp_s_id,idx_timestamp | unique_timestamp_s_id | 5 | func | 1 | 100.00 | Using where
2 | DEPENDENT SUBQUERY | branch2 | | ref | unique_timestamp_s_id,idx_s_id_timestamp,idx_timestamp | idx_s_id_timestamp | 5 | tree.s_id | 14122 | 33.33 | Using where; Using index
Comments on the question:
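Sample data, the desired results, and an explanation of the logic you are trying to implement would help. Or wait a few minutes.
What is the exact MySQL version?
Is this a "groupwise-max" problem?
I've added more information above. Sorry for not adding it earlier. Let me know if I should provide more details. Thanks.
Answer 1: This will be faster: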
select
tree.s_id, tree.timestamp, branch.data
from
(
SELECT
tree.s_id, tree.timestamp, max(branch.timestamp) as max_branch_timestamp
FROM
dept.tree tree
LEFT JOIN dept.branch branch
ON(
branch.s_id = tree.s_id
and branch.timestamp <= tree.timestamp
)
WHERE
tree.timestamp BETWEEN
CONVERT_TZ('2020-05-16 00:00:00', 'America/Toronto', 'UTC') AND
CONVERT_TZ('2020-05-16 23:59:59', 'America/Toronto', 'UTC')
AND tree.s_id IN ('459','460')
group by tree.s_id, tree.timestamp
) tree
left outer join dept.branch branch
on(
branch.s_id = tree.s_id
and branch.timestamp = tree.max_branch_timestamp
)
order by tree.timestamp asc
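A minimal sketch for checking that this grouped rewrite returns the same rows as the original correlated subquery. It again uses SQLite through Python's sqlite3 as a stand-in for MySQL, with the question's sample rows (box_id assumed to be s_id, the data column omitted, the date and IN filters dropped). The aliases t and b2 are illustrative only.

```python
import sqlite3

TREE = [
    (373001645, 1, "2020-05-07 06:00:20"),
    (373001695, 1, "2020-05-07 06:02:26"),
    (373001762, 1, "2020-05-07 06:05:17"),
    (373001794, 1, "2020-05-07 06:06:38"),
    (373001810, 2, "2020-05-07 06:07:21"),
]
BRANCH = [
    (373001345, 1, "2020-05-07 06:00:20"), (373001395, 1, "2020-05-07 06:02:26"),
    (373001462, 1, "2020-05-07 06:03:01"), (373001494, 1, "2020-05-07 06:04:38"),
    (373001496, 1, "2020-05-07 06:05:18"), (373001497, 1, "2020-05-07 06:06:39"),
    (373001510, 2, "2020-05-07 06:07:09"), (373001511, 2, "2020-05-07 06:07:29"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", TREE)
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", BRANCH)

# Original shape: one dependent MAX() subquery per tree row.
CORRELATED = """
SELECT tree.s_id, tree.timestamp, branch.id
FROM tree
LEFT JOIN branch ON branch.s_id = tree.s_id
  AND branch.timestamp = (SELECT MAX(b2.timestamp) FROM branch AS b2
                          WHERE b2.s_id = tree.s_id AND b2.timestamp <= tree.timestamp)
ORDER BY tree.s_id, tree.timestamp
"""

# Answer 1's shape: compute each tree row's MAX with GROUP BY, then join back.
GROUPED = """
SELECT t.s_id, t.timestamp, branch.id
FROM (
  SELECT tree.s_id, tree.timestamp, MAX(branch.timestamp) AS max_branch_timestamp
  FROM tree
  LEFT JOIN branch ON branch.s_id = tree.s_id AND branch.timestamp <= tree.timestamp
  GROUP BY tree.s_id, tree.timestamp
) AS t
LEFT JOIN branch ON branch.s_id = t.s_id AND branch.timestamp = t.max_branch_timestamp
ORDER BY t.s_id, t.timestamp
"""

correlated_rows = conn.execute(CORRELATED).fetchall()
grouped_rows = conn.execute(GROUPED).fetchall()
print(correlated_rows == grouped_rows)  # prints True for the sample data
```

One behavioral difference to keep in mind: GROUP BY tree.s_id, tree.timestamp collapses tree rows that share both values into a single output row, whereas the original query returns every tree row. Whether the rewrite is actually faster in MySQL 5.7 depends on the plan it gets, so comparing EXPLAIN output for both queries is still worthwhile.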
Answer 2: Please provide SHOW CREATE TABLE.
branch needs INDEX(s_id, timestamp).
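To illustrate why the composite index matters: with INDEX(s_id, timestamp), the subquery's MAX(timestamp) lookup for one s_id and an upper timestamp bound can be answered from the index alone. The sketch below checks this with SQLite's EXPLAIN QUERY PLAN via Python's sqlite3. The index name idx_s_id_timestamp is borrowed from the question's EXPLAIN output, and MySQL's EXPLAIN reports plans differently, so this illustrates the idea rather than proving what MySQL will do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE branch (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT, data TEXT)"
)
# Assumed MySQL equivalent:
#   ALTER TABLE dept.branch ADD INDEX idx_s_id_timestamp (s_id, timestamp);
conn.execute("CREATE INDEX idx_s_id_timestamp ON branch (s_id, timestamp)")

# The lookup the dependent subquery performs once per tree row.
LOOKUP = "SELECT MAX(timestamp) FROM branch WHERE s_id = ? AND timestamp <= ?"

plan = conn.execute("EXPLAIN QUERY PLAN " + LOOKUP, (1, "2020-05-07 06:05:17")).fetchall()
details = [row[3] for row in plan]  # the 4th column holds the human-readable plan step
print(details)
```

The printed plan step should mention idx_s_id_timestamp; the exact wording varies between SQLite versions.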
Do you need the LEFT? It may slow the query down for no reason.
Combining IN on one column with BETWEEN on another may be poorly optimized; what version are you running?
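Please provide the EXPLAIN SELECT output so we can discuss whether the query is being optimized. If it is not, we can discuss how to turn the IN (a variant of OR) into a UNION.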
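The IN-to-UNION idea can be sketched as follows: run the same query once per s_id value and concatenate the results with UNION ALL, so each part has a single equality on s_id plus the range on timestamp. This uses SQLite via Python's sqlite3 as a stand-in for MySQL. Assumptions: the sample rows' box_id values 1 and 2 stand in for '459' and '460', a 2020-05-07 window replaces the CONVERT_TZ bounds, and UNION ALL is used instead of UNION because the parts filter on different s_id values and therefore cannot produce duplicates that need removing.

```python
import sqlite3

TREE = [
    (373001645, 1, "2020-05-07 06:00:20"),
    (373001695, 1, "2020-05-07 06:02:26"),
    (373001762, 1, "2020-05-07 06:05:17"),
    (373001794, 1, "2020-05-07 06:06:38"),
    (373001810, 2, "2020-05-07 06:07:21"),
]
BRANCH = [
    (373001345, 1, "2020-05-07 06:00:20"), (373001395, 1, "2020-05-07 06:02:26"),
    (373001462, 1, "2020-05-07 06:03:01"), (373001494, 1, "2020-05-07 06:04:38"),
    (373001496, 1, "2020-05-07 06:05:18"), (373001497, 1, "2020-05-07 06:06:39"),
    (373001510, 2, "2020-05-07 06:07:09"), (373001511, 2, "2020-05-07 06:07:29"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tree (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.execute("CREATE INDEX idx_s_id_timestamp ON branch (s_id, timestamp)")
conn.executemany("INSERT INTO tree VALUES (?, ?, ?)", TREE)
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", BRANCH)

# One SELECT template; {s_id_filter} is filled in differently per variant.
TEMPLATE = """
SELECT tree.timestamp AS ts, tree.id AS tree_id, branch.id AS branch_id
FROM tree
LEFT JOIN branch ON branch.s_id = tree.s_id
  AND branch.timestamp = (SELECT MAX(b2.timestamp) FROM branch AS b2
                          WHERE b2.s_id = tree.s_id AND b2.timestamp <= tree.timestamp)
WHERE tree.timestamp BETWEEN :start AND :end
  AND {s_id_filter}
"""

window = {"start": "2020-05-07 00:00:00", "end": "2020-05-07 23:59:59"}

# IN-list version, as in the question.
in_query = TEMPLATE.format(s_id_filter="tree.s_id IN (1, 2)") + "ORDER BY ts"
# UNION ALL version: one SELECT per s_id; the final ORDER BY sorts the combined result.
union_query = (
    TEMPLATE.format(s_id_filter="tree.s_id = 1")
    + "UNION ALL"
    + TEMPLATE.format(s_id_filter="tree.s_id = 2")
    + "ORDER BY ts"
)

in_rows = conn.execute(in_query, window).fetchall()
union_rows = conn.execute(union_query, window).fetchall()
print(in_rows == union_rows)  # prints True for the sample data
```

Both versions return the same rows; the point of the UNION form is only to give the optimizer simpler per-part conditions, which may or may not change MySQL's plan.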
This might actually be faster than the approach I was considering above...
With the index above in place, rewrite the query substantially:
SELECT b.*
FROM ( SELECT s_id,
MAX(timestamp) as timestamp
FROM dept.branch
WHERE timestamp BETWEEN
CONVERT_TZ('2020-05-16 00:00:00', 'America/Toronto', 'UTC')
AND CONVERT_TZ('2020-05-16 23:59:59', 'America/Toronto', 'UTC')
AND s_id IN ('459','460')
) AS x
JOIN dept.branch AS b USING(s_id, timestamp)
First, check whether this returns the right information. Then I'll explain how to do the UNION in the subquery (if you need help).
Comments:
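I've provided more information above; let me know if it helps. Sorry for not providing it earlier. I have a question about your query: did you mean to join dept.tree rather than dept.branch?
The above query gives the following error: Error Code: 1140. In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'dept.s_id'; this is incompatible with sql_mode=only_full_group_by
Oops. I need to look at it again.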
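The ONLY_FULL_GROUP_BY error reported above comes from selecting s_id alongside MAX(timestamp) without a GROUP BY. Below is a minimal sketch of Answer 2's query with GROUP BY s_id added, run in SQLite through Python's sqlite3 on the question's branch sample rows (box_id assumed to be s_id, a 2020-05-07 window replacing the CONVERT_TZ bounds, ids 1 and 2 replacing '459' and '460'). It returns the latest branch row per s_id within the window, one row per s_id, rather than the per-tree-row lookup of the original query, which is the tree-versus-branch question raised in the comments.

```python
import sqlite3

BRANCH = [
    (373001345, 1, "2020-05-07 06:00:20"), (373001395, 1, "2020-05-07 06:02:26"),
    (373001462, 1, "2020-05-07 06:03:01"), (373001494, 1, "2020-05-07 06:04:38"),
    (373001496, 1, "2020-05-07 06:05:18"), (373001497, 1, "2020-05-07 06:06:39"),
    (373001510, 2, "2020-05-07 06:07:09"), (373001511, 2, "2020-05-07 06:07:29"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE branch (id INTEGER PRIMARY KEY, s_id INTEGER, timestamp TEXT)")
conn.executemany("INSERT INTO branch VALUES (?, ?, ?)", BRANCH)

# Answer 2's query with the missing GROUP BY s_id added, so the only
# non-aggregated column in the derived table's SELECT list is grouped.
# Explicit b.* columns are listed to keep the output shape unambiguous.
LATEST_PER_S_ID = """
SELECT b.id, b.s_id, b.timestamp
FROM (
  SELECT s_id, MAX(timestamp) AS timestamp
  FROM branch
  WHERE timestamp BETWEEN :start AND :end
    AND s_id IN (1, 2)
  GROUP BY s_id
) AS x
JOIN branch AS b USING (s_id, timestamp)
ORDER BY b.s_id
"""

window = {"start": "2020-05-07 00:00:00", "end": "2020-05-07 23:59:59"}
rows = conn.execute(LATEST_PER_S_ID, window).fetchall()
for row in rows:
    print(row)
```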