Alternative strategy to CROSS JOIN + LEFT JOIN subquery?

Posted 2023-02-24

Published: 2015-11-03 10:56:27

Question:

I want to join a table of time units (note: these are not consecutive)

Time 1
Time 2

...with a table of departments...

Department 1
Department 2

...in order to match them against the observations table, selecting only the observations of type X...

Time unit     Department id       Observation    Type
Time 1        Department 1        6               X
Time 2        Department 2        5               X
Time 2        Department 2        4               Y

...and end up with a table like this, where missing observations are filled with 0 or NULL:

Time unit     Department id     Observation
Time 1        Department 1        6
Time 2        Department 1        0
Time 1        Department 2        0
Time 2        Department 2        5

The following gets the job done, but it is slow, so I am sure there must be a better way:

SELECT timeunits.time_unit, departments.department_id, observations.observation 
FROM timeunits
CROSS JOIN departments
LEFT JOIN   (
    SELECT observations.time_unit, observations.department_id, observations.observation 
    FROM observations
    WHERE observations.type='X'
    ) as observations
ON timeunits.time_unit=observations.time_unit 
AND departments.department_id=observations.department_id

EXPLAIN output:

+----+-------------+--------------+-------+---------------+-------------+---------+---------------------------------------------+--------+----------------------------------------------------+
| id | select_type | table        | type  | possible_keys | key         | key_len | ref                                         |  rows  | Extra                                              |
+----+-------------+--------------+-------+---------------+-------------+---------+---------------------------------------------+--------+----------------------------------------------------+
|  1 | PRIMARY     | time_units   | ALL   | NULL          | NULL        | NULL    | NULL                                        |    200 | NULL                                               |
|  1 | PRIMARY     | departments  | index | NULL          | PRIMARY     | 4       | NULL                                        |    500 | Using index; Using join buffer (Block Nested Loop) |
|  1 | PRIMARY     | <derived2>   | ref   | <auto_key0>   | <auto_key0> | 263     | observations.time_units.time_unit,          |        |                                                    |
|    |             |              |       |               |             |         | observations.departments.department_id      |    600 | Using where                                        |
|  2 | DERIVED     | observations | ref   | type          | type        | 258     | const                                       | 100000 | Using index condition                              |
+----+-------------+--------------+-------+---------------+-------------+---------+---------------------------------------------+--------+----------------------------------------------------+

Answer 1:

As we have seen, type = 'X' is not very common in the observations table.

Eliminating the subquery directly like this:

SELECT timeunits.time_unit, departments.department_id, observations.observation
FROM timeunits
  JOIN departments
  LEFT JOIN observations ON observations.time_unit = timeunits.time_unit 
    AND observations.department_id = departments.department_id
    AND observations.type = 'X'

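results in a longer execution time, because MySQL can currently use an index on only one of the observations columns. As long as that column is not type, the complete observations table is joined (since we explicitly query all department_id and time_unit combinations), and only afterwards are the rows with a different type discarded => full table scan.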
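One way to check which index MySQL actually picks for the subquery-free statement is to prefix it with EXPLAIN. This is only a sketch using the table and column names from the question; the output depends on the indexes and data in your own schema.

```sql
-- Ask MySQL for the execution plan of the subquery-free statement.
-- Table and column names are taken from the question as written.
EXPLAIN
SELECT timeunits.time_unit, departments.department_id, observations.observation
FROM timeunits
  JOIN departments
  LEFT JOIN observations
    ON observations.time_unit = timeunits.time_unit
    AND observations.department_id = departments.department_id
    AND observations.type = 'X';
```

In the row for observations, the key column names the index that was chosen and the rows column gives the estimated number of rows examined per lookup.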

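A possible optimization for the subquery-free statement would be a composite index, ideally on department_id, time_unit and type, since all three are used with an equality comparison in the join condition. To reduce the storage overhead we can (and probably should) leave out the column that excludes the least data.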
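A minimal sketch of such a composite index, assuming the column names from the question. The index name idx_obs_dept_time_type is hypothetical, and whether the index helps has to be confirmed against the real data.

```sql
-- Hypothetical index name; covers the three columns that the join
-- condition compares with "=".
ALTER TABLE observations
  ADD INDEX idx_obs_dept_time_type (department_id, time_unit, type);

-- Smaller variant if type turns out to filter the least data
-- (an assumption that must be checked on the actual table):
-- ALTER TABLE observations
--   ADD INDEX idx_obs_dept_time (department_id, time_unit);
```

Building an index on a large table takes time and extra storage, so it is worth comparing the execution plan before and after.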

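If we plan to select ranges later, e.g. on time_unit, we should put that column last in the index so that the index can be used as effectively as possible.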
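As an illustration of that column order, the sketch below puts the equality columns first and the range column last. The index name and the literal values are placeholders, not taken from the original schema.

```sql
-- Hypothetical index: equality columns first, range column last.
ALTER TABLE observations
  ADD INDEX idx_obs_dept_type_time (department_id, type, time_unit);

-- Example range query that can use all three index columns:
-- two equality conditions followed by a range on time_unit.
-- The values 1, 'Time 1' and 'Time 2' are placeholders.
SELECT time_unit, department_id, observation
FROM observations
WHERE department_id = 1
  AND type = 'X'
  AND time_unit BETWEEN 'Time 1' AND 'Time 2';
```

With time_unit in the middle of the index instead, MySQL could not use the index columns that come after the range condition for the lookup.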

Comments:

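I also thought the subquery would drag down performance, but I'm afraid your suggestion runs an order of magnitude slower. My query currently takes about 2 seconds to complete, but I had to kill your suggested modification after about 60 seconds. Thanks in advance!

Added the EXPLAIN output above. :-)

Intuition would say that type = 'X' is less frequent in the observations table than a given combination of department_id and time_unit. Which indexes exist on observations?

Agreed. What should that imply? That is why I thought a subquery that quickly narrows the search down to the type I am looking for was the logical step... but there must be a more elegant way... There are indexes on type, department_id and time_unit.

Is there a composite index on time_unit and department_id? That could improve the speed of my proposal. Otherwise we could try forcing the join to use the index on type, but I would not expect much from it.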
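For reference, forcing the index on type could look like the sketch below. It assumes the index is itself named type, as the key column of the EXPLAIN output above suggests; as the last comment says, a large gain is not to be expected.

```sql
-- Index hint sketch: FORCE INDEX takes the index name, assumed here
-- to be "type" based on the EXPLAIN output in the question.
SELECT timeunits.time_unit, departments.department_id, observations.observation
FROM timeunits
  JOIN departments
  LEFT JOIN observations FORCE INDEX (type)
    ON observations.time_unit = timeunits.time_unit
    AND observations.department_id = departments.department_id
    AND observations.type = 'X';
```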
