Optimize join query from multiple tables

【Posted】: 2021-08-11 11:34:09 【Question】:

I have tables that are connected to each other through foreign keys (PostgreSQL 13.1).

order: order_id, name
sub_order: mainorder, order_id (foreign key to order), detail
task_group: id, group_name
tasks: id, taskname, task_group_id (foreign key to task_group)
task_kind: id, kind_name
task_task_kind: id, kind_id (fk to task_kind), task_id (fk to tasks)
time_per_project: person, start_time, stop_time, part, order_id (foreign key to sub_order)

I hope that is enough of a description. My query for the materialized view is below, and it works well:

SELECT
  so.order_id AS order_id,
  MIN(so.status) AS status_id,
  SUM(AGE(tpp.stop_time, tpp.start_time)) AS total,
  SUM(
    CASE WHEN (tasksgroups.id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS srut,
  SUM(
    CASE WHEN (tpp.valve_part_id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS korpus,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS zwykle,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 6) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS wyprawki
FROM
  intranet.sub_orders so
  LEFT JOIN intranet.time_per_project tpp ON so.mainorder = tpp.project_id
  LEFT JOIN intranet.task_task_kind tasks_with_kinds ON tasks_with_kinds.id = tpp.task
  LEFT JOIN intranet.task tasks ON tasks.id = tasks_with_kinds.task_id
  LEFT JOIN intranet.task_group tasksgroups ON tasksgroups.id = tasks.task_group
GROUP BY
  so.order_id
HAVING (SUM(AGE(tpp.stop_time, tpp.start_time)) > interval '0 minutes');

I want to add another join, to tables that look like this:

article_group: id, group_name
article_cost: id, group_id (fk to article_group), order_id (fk to sub_orders)

I ended up joining against a subquery, because with a plain join some projects had the same row counted two or more times:

SELECT
  so.order_id AS order_id,
  MIN(so.status) AS status_id,
  SUM(AGE(tpp.stop_time, tpp.start_time)) AS total,
  SUM(
    CASE WHEN (tasksgroups.id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS srut,
  SUM(
    CASE WHEN (tpp.valve_part_id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS korpus,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS zwykle,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 6) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS wyprawki,
  ac.transport,
  ac.service
FROM
  intranet.sub_orders so
  LEFT JOIN intranet.time_per_project tpp ON so.mainorder = tpp.project_id
  LEFT JOIN intranet.task_task_kind tasks_with_kinds ON tasks_with_kinds.id = tpp.task
  LEFT JOIN intranet.task tasks ON tasks.id = tasks_with_kinds.task_id
  LEFT JOIN intranet.task_group tasksgroups ON tasksgroups.id = tasks.task_group
  LEFT JOIN (
    SELECT
      soa.order_id AS ordid,
      sum(
        CASE WHEN group_id = 14 THEN
          COST
        END) AS transport,
      sum(
        CASE WHEN group_id = 11 THEN
          COST
        END) AS service
    FROM
      intranet.article_costs
      INNER JOIN intranet.sub_orders soa ON soa.mainorder = project_id
    GROUP BY
      soa.order_id) ac ON ac.ordid = so.order_id
WHERE so.order_id = 2074
GROUP BY
  so.order_id, ac.transport, ac.service
HAVING (SUM(AGE(tpp.stop_time, tpp.start_time)) > interval '0 minutes' OR ac.transport > 0 or ac.service > 0);

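Do you think this query is acceptable for a materialized view? If so, is it possible to get the same behavior without the subquery that contains a nested join?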
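The double counting described above can be reproduced at a small scale. The following is a minimal sketch, not the asker's actual schema: it assumes SQLite through Python's `sqlite3` module, cut-down versions of the tables, and a hypothetical integer `minutes` column standing in for `AGE(stop_time, start_time)`. Joining two one-to-many tables to the same order multiplies their rows, so every `SUM` is inflated. Aggregating the costs in a subquery before the join avoids this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question
    CREATE TABLE sub_orders (order_id INTEGER, mainorder INTEGER);
    CREATE TABLE time_per_project (project_id INTEGER, minutes INTEGER);
    CREATE TABLE article_costs (project_id INTEGER, group_id INTEGER, cost INTEGER);

    INSERT INTO sub_orders VALUES (2074, 1);
    -- Two time entries and two cost entries for the same project
    INSERT INTO time_per_project VALUES (1, 30), (1, 45);
    INSERT INTO article_costs VALUES (1, 14, 100), (1, 11, 20);
""")

# Plain join: 2 time rows x 2 cost rows = 4 joined rows, so sums are doubled
naive = conn.execute("""
    SELECT so.order_id,
           SUM(tpp.minutes) AS total,
           SUM(CASE WHEN ac.group_id = 14 THEN ac.cost END) AS transport,
           SUM(CASE WHEN ac.group_id = 11 THEN ac.cost END) AS service
    FROM sub_orders so
    LEFT JOIN time_per_project tpp ON so.mainorder = tpp.project_id
    LEFT JOIN article_costs ac ON so.mainorder = ac.project_id
    GROUP BY so.order_id
""").fetchall()

# Pre-aggregated subquery: one cost row per order, so nothing is multiplied
fixed = conn.execute("""
    SELECT so.order_id,
           SUM(tpp.minutes) AS total,
           ac.transport,
           ac.service
    FROM sub_orders so
    LEFT JOIN time_per_project tpp ON so.mainorder = tpp.project_id
    LEFT JOIN (
        SELECT soa.order_id AS ordid,
               SUM(CASE WHEN c.group_id = 14 THEN c.cost END) AS transport,
               SUM(CASE WHEN c.group_id = 11 THEN c.cost END) AS service
        FROM article_costs c
        INNER JOIN sub_orders soa ON soa.mainorder = c.project_id
        GROUP BY soa.order_id
    ) ac ON ac.ordid = so.order_id
    GROUP BY so.order_id, ac.transport, ac.service
""").fetchall()

print(naive)  # [(2074, 150, 200, 40)]
print(fixed)  # [(2074, 75, 100, 20)]
```

The true totals are 75 minutes, 100 for transport and 20 for service. The plain join reports each of them doubled.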

【Answer 1】:

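As for getting the same behavior without a subquery in the join, the aggregation can be moved into a common table expression (CTE):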

WITH ac AS (
  SELECT
    soa.order_id AS ordid,
    SUM(
      CASE WHEN group_id = 14 THEN
        COST
      END) AS transport,
    SUM(
      CASE WHEN group_id = 11 THEN
        COST
      END) AS service
  FROM
    intranet.article_costs
    INNER JOIN intranet.sub_orders soa ON soa.mainorder = project_id
  GROUP BY
    soa.order_id
)

SELECT
  so.order_id AS order_id,
  MIN(so.status) AS status_id,
  SUM(AGE(tpp.stop_time, tpp.start_time)) AS total,
  SUM(
    CASE WHEN (tasksgroups.id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS srut,
  SUM(
    CASE WHEN (tpp.valve_part_id = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS korpus,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 1) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS zwykle,
  SUM(
    CASE WHEN (tasks_with_kinds.task_kind = 6) THEN
      AGE(tpp.stop_time, tpp.start_time)
    END) AS wyprawki,
  ac.transport,
  ac.service
FROM
  intranet.sub_orders so
  LEFT JOIN intranet.time_per_project tpp ON so.mainorder = tpp.project_id
  LEFT JOIN intranet.task_task_kind tasks_with_kinds ON tasks_with_kinds.id = tpp.task
  LEFT JOIN intranet.task tasks ON tasks.id = tasks_with_kinds.task_id
  LEFT JOIN intranet.task_group tasksgroups ON tasksgroups.id = tasks.task_group
  LEFT JOIN ac ON ac.ordid = so.order_id
WHERE so.order_id = 2074
GROUP BY
  so.order_id, ac.transport, ac.service
HAVING (SUM(AGE(tpp.stop_time, tpp.start_time)) > interval '0 minutes' OR ac.transport > 0 or ac.service > 0);
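A quick way to convince yourself that the CTE form and the inline-subquery form return the same rows is to run both against the same data. The sketch below again assumes SQLite via Python's `sqlite3` and simplified, hypothetical tables rather than the real `intranet` schema. On PostgreSQL 12 and later, a non-recursive, side-effect-free CTE that is referenced only once is normally inlined by the planner, so this rewrite is mostly about readability and is unlikely to change the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified, hypothetical versions of the tables in the question
    CREATE TABLE sub_orders (order_id INTEGER, mainorder INTEGER);
    CREATE TABLE article_costs (project_id INTEGER, group_id INTEGER, cost INTEGER);

    INSERT INTO sub_orders VALUES (2074, 1), (2075, 2);
    INSERT INTO article_costs VALUES (1, 14, 100), (1, 11, 20), (1, 14, 5), (2, 11, 7);
""")

# The per-order cost aggregation shared by both variants
AGGREGATE = """
    SELECT soa.order_id AS ordid,
           SUM(CASE WHEN c.group_id = 14 THEN c.cost END) AS transport,
           SUM(CASE WHEN c.group_id = 11 THEN c.cost END) AS service
    FROM article_costs c
    INNER JOIN sub_orders soa ON soa.mainorder = c.project_id
    GROUP BY soa.order_id
"""

# Variant 1: aggregation as a derived table inside the join
derived_rows = conn.execute(f"""
    SELECT so.order_id, ac.transport, ac.service
    FROM sub_orders so
    LEFT JOIN ({AGGREGATE}) ac ON ac.ordid = so.order_id
    ORDER BY so.order_id
""").fetchall()

# Variant 2: the same aggregation lifted into a CTE
cte_rows = conn.execute(f"""
    WITH ac AS ({AGGREGATE})
    SELECT so.order_id, ac.transport, ac.service
    FROM sub_orders so
    LEFT JOIN ac ON ac.ordid = so.order_id
    ORDER BY so.order_id
""").fetchall()

print(derived_rows == cte_rows)  # True
print(cte_rows)                  # [(2074, 105, 20), (2075, None, 7)]
```

Order 2075 has no transport costs, so its `transport` value is NULL (`None` in Python) in both variants.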

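As for whether this query is fine for a materialized view: if the data volume or the query time becomes too large, use materialization, but optimize the query first.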
