How can I optimize or speed up the following SQL query?

【Title】: How can I optimize or speed up the following SQL query? 【Posted】: 2021-01-21 17:42:45 【Question】:

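I have the following SQL query, which works fine as far as extracting the data goes. The only problem is that it takes a very long time to return the results.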

select distinct cast(bb.idPrefix as varchar)+'-'+cast(bb.id as varchar) as 'TicketID',
bb.Title,
bb.Description,
bb.Submitter,
bb.IssueType,
bb.ProgressStatus as 'Status',
bb.Resolution,
bb.CurrentOwner as 'Assignee',
bb.TimeEstimated / 60 as 'TimeEstimated (minutes)',
bb.TimeRemaining /60 as 'TimeRemaining (minutes)',
bb.TimeLogged /60 as 'TimeSpent (minutes)',
bb.Projectname ,
case when bb.id in (Select id from project.sprint GROUP BY id HAVING COUNT(*) > 1) and 
                      (Select count(*) from project.sprint where id = bb.Id GROUP BY id HAVING COUNT(*) > 1) >1 then 
                      (select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc)
     when bb.id in (Select id from project.sprint GROUP BY id HAVING COUNT(*) > 1) and 
                      (Select count(*) from project.sprint where id = bb.id GROUP BY id HAVING COUNT(*) > 1) >1 then 
                      (select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc)
                       else null end as 'End Sprint',
case when bb.id in (Select id from project.sprint GROUP BY id HAVING COUNT(*) > 1) and 
                      (Select count(*) from project.sprint where id = bb.id GROUP BY id HAVING COUNT(*) > 1) > 1 then 
                      (select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid Asc)
     when bb.id in (Select id from project.sprint GROUP BY id HAVING COUNT(*) > 1) and 
                      (Select count(*) from project.sprint where id = bb.id GROUP BY id HAVING COUNT(*) > 1) > 1 then 
                      (select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid Asc) end as 'Start Sprint',
case when bb.resolution is null and bb.TimeEstimated is null and bb.CurrentOwner is null then 1 else 0 end as 'Backlog',
case when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),14,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),14,1)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),15,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),14,2)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),16,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid asc),14,3)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),12,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),12,1)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),13,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),12,2)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),14,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid asc),12,3) else 0 end as [StartCO],
case when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),14,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),14,1)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),15,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),14,2)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),16,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'XXXXX sprint%' order by sprintid desc),14,3)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),12,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),12,1)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),13,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),12,2)
     when substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),14,1) like '[0-9]' then substring((select top 1 SprintName from project.sprint where id = bb.id and bb.projectid = 12345 and sprintname like 'YYY sprint%' order by sprintid desc),12,3) else 0 end as [EndCO]
from project.bugs bb 
left join project.sprint bs on
bb.id = bs.id 
left join project.logs l on 
bb.id = l.id
where bb.projectid = 18540
and bb.IssueType = 'tracking' and bb.idPrefix like 'Test%' and (bb.title like '%XXXXX%auto%' or bb.title like '%YYY%auto%')

Any ideas on what I could change to speed this up?

【Comments】:

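Well, getting rid of all those subqueries would be a start. Also, do you really need the DISTINCT? It can be very expensive.

I also notice that you LEFT JOIN to project.sprint and project.logs, but you don't reference them anywhere else in the query. Why are they there? Those joins can go, and very likely the DISTINCT too once they are removed.

Could you please share your input and output data and the table structures in text format?

Oh yeah, I guess those JOINs were left over from when I was exploring the data stored in those tables and the relationships between them. As for the DISTINCT part, without it I think it pulls some duplicates which I don't want.

"As for the DISTINCT part, without it I think it pulls some duplicates which I don't want" That's because you have LEFT JOINs to tables you aren't using, and I assume those joins are many-to-one...

【Answer 1】: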

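There are comments scattered throughout the query below that I've added. Some of them point out that the desired behaviour is unknown or impossible, and so needs addressing. However, I believe I have rewritten the logic correctly. I got rid of almost all the subqueries, bar 2; the rest I moved to OUTER APPLYs in the FROM. Of course, this is impossible to test, as I have no sample data or expected results, but the SQL does parse: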

SELECT CAST(bb.idPrefix AS varchar) + '-' + CAST(bb.id AS varchar) AS TicketID, --Don't use single quotes for aliases, they are for literal strings
       bb.Title,
       bb.Description,
       bb.Submitter,
       bb.IssueType,
       bb.ProgressStatus AS Status,--Don't use single quotes for aliases, they are for literal strings
       bb.Resolution,
       bb.CurrentOwner AS Assignee,--Don't use single quotes for aliases, they are for literal strings
       bb.TimeEstimated / 60 AS TimeEstimatedMinutes,--Don't use single quotes for aliases, they are for literal strings. Also, stick to names that don't need delimit identifying
       bb.TimeRemaining / 60 AS TimeRemainingMinutes,--Don't use single quotes for aliases, they are for literal strings.  Also, stick to names that don't need delimit identifying
       bb.TimeLogged / 60 AS TimeSpentMinutes,--Don't use single quotes for aliases, they are for literal strings.  Also, stick to names that don't need delimit identifying
       bb.Projectname,
       CASE
            WHEN (SELECT COUNT(*)
                  FROM project.sprint s
                  WHERE s.id = bb.Id
                  --HAVING is pointless when you're checking in the CASE. If you want to use a HAVING use EXISTS in the CASE
                  GROUP BY id) > 1 THEN SNx.SprintName
            --This will never be true, it's the same as the last WHEN
            /*
            WHEN (SELECT COUNT(*)
                  FROM project.sprint s
                  WHERE s.id = bb.id
                  GROUP BY id
                  --HAVING is pointless when you're checking in the CASE. If you want to use a HAVING use EXISTS in the CASE
                  HAVING COUNT(*) > 1) > 1 THEN SNy.SprintName 
            */
       --ELSE NULL is redundant, a CASE expression already returns NULL if it doesn't evaluate to TRUE
       END AS EndSprint, --Don't use single quotes for aliases, they are for literal strings. Also, stick to names that don't need delimit identifying
       CASE
            WHEN (SELECT COUNT(*)
                  FROM project.sprint s
                  WHERE s.id = bb.id
                  --HAVING is pointless when you're checking in the CASE. If you want to use a HAVING use EXISTS in the CASE
                  GROUP BY id) > 1 THEN SNx.SprintName
            --This will never be true, it's the same as the last WHEN
            /*
            WHEN (SELECT COUNT(*)
                  FROM project.sprint s
                  WHERE s.id = bb.id
                  GROUP BY id
                  --HAVING is pointless when you're checking in the CASE. If you want to use a HAVING use EXISTS in the CASE
                  HAVING COUNT(*) > 1) > 1 THEN SNy.SprintName 
            */
       --ELSE NULL is redundant, a CASE expression already returns NULL if it doesn't evaluate to TRUE
       END AS StartSprint, --Don't use single quotes for aliases, they are for literal strings. Also, stick to names that don't need delimit identifying
       CASE
            WHEN bb.resolution IS NULL
             AND bb.TimeEstimated IS NULL
             AND bb.CurrentOwner IS NULL THEN 1
            ELSE 0
       END AS Backlog, --Don't use single quotes for aliases, they are for literal strings
       CASE
            WHEN SUBSTRING(SNx.SprintName, 14, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 1)
            WHEN SUBSTRING(SNx.SprintName, 15, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 2)
            WHEN SUBSTRING(SNx.SprintName, 16, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 3)
            WHEN SUBSTRING(SNy.SprintName, 12, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 1)
            WHEN SUBSTRING(SNy.SprintName, 13, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 2)
            WHEN SUBSTRING(SNy.SprintName, 14, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 3)
            ELSE 0
       END AS StartCO,
       CASE
            WHEN SUBSTRING(SNx.SprintName, 14, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 1)
            WHEN SUBSTRING(SNx.SprintName, 15, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 2)
            WHEN SUBSTRING(SNx.SprintName, 16, 1) LIKE '[0-9]' THEN SUBSTRING(SNx.SprintName, 14, 3)
            WHEN SUBSTRING(SNy.SprintName, 12, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 1)
            WHEN SUBSTRING(SNy.SprintName, 13, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 2)
            WHEN SUBSTRING(SNy.SprintName, 14, 1) LIKE '[0-9]' THEN SUBSTRING(SNy.SprintName, 12, 3)
            ELSE 0
       END AS EndCO
FROM project.bugs bb
     OUTER APPLY (SELECT TOP 1
                         s.SprintName
                  FROM project.sprint s
                  WHERE s.id = bb.id
                    AND bb.projectid = 12345
                    AND s.sprintname LIKE 'XXXXX sprint%'
                  ORDER BY s.sprintid ASC) SNx
     OUTER APPLY (SELECT TOP 1
                         s.SprintName
                  FROM project.sprint s
                  WHERE s.id = bb.id
                    AND bb.projectid = 12345
                    AND s.sprintname LIKE 'YYY sprint%'
                  ORDER BY s.sprintid ASC) SNy
WHERE bb.projectid = 18540
  AND bb.IssueType = 'tracking'
  AND bb.idPrefix LIKE 'Test%'
  AND (bb.title LIKE '%XXXXX%auto%' --This will never be SARGable due to leading wildcard.
    OR bb.title LIKE '%YYY%auto%'); --This will never be SARGable due to leading wildcard.

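Indexing is, of course, another matter, but without the full DDL and DML for the objects, plus the execution plan (the plan for this query, if its results are correct), I'm not even going to begin to go there.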
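
One thing worth checking against the original query: it orders by sprintid DESC for End Sprint and EndCO, whereas both OUTER APPLYs above order by sprintid ASC, so the start and end columns would come back identical. A minimal sketch of fetching the first and the last matching sprint separately follows. It assumes the table and column names from the question; the aliases SNxFirst and SNxLast are made up for illustration; and the question's inner bb.projectid = 12345 filter is left out because it conflicts with the outer bb.projectid = 18540, which is an assumption about intent. Like the query above, it is untested for lack of sample data:

```sql
SELECT bb.id,
       SNxFirst.SprintName AS StartSprint, -- sprint with the lowest sprintid
       SNxLast.SprintName  AS EndSprint    -- sprint with the highest sprintid
FROM project.bugs bb
     -- First matching sprint per bug (hypothetical alias SNxFirst)
     OUTER APPLY (SELECT TOP (1)
                         s.SprintName
                  FROM project.sprint s
                  WHERE s.id = bb.id
                    AND s.sprintname LIKE 'XXXXX sprint%'
                  ORDER BY s.sprintid ASC) SNxFirst
     -- Last matching sprint per bug (hypothetical alias SNxLast)
     OUTER APPLY (SELECT TOP (1)
                         s.SprintName
                  FROM project.sprint s
                  WHERE s.id = bb.id
                    AND s.sprintname LIKE 'XXXXX sprint%'
                  ORDER BY s.sprintid DESC) SNxLast
WHERE bb.projectid = 18540;
```

The same pair of APPLYs would be repeated for the 'YYY sprint%' pattern, with EndSprint and EndCO reading from the DESC versions.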

【Comments】:

You're the best! My original query had a lot of flaws...

【Answer 2】:

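Use a WITH statement to factor out the common queries. In SQL Server this is a CTE, or Common Table Expression.

You don't have to put the entire subquery in the WITH statement; just use it to narrow the data down.

For example: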

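WITH MY_SPRINT_QUERY AS
(
    -- Only the sprint rows the main query cares about.
    -- String literals take single quotes; double quotes denote identifiers.
    SELECT Id, SprintName, sprintId
    FROM project.sprint
    WHERE sprintname LIKE 'XXXXX sprint%'
       OR sprintname LIKE 'YYY sprint%'
)
SELECT <insert your simplified query here>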
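
As a rough illustration of where this could lead, the sketch below ranks each bug's sprints inside the CTE and then joins to the first and last one. The CTE name SprintRanked and the columns rn_first and rn_last are invented for this example. It also ranks both name patterns together, which differs from the original query's XXXXX-before-YYY preference, and it assumes sprintid is unique per id. Treat it as a starting point, not a drop-in replacement:

```sql
WITH SprintRanked AS
(
    SELECT s.id,
           s.SprintName,
           -- 1 = first sprint for this id, by sprintid
           ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.sprintid ASC)  AS rn_first,
           -- 1 = last sprint for this id, by sprintid
           ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY s.sprintid DESC) AS rn_last
    FROM project.sprint s
    WHERE s.sprintname LIKE 'XXXXX sprint%'
       OR s.sprintname LIKE 'YYY sprint%'
)
SELECT bb.id,
       f.SprintName AS StartSprint,
       l.SprintName AS EndSprint
FROM project.bugs bb
     -- Each join matches at most one row per bug, so no DISTINCT is needed
     LEFT JOIN SprintRanked f ON f.id = bb.id AND f.rn_first = 1
     LEFT JOIN SprintRanked l ON l.id = bb.id AND l.rn_last = 1
WHERE bb.projectid = 18540;
```

Because each join is restricted to a single ranked row, the result keeps one row per bug, which is what the DISTINCT in the question was compensating for.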

【Comments】:
