How to speed up a query with nested loops

Posted: 2014-01-16 17:38:03

【Question】

I have this query, which works but is very slow:

SELECT
  ID_NODE,
  -- this case slows down the query!!!
  CASE WHEN (EXISTS (SELECT MV.ID_CHILD FROM MYVIEW MV INNER JOIN MYTABLE1 MT1 ON MT1.ID_NODE = MV.ID_CHILD WHERE MV.ID_PARENT = CA.ID_NODE AND ID_FATHER IS NOT NULL)) THEN 'Y' ELSE 'N' END AS HAVE_CHILDREN,
  OTHER_FIELDS
FROM
  MYTABLE2 

UPDATE: after the first answer I realized my example was not perfect, so I made 2 changes to it (CA → MT1, and wrote MT1.ID_FATHER instead of ID_FATHER):

  SELECT
  ID_NODE,
  -- this case slows down the query!!!
  CASE WHEN (EXISTS (SELECT MV.ID_CHILD FROM MYVIEW MV INNER JOIN MYTABLE1 MT1 ON MT1.ID_NODE = MV.ID_CHILD WHERE MV.ID_PARENT = MT2.ID_NODE AND MT1.ID_FATHER IS NOT NULL)) THEN 'Y' ELSE 'N' END AS HAVE_CHILDREN,
  OTHER_FIELDS
FROM
  MYTABLE2

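END OF UPDATE

Basically, I want a 'Y'/'N' result that answers "does this node have children?"

In the execution plan I see only one warning:

Nested Loops (Inner Join) 43%

Can you suggest an improvement to the query?

As an extreme solution I could store the HAVE_CHILDREN value as a new field in the table, but I don't like that, because it is a "highway to bugs".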

Bounty note:

I am posting the original tables, the view (as CREATE statements), and the query here to make it easier to answer:

--This is MYTABLE1

CREATE TABLE [dbo].[MAN_PRG_OPERAZIONI](
    [ID_PROG_OPERAZIONE] [int] NOT NULL,
    [ID_CESPITE] [int] NOT NULL,
    [ID_TIPO_OPERAZIONE] [int] NOT NULL,
    [SEQUENZA] [int] NOT NULL,
    [ID_RESPONSABILE] [int] NULL,
    [DATA_SCADENZA] [datetime] NULL,
    [DATA_ULTIMA] [datetime] NULL,
    [ID_TIPO_FREQUENZA] [int] NOT NULL,
    [FREQUENZA] [int] NOT NULL,
    [NOTIFICA_SCADENZA] [nchar](1) NOT NULL,
    [COSTO_FISSO] [numeric](19, 4) NOT NULL,
    [NOTE] [nvarchar](max) NULL,
    [ID_CONTO_FORNITORE] [int] NULL,
    [ID_ESECUTORE] [int] NULL,
    [GIORNI_INTERVENTO_PREVISTI] [int] NOT NULL,
    [RIPETIZIONE] [nchar](1) NOT NULL,
    [RIPETIZIONE_CONTINUA] [nchar](1) NOT NULL,
    [RIPETI_FINO_A] [datetime] NULL,
    [SOSPESO] [nchar](1) NOT NULL,
    [ORE_LAVORO_PREVISTE] [decimal](8, 2) NOT NULL,
    [DESCR_TITOLO_OPERAZIONE] [nvarchar](100) NOT NULL,
    [ID_TEMPLATE] [int] NULL,
    [TEMPLATE] [nvarchar](25) NULL,
    [ID_PARENT_TEMPLATE_REMOTE] [int] NULL,
    [ATTIVO] [nchar](1) NOT NULL,
    [ID_PARENT_TEMPLATE] [int] NULL,
    [NOTIFY_RESPONSIBLE] [nchar](1) NULL,
    [NOTIFY_EXECUTOR] [nchar](1) NULL,
    [NOTIFY_OTHERS] [nvarchar](200) NULL,
    [NOTIFY_INADVANCE] [nchar](1) NULL,
    [NOTIFY_ADVANCE_DAYS] [int] NULL,
    [NOTIFY_ONEXECUTION] [nchar](1) NULL,
    [NOTIFY_ONCLOSE] [nchar](1) NULL,
    [ID_UTENTE_INS] [int] NULL,
    [DATA_INS] [datetime] NULL,
    [ID_UTENTE_ULT_MOD] [int] NULL,
    [DATA_ULTIMA_MOD] [datetime] NULL,
    [STATO_CKL] [int] NOT NULL,
    [TAGAPPSYNC] [nchar](1) NOT NULL,
    [IS_FATHER] [nchar](1) NOT NULL,
    [ID_FATHER] [int] NULL,
    [NOTIFY_DELAYS] [nchar](1) NULL,
    [NOTIFY_DELAYS_DAYS] [int] NULL,
    [NOTIFY_INS_USER] [nchar](1) NULL,
 CONSTRAINT [PK_MAN_PRG_OPERAZIONI] PRIMARY KEY CLUSTERED 
(
    [ID_PROG_OPERAZIONE] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

--This is MYTABLE2

CREATE TABLE [dbo].[CES_ANAGRAFICA](
    [ID_CESPITE] [int] NOT NULL,
    [ID_CESPITE_PADRE] [int] NULL,
    [COD_CESPITE] [nvarchar](50) NOT NULL,
    [DESCR_CESPITE] [nvarchar](120) NOT NULL,
    [IMMATERIALE] [nchar](1) NOT NULL,
    [DATA_ACQUISTO] [datetime] NULL,
    [DATA_ENTRATA_FUNZIONE] [datetime] NULL,
    [DATA_DISMISSIONE] [datetime] NULL,
    [BENE_USATO] [nchar](1) NOT NULL,
    [ID_UBICAZIONE] [int] NULL,
    [NRO_IDENTIFICAZIONE] [nvarchar](50) NULL,
    [MARCA] [nvarchar](50) NULL,
    [MODELLO] [nvarchar](50) NULL,
    [MARCATURA_CE] [nchar](1) NULL,
    [ANNO_COSTRUZIONE] [int] NULL,
    [MATRICOLA_COSTRUTTORE] [nvarchar](50) NULL,
    [COSTRUTTORE] [nvarchar](80) NULL,
    [ID_CONTO_FORNITORE] [int] NULL,
    [NOTE] [nvarchar](max) NULL,
    [ID_TIPO_CESPITE] [int] NOT NULL,
    [ID_STATO_CESPITE] [int] NULL,
    [ID_CONTO_PROPRIETA] [int] NULL,
    [ID_RESPONSABILE] [int] NULL,
    [DATA_SCAD_GARANZIA] [datetime] NULL,
    [CAMPO_MISURA] [nvarchar](80) NULL,
    [CRITERI_ACC] [nvarchar](80) NULL,
    [RISOLUZIONE] [nvarchar](80) NULL,
    [ID_USO_STRUMENTO] [int] NULL,
    [ID_REFERENTE] [int] NULL,
    [FOTO] [varbinary](max) NULL,
    [PROF_ID] [int] NOT NULL,
    [ID_TEMPLATE] [int] NULL,
    [TEMPLATE] [nvarchar](25) NULL,
    [ID_PARENT_TEMPLATE_REMOTE] [int] NULL,
    [ID_PARENT_TEMPLATE] [int] NULL,
    [ISLOCKED] [nchar](1) NULL,
    [TAGAPPSYNC] [nchar](1) NOT NULL,
    [TAGAPPDOCSYNC] [nchar](1) NOT NULL,
 CONSTRAINT [PK_CES_ANAGRAFICA] PRIMARY KEY CLUSTERED 
(
    [ID_CESPITE] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON, FILLFACTOR = 90) ON [PRIMARY]
) ON [PRIMARY]

--This is MYVIEW

CREATE VIEW [dbo].[V_CESPITE_TREE] AS
--BEGIN
    WITH    q AS
            (
            SELECT  ID_CESPITE , ID_CESPITE AS ID_CESPITE_ANCESTOR
            FROM    CES_ANAGRAFICA c
            JOIN CES_TIPI_CESPITE ctc ON ctc.ID_TIPI_INFRSTR = c.ID_TIPO_CESPITE
            UNION ALL
            SELECT  c.ID_CESPITE, q.ID_CESPITE_ANCESTOR 
            FROM    q
            JOIN    CES_ANAGRAFICA c 
            ON      c.ID_CESPITE_PADRE = q.ID_CESPITE
            JOIN CES_TIPI_CESPITE ctc ON ctc.ID_TIPI_INFRSTR = c.ID_TIPO_CESPITE
            ) 
    select ID_CESPITE AS ID_CHILD, ID_CESPITE_ANCESTOR AS ID_PARENT from q

GO

-- So my original query was this:

SELECT
  CA.ID_CESPITE,CASE WHEN (EXISTS (SELECT VCA.ID_CHILD FROM V_CESPITE_TREE VCA INNER JOIN MAN_PRG_OPERAZIONI MPO ON MPO.ID_CESPITE = VCA.ID_CHILD WHERE VCA.ID_PARENT = CA.ID_CESPITE AND ID_FATHER IS NOT NULL)) THEN 'Y' ELSE 'N' END AS HAVE_CHILD_PRG,
  CA.ID_CESPITE_PADRE
  <Other Fields>

FROM
  CES_ANAGRAFICA CA LEFT OUTER JOIN
  CES_PERMESSI CP ON ((CA.ID_CESPITE = CP.ID_CESPITE)) INNER JOIN CES_TIPI_CESPITE CTCS ON CA.ID_TIPO_CESPITE = CTCS.ID_TIPI_INFRSTR LEFT OUTER JOIN
  V_UTENTI_DIPENDENTI VUD ON CA.ID_RESPONSABILE = VUD.ID_DIPENDENTE

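【Comments】

Which table does the CA prefix refer to?

How does the nested query relate to mytable2?

Could you be more specific about what you would like to see in a solution?

Can you say how many rows each table has?

I think you are missing some of the referenced tables. Indicative row counts would indeed help too. I assume MT2 is an alias for MYTABLE2? Also, does this have to be a single-query solution, or can we use intermediate temp tables and/or table variables?

【Answer 1】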
SELECT
  CA.ID_CESPITE,
  CASE WHEN child.[Count] > 0 THEN 'Y' ELSE 'N' END AS HAVE_CHILD_PRG,
  CA.ID_CESPITE_PADRE
FROM
  CES_ANAGRAFICA CA
LEFT OUTER JOIN CES_PERMESSI CP
  ON ((CA.ID_CESPITE = CP.ID_CESPITE))
INNER JOIN CES_TIPI_CESPITE CTCS
  ON CA.ID_TIPO_CESPITE = CTCS.ID_TIPI_INFRSTR
LEFT OUTER JOIN V_UTENTI_DIPENDENTI VUD
  ON CA.ID_RESPONSABILE = VUD.ID_DIPENDENTE
CROSS APPLY (
  SELECT [Count] = COUNT(*)
  FROM V_CESPITE_TREE VCA
  JOIN MAN_PRG_OPERAZIONI MPO
    ON MPO.ID_CESPITE = VCA.ID_CHILD
  WHERE VCA.ID_PARENT = CA.ID_CESPITE
    AND MPO.ID_FATHER is not null
) child

The following version should be slightly better, because it does not have to evaluate the aggregate function:

SELECT
  CA.ID_CESPITE,
  CASE WHEN child.[Exists] = 1 THEN 'Y' ELSE 'N' END AS HAVE_CHILD_PRG,
  CA.ID_CESPITE_PADRE
FROM
  CES_ANAGRAFICA CA
LEFT OUTER JOIN CES_PERMESSI CP
  ON ((CA.ID_CESPITE = CP.ID_CESPITE))
INNER JOIN CES_TIPI_CESPITE CTCS
  ON CA.ID_TIPO_CESPITE = CTCS.ID_TIPI_INFRSTR
LEFT OUTER JOIN V_UTENTI_DIPENDENTI VUD
  ON CA.ID_RESPONSABILE = VUD.ID_DIPENDENTE
OUTER APPLY (
  SELECT TOP (1) 1 [Exists]
  FROM V_CESPITE_TREE VCA
  JOIN MAN_PRG_OPERAZIONI MPO
    ON MPO.ID_CESPITE = VCA.ID_CHILD
  WHERE VCA.ID_PARENT = CA.ID_CESPITE
    AND MPO.ID_FATHER is not null
) child

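【Comments】

Maybe it will work, but CROSS JOIN / CROSS APPLY scare me off reading further... As the data volume grows, the amount of computation will grow exponentially.

This works for 4500 records (which for me is a realistic upper limit). Even though you deserve it, I won't award you the bounty, because David Khaykin helped me a lot. Thanks.

@user193655 Can you tell me whether the second query is better?

【Answer 2】

Use a CTE; the idea is that it loads your inner join once and then caches it (but see Update 2 below).

Note that I don't know where CA.ID_NODE comes from, since you didn't explain it. Your inner query also joins to MyTable1, but you don't correlate MyTable2 with the subquery.

Given what you have provided, the pseudocode should look something like this (if you update your question with the relevant information, I will update this answer to reflect it).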

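Update:

Here is an updated version based on your schema update and some sample data. I confirmed that it no longer produces duplicates. The real test is for you to run it and make sure it improves performance, i.e. reduces the execution time.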

; with hasChildCte(ID_CESPITE, ID_PARENT)
As (
        SELECT VCA.ID_CHILD,
            vca.ID_PARENT
        FROM V_CESPITE_TREE VCA 
          INNER JOIN MAN_PRG_OPERAZIONI MPO 
            ON MPO.ID_CESPITE = VCA.ID_CHILD 
        WHERE MPO.ID_FATHER IS NOT NULL
)

Select
    CA.ID_CESPITE,
    Case 
        When Exists (
            Select ID_PARENT
            From hasChildCte cte
            Where cte.ID_PARENT = ca.ID_CESPITE
        ) Then 'Y'
        Else 'N'
    End As HAVE_CHILDREN,
    CA.ID_CESPITE_PADRE
From CES_ANAGRAFICA CA

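Also note that if you don't already have indexes on all the join columns, adding them would be a good move. That will help speed up the query further, especially if you are dealing with large amounts of data.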
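
As a rough sketch of that indexing advice, the statements below cover the two joins that the queries in this thread rely on. They assume SQL Server 2008 or later (the filtered index in the first statement is not available before that) and that no equivalent indexes exist yet. The index names are made up for illustration, and whether the optimizer actually picks them up has to be checked in the real execution plan.

```sql
-- Hypothetical index names; adjust to your own naming convention.

-- Supports the join MPO.ID_CESPITE = VCA.ID_CHILD and stores only rows
-- with ID_FATHER IS NOT NULL, the filter used by the queries in this thread.
CREATE NONCLUSTERED INDEX IX_MAN_PRG_OPERAZIONI_ID_CESPITE_HAS_FATHER
    ON dbo.MAN_PRG_OPERAZIONI (ID_CESPITE)
    WHERE ID_FATHER IS NOT NULL;

-- Supports the recursive step of V_CESPITE_TREE
-- (c.ID_CESPITE_PADRE = q.ID_CESPITE); ID_TIPO_CESPITE is included
-- because the same step joins to CES_TIPI_CESPITE on it.
CREATE NONCLUSTERED INDEX IX_CES_ANAGRAFICA_ID_CESPITE_PADRE
    ON dbo.CES_ANAGRAFICA (ID_CESPITE_PADRE)
    INCLUDE (ID_TIPO_CESPITE);
```

Each extra index makes inserts and updates on these tables a little more expensive, so it is worth keeping only the ones that show up in the plan.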

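Update 2

The comment about the CTE being executed more than once got me thinking. Apparently it is up to SQL Server to decide whether to cache a CTE, and it does not always do so. In many cases the CTE will be executed only once, but at other times it behaves like a view in SQL Server and is not cached.

So I have modified the code to use a table variable instead. However, I don't have enough test data to see which one performs better.

Try this and see whether it gives you a faster execution time. Also note that whichever refactoring and performance approach you choose, it is best to set up your database properly with indexes on the columns you use in JOINs. That can reduce query execution time significantly, at the cost of slower inserts, because the indexes have to be updated.

Updated non-CTE code, using a table variable instead: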

Declare @HasChildren table (ID_CESPITE int, ID_PARENT int)

Insert into @HasChildren
Select VCA.ID_CHILD,
    vca.ID_PARENT
From V_CESPITE_TREE VCA 
    Inner Join MAN_PRG_OPERAZIONI MPO 
    On MPO.ID_CESPITE = VCA.ID_CHILD 
Where MPO.ID_FATHER Is Not Null

Select
    CA.ID_CESPITE,
    Case 
        When Exists (
            Select ID_PARENT
            From @HasChildren c
            Where c.ID_PARENT = ca.ID_CESPITE
        ) Then 'Y'
        Else 'N'
    End As HAVE_CHILDREN,
    CA.ID_CESPITE_PADRE
From CES_ANAGRAFICA CA
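
To compare the variants on real data, one option is to turn on SQL Server's per-statement timing and I/O statistics around each candidate and read the figures from the Messages output. The sketch below wraps the original correlated EXISTS query from the question; any of the other variants can be dropped in its place. It assumes the queries are run interactively (for example in Management Studio), and for the table variable version the DECLARE, INSERT and SELECT must stay in the same batch.

```sql
-- Report CPU/elapsed time and logical reads for every statement that follows.
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

-- Variant under test: the original correlated EXISTS from the question.
SELECT
    CA.ID_CESPITE,
    CASE WHEN EXISTS (
        SELECT 1
        FROM V_CESPITE_TREE VCA
        INNER JOIN MAN_PRG_OPERAZIONI MPO
            ON MPO.ID_CESPITE = VCA.ID_CHILD
        WHERE VCA.ID_PARENT = CA.ID_CESPITE
          AND MPO.ID_FATHER IS NOT NULL
    ) THEN 'Y' ELSE 'N' END AS HAVE_CHILD_PRG,
    CA.ID_CESPITE_PADRE
FROM CES_ANAGRAFICA CA;

-- Switch the reporting off again once the measurements are taken.
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Running each variant a few times and comparing logical reads as well as elapsed time gives a steadier picture than a single timing.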

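【Comments】

Thanks for the reply. As you pointed out, my example was not perfect, so your suggestion doesn't work. I have revised it; you will find the revised version below the original one.

@user193655 I updated the answer to reflect your new information, give it a try. If you still have trouble getting it to work, you will need to post the schema of MyTable1, MyTable2 and MyView.

Thanks again. This time I could at least try it, but I still see a difference (your query returns 50 more records than my original one). I will post the original table structures (so from now on I refer to the original names rather than the "mock" ones). I will also start a bounty to reward you with the reputation you deserve. Unfortunately I am new to CTEs, so I can't say much.

Great! The query results are now really identical. +500 to you!

Awesome. Just curious, how much execution time did it save you?

【Answer 3】

Something like the following may be easier for the optimizer to handle.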

SELECT DISTINCT
  ID_NODE,
  CASE WHEN MTSub.ID_CHILD is null then 'N' else 'Y' END,
 OTHERFIELDS
FROM MYTABLE2
LEFT JOIN  
(SELECT MV.ID_CHILD FROM MYVIEW MV INNER JOIN MYTABLE1 MT1 ON MT1.ID_NODE = MV.ID_CHILD WHERE MV.ID_PARENT = MT2.ID_NODE AND MT1.ID_FATHER IS NOT NULL) MTSub

【Comments】

You are missing the join predicate for the LEFT JOIN.
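
For illustration, one way the LEFT JOIN idea could be completed is sketched below, using the mock names from the question. This is an assumption about what the answer intended, not the answerer's own code: a derived table cannot reference MT2, so the correlation MV.ID_PARENT = MT2.ID_NODE moves out of the subquery's WHERE clause and into the ON clause, and MT2 is declared as the alias of MYTABLE2.

```sql
SELECT
    MT2.ID_NODE,
    -- No matching row in MTSub means the node has no qualifying children.
    CASE WHEN MTSub.ID_PARENT IS NULL THEN 'N' ELSE 'Y' END AS HAVE_CHILDREN
    -- , other fields from MT2
FROM MYTABLE2 MT2
LEFT JOIN (
    -- One row per parent that has at least one qualifying child.
    SELECT DISTINCT MV.ID_PARENT
    FROM MYVIEW MV
    INNER JOIN MYTABLE1 MT1
        ON MT1.ID_NODE = MV.ID_CHILD
    WHERE MT1.ID_FATHER IS NOT NULL
) MTSub
    ON MTSub.ID_PARENT = MT2.ID_NODE;
```

Because the derived table is already distinct on ID_PARENT, each MYTABLE2 row matches at most one MTSub row, so the outer DISTINCT is no longer needed.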
