Optimizing a SQL query with a conditional join

Posted 2023-04-15

【Title】: optimizing SQL query with conditional join 【Posted】: 2017-02-17 20:55:25 【Question】:

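I need to run a query that compares two tables and returns the records that are not in the second table. The tricky part is that the link between the two tables is conditional. I have multiple sources that feed into Table2, which then feeds into Table1. I have tried using joins, but that was no faster than what I have below. There are about 50k records in these tables. The query takes about 1.5 minutes to complete, and I am hoping to get it down to a few seconds. Table1 and Table2 already have indexes on these fields. The database is running at compatibility level SQL2008.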

This query takes about 1.5 minutes:

select *
from Table1 t1 
where not exists (select * 
                  from Table2 t2
                  where t2.seedID = case t2.SeedSource when 'SeedSource1' then t1.SeedSource1
                                                       when 'SeedSource2' then t1.SeedSource2
                                                       when 'SeedSource3' then t1.SeedSource3
                                                       when 'SeedSource4' then t1.SeedSource4 
                                                       when 'SeedSource5' then t1.SeedSource5 end)
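To make the intended result concrete, here is a minimal sketch that runs the same `NOT EXISTS` / `CASE` query on three sample rows. It uses Python's built-in `sqlite3` module as a stand-in for SQL Server so that it is runnable anywhere. The `id` column, the integer column types, and the sample values are assumptions made for illustration; they do not come from the question, and nothing here says anything about SQL Server timing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY,
                         SeedSource1 INTEGER, SeedSource2 INTEGER, SeedSource3 INTEGER,
                         SeedSource4 INTEGER, SeedSource5 INTEGER);
    CREATE TABLE Table2 (seedID INTEGER, SeedSource TEXT);

    -- id 1 matches through SeedSource1, id 2 through SeedSource3, id 3 matches nothing.
    INSERT INTO Table1 VALUES (1, 100, NULL, NULL, NULL, NULL),
                              (2, NULL, NULL, 300, NULL, NULL),
                              (3, 999, NULL, NULL, NULL, NULL);
    -- 999 exists in Table2, but under SeedSource2, so it must not match id 3.
    INSERT INTO Table2 VALUES (100, 'SeedSource1'),
                              (300, 'SeedSource3'),
                              (999, 'SeedSource2');
""")

# Same shape as the question's query: the Table2 row decides which Table1 column to compare.
query = """
    SELECT t1.id
    FROM Table1 t1
    WHERE NOT EXISTS (SELECT *
                      FROM Table2 t2
                      WHERE t2.seedID = CASE t2.SeedSource
                                          WHEN 'SeedSource1' THEN t1.SeedSource1
                                          WHEN 'SeedSource2' THEN t1.SeedSource2
                                          WHEN 'SeedSource3' THEN t1.SeedSource3
                                          WHEN 'SeedSource4' THEN t1.SeedSource4
                                          WHEN 'SeedSource5' THEN t1.SeedSource5
                                        END)
    ORDER BY t1.id
"""
missing_ids = [row[0] for row in conn.execute(query)]
print(missing_ids)  # [3]
```

Only id 3 comes back: its value 999 does appear in Table2, but under a different `SeedSource`, which is exactly the conditional link the question describes.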

This query takes over five minutes:

select t1.*
from Table1 t1 left join
     Table2 t2 on t2.seedID = case t2.SeedSource when 'SeedSource1' then t1.SeedSource1 
                                                 when 'SeedSource2' then t1.SeedSource2
                                                 when 'SeedSource3' then t1.SeedSource3
                                                 when 'SeedSource4' then t1.SeedSource4
                                                 when 'SeedSource5' then t1.SeedSource5  end
where t2.seedID is NULL

Any help would be greatly appreciated.

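【Comments】:

So which table is m and which is d, and how do they relate to t1 and t2? Can you include the subset of the table structure that is relevant to the question?

You have too many aliases to tell what you are trying to accomplish: d, t1, t2, m.

I updated the second query with the correct aliases.

【Answer 1】: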

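Strictly speaking, they are not the same query: when a Table1 row has multiple matches, the left join produces one row per match before the `where t2.seedID is NULL` filter discards them.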
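The row multiplication can be seen on a tiny data set. The sketch below again uses Python's `sqlite3` as a stand-in for SQL Server, with a hypothetical `id` column and only two source columns to keep it short. It shows the join output with and without the `IS NULL` filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, SeedSource1 INTEGER, SeedSource2 INTEGER);
    CREATE TABLE Table2 (seedID INTEGER, SeedSource TEXT);
    INSERT INTO Table1 VALUES (1, 100, NULL), (2, 555, NULL);
    -- Two Table2 rows match id 1. Nothing matches id 2.
    INSERT INTO Table2 VALUES (100, 'SeedSource1'), (100, 'SeedSource1');
""")

join_sql = """
    SELECT t1.id, t2.seedID
    FROM Table1 t1
    LEFT JOIN Table2 t2
           ON t2.seedID = CASE t2.SeedSource WHEN 'SeedSource1' THEN t1.SeedSource1
                                             WHEN 'SeedSource2' THEN t1.SeedSource2 END
"""
# Without the filter, id 1 appears once per matching Table2 row.
joined = conn.execute(join_sql + " ORDER BY t1.id").fetchall()
# With the filter, only the unmatched Table1 row survives.
unmatched = conn.execute(join_sql + " WHERE t2.seedID IS NULL ORDER BY t1.id").fetchall()

print(joined)     # [(1, 100), (1, 100), (2, None)]
print(unmatched)  # [(2, None)]
```

The join has to build every matching pair before the filter throws those rows away, whereas `NOT EXISTS` only needs to know whether at least one match exists.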

NOT EXISTS is the better approach.

Hopefully SeedSource, SeedSource1-5, and seedID are indexed.

select *
from Table1 t1 
where not exists ( select * 
                   from Table2 t2
                   where t2.seedID = t1.SeedSource1 
                   and   t2.SeedSource = 'SeedSource1' 
                   union all  
                   select * 
                   from Table2 t2
                   where t2.seedID = t1.SeedSource2 
                   and   t2.SeedSource = 'SeedSource2'
                   -- ...
                 )

Or maybe:

left join Table2 t2
  on ( t2.seedID = t1.SeedSource1 and t2.SeedSource = 'SeedSource1' )
  or ( t2.seedID = t1.SeedSource2 and t2.SeedSource = 'SeedSource2' )
  -- ...
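A quick way to gain confidence that the rewrites return the same rows as the original `CASE` query is to run all three forms on synthetic data. The sketch below does that with Python's `sqlite3` as a stand-in for SQL Server. The `id` column, the `NOT NULL` constraints, the index name `ix_table2_source_seed`, and the random data are all assumptions made for this illustration. It checks results only; SQLite timings and plans do not transfer to SQL Server.

```python
import random
import sqlite3

SOURCES = [f"SeedSource{i}" for i in range(1, 6)]

conn = sqlite3.connect(":memory:")
columns = ", ".join(f"{s} INTEGER" for s in SOURCES)
conn.execute(f"CREATE TABLE Table1 (id INTEGER PRIMARY KEY, {columns})")
conn.execute("CREATE TABLE Table2 (seedID INTEGER NOT NULL, SeedSource TEXT NOT NULL)")
# Hypothetical index: each UNION ALL / OR branch is an equality on both columns.
conn.execute("CREATE INDEX ix_table2_source_seed ON Table2 (SeedSource, seedID)")

rng = random.Random(0)  # fixed seed so the run is repeatable
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?)",
    [(i, *(rng.choice([None, rng.randrange(50)]) for _ in SOURCES)) for i in range(200)],
)
conn.executemany(
    "INSERT INTO Table2 VALUES (?, ?)",
    [(rng.randrange(50), rng.choice(SOURCES)) for _ in range(40)],
)

case_branches = " ".join(f"WHEN '{s}' THEN t1.{s}" for s in SOURCES)
pairs = [f"t2.seedID = t1.{s} AND t2.SeedSource = '{s}'" for s in SOURCES]

queries = {
    # The question's form: the compared column depends on the Table2 row.
    "case": f"""SELECT t1.id FROM Table1 t1 WHERE NOT EXISTS (
                    SELECT 1 FROM Table2 t2
                    WHERE t2.seedID = CASE t2.SeedSource {case_branches} END)""",
    # The answer's first form: one simple equality branch per source.
    "union": "SELECT t1.id FROM Table1 t1 WHERE NOT EXISTS ("
             + " UNION ALL ".join(f"SELECT 1 FROM Table2 t2 WHERE {p}" for p in pairs)
             + ")",
    # The answer's second form: left join on OR-ed branches, keeping unmatched rows.
    "or": "SELECT t1.id FROM Table1 t1 LEFT JOIN Table2 t2 ON "
          + " OR ".join(f"({p})" for p in pairs)
          + " WHERE t2.seedID IS NULL",
}
results = {name: sorted(r[0] for r in conn.execute(sql)) for name, sql in queries.items()}
print({name: len(ids) for name, ids in results.items()})
```

All three lists of ids should be identical. The `UNION ALL` form is the one where every branch compares `Table2` columns against values that are fixed for the current `Table1` row, which is the kind of predicate an index on `(SeedSource, seedID)` can serve directly.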

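【Comments】:

I hadn't thought of doing it this way, but I'll give it a try. I don't think OR will be any better than the CASE statement, but we'll see. TY

Your solution is killing the use of the indexes on SeedSource1-5, so I bet both are faster.

You get the win for the UNION subquery. It cut the query time down to about 20 seconds. The OR took longer than my first example query. Thanks!

I bet the OR is faster than your second one.

【Answer 2】: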

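After reading your requirements, your query looks fine. (NOT EXISTS is preferable to NOT IN and LEFT JOIN.)

You can make one small optimization:

In the WHERE clause, use not exists (select 1 ... instead of not exists (select * ... There is no need to select all columns when you can select a constant. (Better performance)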
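Whichever select list is used, `EXISTS` only tests whether the subquery returns a row, so the result set is the same. The sketch below checks that with Python's `sqlite3` as a stand-in for SQL Server, using a hypothetical `id` column and a single source column. It compares results only and makes no claim about performance on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY, SeedSource1 INTEGER);
    CREATE TABLE Table2 (seedID INTEGER, SeedSource TEXT);
    INSERT INTO Table1 VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Table2 VALUES (100, 'SeedSource1'), (300, 'SeedSource1');
""")

# The select list of the subquery is the only thing that varies.
template = """
    SELECT t1.id FROM Table1 t1
    WHERE NOT EXISTS (SELECT {select_list} FROM Table2 t2
                      WHERE t2.seedID = t1.SeedSource1
                        AND t2.SeedSource = 'SeedSource1')
    ORDER BY t1.id
"""
results = {
    select_list: [row[0] for row in conn.execute(template.format(select_list=select_list))]
    for select_list in ("*", "1", "NULL")
}
print(results)  # {'*': [2], '1': [2], 'NULL': [2]}
```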

Reference

https://explainextended.com/2009/09/15/not-in-vs-not-exists-vs-left-join-is-null-sql-server/

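【Comments】:

There is no runtime performance difference between select 1 and select * when using exists(). Using select 1 will, however, avoid checking any unneeded metadata for that table during query compilation. EXISTS Subqueries: SELECT 1 vs. SELECT * - Conor Cunningham

【Answer 3】: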

Maybe:

SELECT t1.* FROM table1 t1 LEFT JOIN table2 t2 on t1.seedID = t2.seedID 
WHERE t2.seedID is NULL AND t1.SeedSource IN ('SeedSource1','SeedSource2','SeedSource3','SeedSource4','SeedSource5')

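【Comments】:

This won't work, because t2.SeedSource will always be one of those values. The main point is that the value of t2.SeedSource determines which field in t1 is used to join the two tables. Also, there is no t1.seedID.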
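The last point is easy to confirm against a schema shaped like the one the question's queries imply. In the sketch below (Python's `sqlite3` as a stand-in for SQL Server, with a hypothetical `id` column and assumed integer types), the query from this answer fails before it returns anything, because Table1 has neither a `seedID` nor a `SeedSource` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (id INTEGER PRIMARY KEY,
                         SeedSource1 INTEGER, SeedSource2 INTEGER, SeedSource3 INTEGER,
                         SeedSource4 INTEGER, SeedSource5 INTEGER);
    CREATE TABLE Table2 (seedID INTEGER, SeedSource TEXT);
""")

# The query from this answer, unchanged apart from layout.
answer3_sql = """
    SELECT t1.* FROM table1 t1 LEFT JOIN table2 t2 ON t1.seedID = t2.seedID
    WHERE t2.seedID IS NULL
      AND t1.SeedSource IN ('SeedSource1','SeedSource2','SeedSource3','SeedSource4','SeedSource5')
"""
try:
    conn.execute(answer3_sql)
    error_message = None
except sqlite3.OperationalError as exc:
    # SQLite rejects the statement because it references columns Table1 does not have.
    error_message = str(exc)
print(error_message)
```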
