How to most efficiently join two tables?
Posted 2012-01-16 16:03:32

Question:
I have two tables that store amounts and adjustments for LineItemTypes in a given ReportingPeriod. I'm looking for the most efficient way to query the Amount and the Adjustment for every ReportingPeriod/LineItemType combination that exists in either table.
The schema is as follows:
@ReportingPeriodComposition (1,030 rows, table variable)
Src int,
GroupReportingPeriodId int,
ReportingPeriodId int,
ClientId int,
PeriodDate date,
...
PRIMARY KEY CLUSTERED (Src, ReportingPeriodId)
Amount (~30,000,000 rows)
ReportingPeriodId int,
LineItemTypeId smallint,
Amount decimal,
PRIMARY KEY CLUSTERED (ReportingPeriodId, LineItemTypeId)
Adjustment (~180,000 rows)
ReportingPeriodId int,
LineItemTypeId smallint,
Amount decimal,
Comment nvarchar(2500),
...
AdjustmentId int,
PRIMARY KEY NONCLUSTERED (AdjustmentId),
UNIQUE CLUSTERED (ReportingPeriodId, LineItemTypeId)
I want to select the Amount and the Adjustment for each unique ReportingPeriodId/LineItemTypeId, producing the following result set:
| ReportingPeriodId | LineItemTypeId | Amount | Adjustment |
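Currently I'm using the following query, but I'd love to know whether anyone has ideas on how to do this more efficiently. All suggestions are welcome!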
SELECT
rpc.ReportingPeriodId,
COALESCE(a.LineItemTypeId, adj.LineItemTypeId) LineItemTypeId,
a.Amount,
adj.Amount Adjustment
FROM @ReportingPeriodComposition rpc
LEFT JOIN Watchlist.risk.Amount a
ON rpc.ReportingPeriodId = a.ReportingPeriodId
LEFT JOIN Watchlist.risk.Adjustment adj
ON rpc.ReportingPeriodId = adj.ReportingPeriodId
AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE
Src = @Src
AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL)
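Note that the @Src variable is needed to determine which source values to pull from the @ReportingPeriodComposition table variable. The query returns about 138,000 rows:
- 1 row has both an Amount and an Adjustment, although this number can vary with the ReportingPeriodComposition
- 0 rows have only an Adjustment, but that is not guaranteed

Execution plan XML: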
<?xml version="1.0" encoding="utf-16"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.1" Build="10.0.4064.0" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
<BatchSequence>
<Batch>
<Statements>
<StmtSimple StatementCompId="9" StatementEstRows="104.769" StatementId="5" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" StatementSubTreeCost="0.343989" StatementText="SELECT
 rpc.ReportingPeriodId,
 COALESCE(a.LineItemTypeId, adj.LineItemTypeId) LineItemTypeId,
 a.Amount,
 adj.Amount Adjustment
FROM @ReportingPeriodComposition rpc
LEFT JOIN Rating.risk.Amount a
 ON rpc.ReportingPeriodId = a.ReportingPeriodId
LEFT JOIN Rating.risk.Adjustment adj
 ON rpc.ReportingPeriodId = adj.ReportingPeriodId
 AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE
 Src = @Src
 AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL)" StatementType="SELECT" QueryHash="0x425781A4C1D20919" QueryPlanHash="0xF3E9DD0ADAD04044">
<StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
<QueryPlan DegreeOfParallelism="1" CachedPlanSize="24" CompileTime="5" CompileCPU="5" CompileMemory="424">
<RelOp AvgRowSize="31" EstimateCPU="1.04769E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.343989">
<OutputList>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
<ColumnReference Column="Expr1006" />
</OutputList>
<ComputeScalar>
<DefinedValues>
<DefinedValue>
<ColumnReference Column="Expr1006" />
<ScalarOperator ScalarString="CASE WHEN [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] IS NOT NULL THEN [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] ELSE [Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId] END">
<IF>
<Condition>
<ScalarOperator>
<Compare CompareOp="IS NOT">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="NULL" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Condition>
<Then>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
</Then>
<Else>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
</Else>
</IF>
</ScalarOperator>
</DefinedValue>
</DefinedValues>
<RelOp AvgRowSize="33" EstimateCPU="9.21971E-05" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Filter" NodeId="1" Parallel="false" PhysicalOp="Filter" EstimatedTotalSubtreeCost="0.343979">
<OutputList>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="137631" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<Filter StartupExpression="false">
<RelOp AvgRowSize="33" EstimateCPU="0.000437936" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Left Outer Join" NodeId="2" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.343886">
<OutputList>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="137647" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false" WithUnorderedPrefetch="true">
<OuterReferences>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
<ColumnReference Column="Expr1009" />
</OuterReferences>
<RelOp AvgRowSize="26" EstimateCPU="0.000437936" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Left Outer Join" NodeId="4" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00711828">
<OutputList>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="137647" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<NestedLoops Optimized="false">
<OuterReferences>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
</OuterReferences>
<RelOp AvgRowSize="11" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="5" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="0">
<OutputList>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="1030" ActualEndOfScans="1" ActualExecutions="1" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
</DefinedValue>
</DefinedValues>
<Object Table="[@ReportingPeriodComposition]" Index="[PK__#6FDF7DF__F9ABEE3F71C7C670]" Alias="[rpc]" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="Src" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="[@Src]">
<Identifier>
<ColumnReference Column="@Src" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
</IndexScan>
</RelOp>
<RelOp AvgRowSize="22" EstimateCPU="0.000272246" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimateRows="104.769" LogicalOp="Clustered Index Seek" NodeId="6" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.00339725" TableCardinality="29974300">
<OutputList>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="137631" ActualEndOfScans="1030" ActualExecutions="1030" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="Amount" />
</DefinedValue>
</DefinedValues>
<Object Database="[Rating]" Schema="[risk]" Table="[Amount]" Index="[PK_Amount]" Alias="[a]" IndexKind="Clustered" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="@ReportingPeriodComposition.[ReportingPeriodId] as [rpc].[ReportingPeriodId]">
<Identifier>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
</IndexScan>
</RelOp>
</NestedLoops>
</RelOp>
<RelOp AvgRowSize="18" EstimateCPU="0.000165111" EstimateIO="0.003125" EstimateRebinds="103.769" EstimateRewinds="0" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="7" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.33565" TableCardinality="178911">
<OutputList>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
</OutputList>
<RunTimeInformation>
<RunTimeCountersPerThread Thread="0" ActualRows="1" ActualEndOfScans="137647" ActualExecutions="137647" />
</RunTimeInformation>
<IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" NoExpandHint="false">
<DefinedValues>
<DefinedValue>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
</DefinedValue>
<DefinedValue>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="Amount" />
</DefinedValue>
</DefinedValues>
<Object Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Index="[IX_Adjustment_ReportingPeriodId_LineItemTypeId]" Alias="[adj]" IndexKind="Clustered" />
<SeekPredicates>
<SeekPredicateNew>
<SeekKeys>
<Prefix ScanType="EQ">
<RangeColumns>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="ReportingPeriodId" />
</RangeColumns>
<RangeExpressions>
<ScalarOperator ScalarString="@ReportingPeriodComposition.[ReportingPeriodId] as [rpc].[ReportingPeriodId]">
<Identifier>
<ColumnReference Table="@ReportingPeriodComposition" Alias="[rpc]" Column="ReportingPeriodId" />
</Identifier>
</ScalarOperator>
</RangeExpressions>
</Prefix>
</SeekKeys>
</SeekPredicateNew>
</SeekPredicates>
<Predicate>
<ScalarOperator ScalarString="[Rating].[risk].[Amount].[ReportingPeriodId] as [a].[ReportingPeriodId] IS NULL OR [Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId]=[Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId]">
<Logical Operation="OR">
<ScalarOperator>
<Compare CompareOp="IS">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="ReportingPeriodId" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="NULL" />
</ScalarOperator>
</Compare>
</ScalarOperator>
<ScalarOperator>
<Compare CompareOp="EQ">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
</Compare>
</ScalarOperator>
</Logical>
</ScalarOperator>
</Predicate>
</IndexScan>
</RelOp>
</NestedLoops>
</RelOp>
<Predicate>
<ScalarOperator ScalarString="[Rating].[risk].[Amount].[LineItemTypeId] as [a].[LineItemTypeId] IS NOT NULL OR [Rating].[risk].[Adjustment].[LineItemTypeId] as [adj].[LineItemTypeId] IS NOT NULL">
<Logical Operation="OR">
<ScalarOperator>
<Compare CompareOp="IS NOT">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Amount]" Alias="[a]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="NULL" />
</ScalarOperator>
</Compare>
</ScalarOperator>
<ScalarOperator>
<Compare CompareOp="IS NOT">
<ScalarOperator>
<Identifier>
<ColumnReference Database="[Rating]" Schema="[risk]" Table="[Adjustment]" Alias="[adj]" Column="LineItemTypeId" />
</Identifier>
</ScalarOperator>
<ScalarOperator>
<Const ConstValue="NULL" />
</ScalarOperator>
</Compare>
</ScalarOperator>
</Logical>
</ScalarOperator>
</Predicate>
</Filter>
</RelOp>
</ComputeScalar>
</RelOp>
<ParameterList>
<ColumnReference Column="@Src" ParameterRuntimeValue="(2)" />
</ParameterList>
</QueryPlan>
</StmtSimple>
</Statements>
</Batch>
</BatchSequence>
</ShowPlanXML>
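A possible rewrite, offered as a sketch rather than a tested answer: restrict both base tables to the periods selected by @Src, then match them on the full (ReportingPeriodId, LineItemTypeId) key with a FULL OUTER JOIN. As I read the original ON clause, an Adjustment row without a matching Amount row is only returned when its period has no Amount rows at all, so adjustment-only line items in periods that do have amounts are dropped; the version below keeps them. The CTE names Periods, AmountRows and AdjustmentRows are made up for illustration; the table names are copied from the question.

```sql
-- Sketch only: assumes @Src and @ReportingPeriodComposition are declared and
-- populated earlier in the same batch, as in the question. The statement
-- before a WITH clause must end with a semicolon.
WITH Periods AS (
    -- (Src, ReportingPeriodId) is the primary key, so each period appears once per @Src.
    SELECT ReportingPeriodId
    FROM @ReportingPeriodComposition
    WHERE Src = @Src
),
AmountRows AS (
    SELECT a.ReportingPeriodId, a.LineItemTypeId, a.Amount
    FROM Watchlist.risk.Amount AS a
    JOIN Periods AS p ON p.ReportingPeriodId = a.ReportingPeriodId
),
AdjustmentRows AS (
    SELECT adj.ReportingPeriodId, adj.LineItemTypeId, adj.Amount
    FROM Watchlist.risk.Adjustment AS adj
    JOIN Periods AS p ON p.ReportingPeriodId = adj.ReportingPeriodId
)
-- Match on the full key; FULL OUTER JOIN keeps rows that exist on only one side.
SELECT
    COALESCE(am.ReportingPeriodId, ad.ReportingPeriodId) AS ReportingPeriodId,
    COALESCE(am.LineItemTypeId, ad.LineItemTypeId)       AS LineItemTypeId,
    am.Amount,
    ad.Amount AS Adjustment
FROM AmountRows AS am
FULL OUTER JOIN AdjustmentRows AS ad
    ON  ad.ReportingPeriodId = am.ReportingPeriodId
    AND ad.LineItemTypeId    = am.LineItemTypeId;
```

Both base tables are clustered on (ReportingPeriodId, LineItemTypeId), so this shape at least gives the optimizer the option of a merge or hash join instead of one Adjustment seek per Amount row (137,647 seeks in the posted plan). Whether it is actually faster on this data is untested; compare actual plans and timings.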
Comments on the question:
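Any chance you could show the query plan it generates? From the description and schema shown, nothing jumps out as "obviously" wrong.
I can do that. How would you like to see it: XML or graphical?
Either will work.
@Andrew I've added the execution plan. I removed the other batches in the sequence, so hopefully it still makes sense. I would have included the full plan, but unfortunately that would have put me over the character limit.

Answer 1:
I can't see anything particularly bad in the query plan you posted; I suspect SQL Server is making the right choice. The only slightly dodgy thing I can spot is that the estimated and actual row counts in the plan are far apart, which suggests the statistics are not entirely up to date. You could force a statistics update and see whether it keeps using the same query plan.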
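For reference, forcing a full statistics refresh could look like the following. It assumes the tables live in the Watchlist database, as in the question's query (the posted plan names the database Rating). Reading the posted plan, though, the largest gap sits at the table variable: its seek is estimated at 1 row but returns 1,030. The Amount seek is estimated at about 105 rows per execution against roughly 134 actual (137,631 rows over 1,030 executions), so base-table statistics look roughly right and refreshing them may not change the plan.

```sql
-- Assumption: the tables are in the Watchlist database, as in the question's query.
USE Watchlist;

-- FULLSCAN reads every row instead of a sample: most accurate, but costly
-- on the ~30 million-row Amount table, so run it off-peak.
UPDATE STATISTICS risk.Amount WITH FULLSCAN;
UPDATE STATISTICS risk.Adjustment WITH FULLSCAN;
```

After the refresh, re-run the query with the actual execution plan enabled and check whether the plan shape changes.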
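If you're seeing inconsistent performance, clear the query plan cache on a development box and generate a query plan for an @Src value that returns very few rows; then clear the plan cache again and generate a query plan for an @Src value that returns a large number of rows. If the two plans are the same, you're fine; if they differ, you may need the OPTIMIZE FOR hint. This sometimes happens with parameterized queries: the first run determines the plan that goes into the cache, and later runs of the query reuse that plan until it is evicted.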
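The test described above might look like this on a development server. DBCC FREEPROCCACHE empties the plan cache for the entire instance, so it should not be run on production. The value 2 is simply the runtime @Src shown in the posted plan, not a recommendation, and the declarations of @Src and @ReportingPeriodComposition are assumed to be the question's own.

```sql
-- Development/test server only: clears every cached plan on the instance.
DBCC FREEPROCCACHE;
GO  -- batch separator in SSMS/sqlcmd

-- (Declare and populate @Src and @ReportingPeriodComposition here, as in the question.)

-- Run once with an @Src that returns few rows and once, after clearing the cache
-- again, with one that returns many; compare the two actual plans. If they differ,
-- OPTIMIZE FOR compiles the statement for a chosen value instead of whichever
-- value happened to run first.
SELECT
    rpc.ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
FROM @ReportingPeriodComposition AS rpc
LEFT JOIN Watchlist.risk.Amount AS a
    ON rpc.ReportingPeriodId = a.ReportingPeriodId
LEFT JOIN Watchlist.risk.Adjustment AS adj
    ON rpc.ReportingPeriodId = adj.ReportingPeriodId
    AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE rpc.Src = @Src
  AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL)
OPTION (OPTIMIZE FOR (@Src = 2));
```

OPTION (OPTIMIZE FOR UNKNOWN) is the variant that optimizes for the average data distribution rather than one specific value.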
Can you give more detail about the specific problem you're running into, or hoping to solve, by reviewing this?
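Answer 2:
How about using a JOIN hint?
From MSDN:

LOOP | HASH | MERGE
Specifies that the join in the query should use looping, hashing, or merging. Using LOOP | HASH | MERGE JOIN enforces a particular join between two tables. LOOP cannot be specified together with RIGHT or FULL as a join type.

REMOTE
Specifies that the join operation is performed on the site of the right table. This is useful when the left table is a local table and the right table is a remote table. REMOTE should be used only when the left table has fewer rows than the right table.

If the right table is local, the join is performed locally. If both tables are remote but from different data sources, REMOTE causes the join to be performed on the site of the right table. If both tables are remote tables from the same data source, REMOTE is not required.

REMOTE cannot be used when one of the values being compared in the join predicate is cast to a different collation using the COLLATE clause.

REMOTE can be used only for INNER JOIN operations.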
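In your case a LOOP join is allowed, since you're using LEFT joins (LOOP only rules out RIGHT and FULL). Other than that your query looks fine. Do you have indexes on the columns you filter on?
Your Amount table does have a lot of rows, but I've seen much bigger databases. What hardware are you running on?
See this article for an example of using a LOOP JOIN and the specific optimization it demonstrates. With join hints, though, everything depends on the type of query: it may not apply here, and in your case it should be a last resort.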
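A sketch of trying hints on the question's query. Since the posted plan already uses nested loops for both joins, a LOOP hint on its own would probably change nothing; the variant below keeps LOOP for Amount and tries HASH for Adjustment, the seek that the comment below reports at 137,647 executions. Any join hint also forces the join order of the whole query as written, so treat this as an experiment and compare actual plans and timings.

```sql
-- Experiment only: assumes @Src and @ReportingPeriodComposition are declared
-- and populated earlier in the same batch, as in the question.
SELECT
    rpc.ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
FROM @ReportingPeriodComposition AS rpc
-- Keep the per-period seek into Amount's clustered index.
LEFT LOOP JOIN Watchlist.risk.Amount AS a
    ON rpc.ReportingPeriodId = a.ReportingPeriodId
-- Hash on ReportingPeriodId instead of seeking Adjustment once per Amount row;
-- the OR condition is evaluated as a residual predicate.
LEFT HASH JOIN Watchlist.risk.Adjustment AS adj
    ON rpc.ReportingPeriodId = adj.ReportingPeriodId
    AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE rpc.Src = @Src
  AND (a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL);
```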
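Comments:
The query optimizer chose nested loops for the left outer joins. 98% of the cost is the clustered index seek on IX_Adjustment_ReportingPeriodId_LineItemTypeId: 1 actual row, 137,647 executions, 104.769 estimated executions.

Answer 3:
I would try replacing the @table_variable with a #temp table so that SQL Server can use more accurate statistics.
At the moment it assumes the table variable returns 1 row and chooses a nested loops plan. If it takes the actual table cardinality into account, you may get a different plan.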
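A minimal sketch of the temp-table approach. #Periods is a hypothetical name; it holds only the periods for @Src, so Src no longer appears in the WHERE clause. For brevity it is filled from the existing table variable; in practice the INSERT that currently populates @ReportingPeriodComposition could target a temp table directly.

```sql
-- Hypothetical temp table: unlike a table variable, it gets statistics, and
-- its real row count is visible when the query compiles.
IF OBJECT_ID('tempdb..#Periods') IS NOT NULL
    DROP TABLE #Periods;

CREATE TABLE #Periods (
    ReportingPeriodId int NOT NULL PRIMARY KEY CLUSTERED
);

-- Assumes @Src and @ReportingPeriodComposition exist as in the question.
INSERT INTO #Periods (ReportingPeriodId)
SELECT ReportingPeriodId
FROM @ReportingPeriodComposition
WHERE Src = @Src;

SELECT
    p.ReportingPeriodId,
    COALESCE(a.LineItemTypeId, adj.LineItemTypeId) AS LineItemTypeId,
    a.Amount,
    adj.Amount AS Adjustment
FROM #Periods AS p
LEFT JOIN Watchlist.risk.Amount AS a
    ON p.ReportingPeriodId = a.ReportingPeriodId
LEFT JOIN Watchlist.risk.Adjustment AS adj
    ON p.ReportingPeriodId = adj.ReportingPeriodId
    AND (a.ReportingPeriodId IS NULL OR a.LineItemTypeId = adj.LineItemTypeId)
WHERE a.LineItemTypeId IS NOT NULL OR adj.LineItemTypeId IS NOT NULL;

DROP TABLE #Periods;
```

A lighter alternative is to keep the table variable and append OPTION (RECOMPILE) to the original query: the statement is then compiled with the table variable's actual row count, although the table variable still has no column statistics.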