Efficiently select top row for each category in the set
Posted: 2010-06-04 14:06:45
Question: I need to select the top row for each category from a known set (somewhat similar to this question). The problem is how to make this query efficient over a large number of rows.
For example, let's create a table that stores temperature readings for several places.
CREATE TABLE #t (
placeId int,
ts datetime,
temp int,
PRIMARY KEY (ts, placeId)
)
-- insert some sample data
SET NOCOUNT ON
DECLARE @n int, @ts datetime
SELECT @n = 1000, @ts = '2000-01-01'
WHILE (@n>0) BEGIN
INSERT INTO #t VALUES (@n % 10, @ts, @n % 37)
IF (@n % 10 = 0) SET @ts = DATEADD(hour, 1, @ts)
SET @n = @n - 1
END
Now I need to get the latest record for places 1, 2 and 3.
This approach works, but it doesn't scale well (and it looks dirty):
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
The following looks better, but performs much worse (30% vs. 70% of the estimated batch cost, according to the optimizer):
SELECT placeId, ts, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
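The problem is that in the execution plan of the latter query, a clustered index scan is performed on #t: 300 rows (100 for each of the three places) are retrieved, sorted, numbered and filtered, leaving only 3. The former query just fetches one row, three times.
Is there a way to run this query efficiently without a big stack of UNIONs?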
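The scan follows from the key order: with PRIMARY KEY (ts, placeId), the clustered index is sorted by ts first, so a filter on placeId alone cannot be turned into an index seek. As a sketch (assuming SQL Server 2005 or later and that #t from the script above exists in the current session), the key order can be confirmed from the tempdb catalog views:

```sql
-- Lists the key columns of the clustered index (index_id = 1) on #t.
-- Temp tables live in tempdb, so the tempdb catalog views are queried.
SELECT ic.key_ordinal,
       c.name AS column_name,
       ic.is_descending_key
FROM tempdb.sys.index_columns AS ic
JOIN tempdb.sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE ic.object_id = OBJECT_ID('tempdb..#t')
  AND ic.index_id = 1
ORDER BY ic.key_ordinal;
```

With the table as defined in the question, ts should come back as key_ordinal 1 and placeId as key_ordinal 2.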
Answer 1: Don't look only at the execution plan; also look at statistics io and statistics time:
set statistics io on
go
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
SELECT placeId, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
set statistics io off
go
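Table '#t000000000B99'. Scan count 3, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#t000000000B99'. Scan count 1, logical reads 6, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.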
set statistics time on
go
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
SELECT placeId, temp FROM (
SELECT placeId, ts, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
set statistics time off
go
For me there is no real difference between the two approaches. Load more data and compare again.
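The WHILE loop in the question inserts one row per iteration, which gets slow at larger volumes. A set-based sketch for reloading #t with more rows is below; the 100,000 row count and the use of sys.all_objects cross-joined with itself as a row source are assumptions, not part of the original. It keeps the same shape as the question's data: 10 places (0..9), one reading per place per hour, temp = n % 37, and every (ts, placeId) pair is unique, so it works with either primary key order.

```sql
-- Hypothetical set-based reload of #t; the row count is an arbitrary choice.
TRUNCATE TABLE #t;

DECLARE @rows int;
SET @rows = 100000;

WITH nums AS (
    -- sys.all_objects is used only as a convenient source of many rows.
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b
)
INSERT INTO #t (placeId, ts, temp)
SELECT CAST(n % 10 AS int),                               -- place 0..9
       DATEADD(hour, CAST(n / 10 AS int), '2000-01-01'),  -- one reading per place per hour
       CAST(n % 37 AS int)                                -- arbitrary temperature value
FROM nums
WHERE n <= @rows;
```

After reloading, both queries can be rerun with statistics io and statistics time turned on to compare logical reads and CPU time at the larger volume.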
Also, once you add an ORDER BY to both queries, the split changes to 40% vs. 60%:
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 1
ORDER BY ts DESC
) t1
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 2
ORDER BY ts DESC
) t2
UNION ALL
SELECT * FROM (
SELECT TOP 1 placeId, temp
FROM #t
WHERE placeId = 3
ORDER BY ts DESC
) t3
ORDER BY placeId
SELECT placeId, temp FROM (
SELECT placeId, temp, ROW_NUMBER() OVER (PARTITION BY placeId ORDER BY ts DESC) rownum
FROM #t
WHERE placeId IN (1, 2, 3)
) t
WHERE rownum = 1
ORDER BY placeId
Answer 2: I loaded 100,000 rows (which still wasn't enough to slow things down) and tried the old-fashioned way:
select t.*
from #t t
inner join (select placeId, max(ts) ts
from #t
where placeId in (1,2,3)
group by placeId) xx
on xx.placeId = t.placeId
and xx.ts = t.ts
and got almost the same results.
Then I reversed the order of the columns in the primary key:
CREATE TABLE #t (
placeId int,
ts datetime,
temp int,
PRIMARY KEY (placeId, ts)
)
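and, in all queries, got fewer page reads, and index seeks instead of scans.
If optimization is your goal and you can change the indexes, I would change the primary key or add a covering index.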
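For the covering-index option, a minimal sketch that leaves the (ts, placeId) primary key alone and adds a nonclustered index keyed on placeId, then ts, with temp stored at the leaf level so the queries need no lookups into the clustered index. The index name IX_t_placeId_ts is hypothetical, and INCLUDE requires SQL Server 2005 or later.

```sql
-- Hypothetical covering index: seek on placeId, rows already ordered by ts DESC,
-- temp carried in the leaf level so no key lookup is needed.
CREATE NONCLUSTERED INDEX IX_t_placeId_ts
    ON #t (placeId, ts DESC)
    INCLUDE (temp);
```

With this index in place, the optimizer should be able to seek per place for the TOP 1, ROW_NUMBER and MAX/GROUP BY variants; comparing statistics io before and after creating it is the practical way to confirm that on real data.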
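Comments:
Thanks, somehow I missed the "old-fashioned way". It works on my real data structure as well.
Answer 3: Just for the record, here is another option using CROSS APPLY. In my setup it performs better than all the options mentioned before.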
SELECT *
FROM (VALUES (1),(2),(3)) t (placeId)
CROSS APPLY (
SELECT TOP 1 ts, temp
FROM #t
WHERE placeId = t.placeId
ORDER BY ts DESC
) tt
I guess the VALUES list could be replaced with a temp table or a table variable without much difference.
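A sketch of that variation, assuming SQL Server 2008 or later (for the multi-row VALUES insert): the set of places lives in a table variable, here given the hypothetical name @places, and CROSS APPLY runs the TOP (1) subquery once per place, so only the newest row per place is read.

```sql
-- Hypothetical table variable holding the set of places to look up.
DECLARE @places TABLE (placeId int PRIMARY KEY);
INSERT INTO @places (placeId) VALUES (1), (2), (3);

-- For each place, fetch only its newest reading.
SELECT p.placeId, latest.ts, latest.temp
FROM @places AS p
CROSS APPLY (
    SELECT TOP (1) t.ts, t.temp
    FROM #t AS t
    WHERE t.placeId = p.placeId
    ORDER BY t.ts DESC
) AS latest
ORDER BY p.placeId;
```

CROSS APPLY drops places that have no readings at all; OUTER APPLY would keep them with NULL ts and temp.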