Handling inheritance with overriding efficiently
【Title】: Handling inheritance with overriding efficiently
【Posted】: 2012-08-16 17:14:28
【Question】: I have the following two data structures.
First, a list of properties applied to triples of objects:
Object1 Object2 Object3 Property Value
O1 O2 O3 P1 "abc"
O1 O2 O3 P2 "xyz"
O1 O3 O4 P1 "123"
O2 O4 O5 P1 "098"
Second, an inheritance tree:
O1
    O2
        O4
    O3
        O5
Or, viewed as a relation:
Object Parent
O2 O1
O4 O2
O3 O1
O5 O3
O1 null
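The semantics are that O2 inherits properties from O1; O4 from O2 and O1; O3 from O1; and O5 from O3 and O1, in that order of precedence.
NOTE 1: I have an efficient way to select all children or all parents of a given object. This is currently implemented with left and right indexes, but hierarchyid could also work. This does not seem important right now.
NOTE 2: I have measures in place that ensure the "Object" column always contains all possible objects, even when they do not really have to be there (i.e. they have no parent or children defined). This makes it possible to use inner joins rather than the much less efficient outer joins.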
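The left/right indexes mentioned in NOTE 1 can be illustrated with a small sketch. It is only an illustration in Python, not the asker's implementation: the interval values are the ones inserted into the Objects table in the first answer's schema further down, and the names NESTED, descendants_or_self and ancestors_or_self are made up for the example.

```python
# Hypothetical nested-set rows: Id -> (LeftIndex, RightIndex).
# Ids 1..5 stand for O1..O5; the values are taken from the first answer's schema.
NESTED = {1: (1, 10), 2: (2, 5), 3: (6, 9), 4: (3, 4), 5: (7, 8)}


def descendants_or_self(node_id):
    """Ids whose LeftIndex lies inside the interval of node_id."""
    left, right = NESTED[node_id]
    return sorted(i for i, (l, _) in NESTED.items() if left <= l <= right)


def ancestors_or_self(node_id):
    """Ids whose interval contains the LeftIndex of node_id."""
    left, _ = NESTED[node_id]
    return sorted(i for i, (l, r) in NESTED.items() if l <= left <= r)


print(descendants_or_self(2))  # [2, 4]
print(ancestors_or_self(5))    # [1, 3, 5]
```

Both lookups are plain range comparisons, which is why an index on LeftIndex is enough to answer them without walking the tree.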
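The objective is: given a pair of (Property, Value), return all triples of objects that have that property with that value, either defined explicitly or inherited from a parent.
NOTE 1: An object triple (X,Y,Z) is considered a "parent" of triple (A,B,C) when either X = A or X is a parent of A holds, and the same holds for (Y,B) and (Z,C).
NOTE 2: A property defined on a closer parent "overrides" the same property defined on a more distant parent.
NOTE 3: When (A,B,C) has two parents, (X1,Y1,Z1) and (X2,Y2,Z2), then (X1,Y1,Z1) is considered the "closer" parent when:
(a) X2 is a parent of X1, or
(b) X2 = X1 and Y2 is a parent of Y1, or
(c) X2 = X1 and Y2 = Y1 and Z2 is a parent of Z1
In other words, "closeness" in ancestry for triples is decided by the first components of the triples first, then by the second components, then by the third components. This rule establishes an unambiguous partial order for triples in terms of ancestry.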
For example, given the pair (P1, "abc"), the result set of triples will be:
O1, O2, O3 -- Defined explicitly
O1, O2, O5 -- Because O5 inherits from O3
O1, O4, O3 -- Because O4 inherits from O2
O1, O4, O5 -- Because O4 inherits from O2 and O5 inherits from O3
O2, O2, O3 -- Because O2 inherits from O1
O2, O2, O5 -- Because O2 inherits from O1 and O5 inherits from O3
O2, O4, O3 -- Because O2 inherits from O1 and O4 inherits from O2
O3, O2, O3 -- Because O3 inherits from O1
O3, O2, O5 -- Because O3 inherits from O1 and O5 inherits from O3
O3, O4, O3 -- Because O3 inherits from O1 and O4 inherits from O2
O3, O4, O5 -- Because O3 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
O4, O2, O3 -- Because O4 inherits from O1
O4, O2, O5 -- Because O4 inherits from O1 and O5 inherits from O3
O4, O4, O3 -- Because O4 inherits from O1 and O4 inherits from O2
O5, O2, O3 -- Because O5 inherits from O1
O5, O2, O5 -- Because O5 inherits from O1 and O5 inherits from O3
O5, O4, O3 -- Because O5 inherits from O1 and O4 inherits from O2
O5, O4, O5 -- Because O5 inherits from O1 and O4 inherits from O2 and O5 inherits from O3
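Note that the triple (O2, O4, O5) is absent from this list. This is because property P1 is defined explicitly for the triple (O2, O4, O5), which prevents that triple from inheriting the property from (O1, O2, O3).
Also note that the triple (O4, O4, O5) is absent. This is because that triple inherits its value of P1="098" from (O2, O4, O5), which is a closer parent than (O1, O2, O3).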
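To make the rules concrete, here is a brute-force reference model in Python. It is a sketch of the semantics only, not a proposal for the database: the names PARENT, PROPS, chain, effective_value and triples_with are hypothetical, and the data is the sample from the question.

```python
from itertools import product

# Sample data from the question.
PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}
PROPS = {
    ("O1", "O2", "O3", "P1"): "abc",
    ("O1", "O2", "O3", "P2"): "xyz",
    ("O1", "O3", "O4", "P1"): "123",
    ("O2", "O4", "O5", "P1"): "098",
}


def chain(obj):
    """The object itself followed by its ancestors, nearest first."""
    out = []
    while obj is not None:
        out.append(obj)
        obj = PARENT[obj]
    return out


def effective_value(triple, prop):
    """Value of prop for triple: explicit, or taken from the closest parent triple."""
    a, b, c = triple
    # product() varies the last component fastest, so parent triples are visited
    # in closeness order: first component first, then second, then third.
    for x, y, z in product(chain(a), chain(b), chain(c)):
        if (x, y, z, prop) in PROPS:
            return PROPS[(x, y, z, prop)]
    return None


def triples_with(prop, value):
    """All object triples whose effective value of prop equals value."""
    objs = sorted(PARENT)
    return [t for t in product(objs, repeat=3) if effective_value(t, prop) == value]


result = triples_with("P1", "abc")
print(len(result))                                 # 18
print(effective_value(("O4", "O4", "O5"), "P1"))   # 098
```

The model checks every possible triple, so it is only useful for validating a faster query against small data.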
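The straightforward way to do it is the following. First, for every triple that has a property defined on it, select all possible child triples: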
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value
from TriplesAndProperties tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.O1
inner join Objects as Objects2 on Objects2.Id = tp.O2
inner join Objects as Objects3 on Objects3.Id = tp.O3
-- Then add all possible children of all those objects
inner join Objects as Children1 on Objects1.Id [isparentof] Children1.Id
inner join Objects as Children2 on Objects2.Id [isparentof] Children2.Id
inner join Objects as Children3 on Objects3.Id [isparentof] Children3.Id
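But that is not the whole story: if some triple inherits the same property from several parents, this query yields conflicting results. Therefore, the second step is to keep just one of those conflicting results: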
select * from
(
select
Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by Objects1.[depthInTheTree] desc, Objects2.[depthInTheTree] desc, Objects3.[depthInTheTree] desc
)
as InheritancePriority
from
... (see above)
) as x -- T-SQL requires an alias on a derived table
where InheritancePriority = 1
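The window function row_number() over( ... ) does the following: for every unique combination of object triple and property, it sorts all values by the ancestral distance from the triple to the parent that the value is inherited from, and then I select only the first value of the resulting list.
A similar effect can be achieved with GROUP BY and ORDER BY clauses, but I find the window function semantically cleaner (the execution plans they yield are identical).
The point is, I need to select the closest of the contributing ancestors, and for that I need to group and then sort within each group.
And finally, I can now simply filter the result set by Property and Value.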
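The two steps described above (expand each defining triple to its child triples, then keep the closest defining triple per partition) can be mimicked in Python. This is an illustrative sketch with hypothetical names (ROWS, expand, pick_closest). It keeps a running maximum per partition in place of row_number(), and it makes no claim about how SQL Server would execute the real query.

```python
from itertools import product

# Sample data from the question.
PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}
ROWS = [
    ("O1", "O2", "O3", "P1", "abc"),
    ("O1", "O2", "O3", "P2", "xyz"),
    ("O1", "O3", "O4", "P1", "123"),
    ("O2", "O4", "O5", "P1", "098"),
]


def depth(obj):
    """Number of ancestors of obj (the root has depth 0)."""
    d = 0
    while PARENT[obj] is not None:
        obj = PARENT[obj]
        d += 1
    return d


def descendants_or_self(obj):
    """obj plus every object that has obj somewhere on its parent chain."""
    found = []
    for candidate in PARENT:
        cur = candidate
        while cur is not None and cur != obj:
            cur = PARENT[cur]
        if cur == obj:
            found.append(candidate)
    return found


def expand(rows):
    """Step 1: one row per (child triple, defining triple), like the big join."""
    for o1, o2, o3, prop, value in rows:
        depths = (depth(o1), depth(o2), depth(o3))
        children = product(
            descendants_or_self(o1), descendants_or_self(o2), descendants_or_self(o3)
        )
        for c1, c2, c3 in children:
            yield (c1, c2, c3, prop), depths, value


def pick_closest(expanded):
    """Step 2: per partition keep the row with the greatest depth tuple."""
    best = {}
    for key, depths, value in expanded:
        if key not in best or depths > best[key][0]:
            best[key] = (depths, value)
    return {key: value for key, (_, value) in best.items()}


effective = pick_closest(expand(ROWS))
matches = [k[:3] for k, v in effective.items() if k[3] == "P1" and v == "abc"]
print(len(matches))  # 18
```

Ordering by depth works as a closeness measure because all ancestors of a given object lie on a single chain, so the deeper ancestor is always the nearer one.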
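This scheme works, very reliably and predictably. It has proven to be very powerful for the business task it implements.
The only trouble is, it is far too slow. One might point out that the join of seven tables could be slowing things down, but that is actually not the bottleneck.
According to the actual execution plan I get from SQL Management Studio (as well as SQL Profiler), the bottleneck is the sorting.
The problem is, in order to satisfy my window function, the server has to sort by Children1.Id, Children2.Id, Children3.Id, tp.Property, Objects1.[depthInTheTree] desc, Objects2.[depthInTheTree] desc, Objects3.[depthInTheTree] desc, and there is no index it can use, because the values come from a cross join of several tables.
EDIT: Per Michael Buen's suggestion (thank you, Michael), I have posted the whole puzzle to sqlfiddle here. The execution plan shows that the Sort operation accounts for 32% of the whole query, and that share will grow with the total number of rows, because all the other operations use indexes.
Usually in such cases I would use an indexed view, but not in this case, because indexed views cannot contain self-joins, of which there are six.
The only way I can think of so far is to create six copies of the Objects table and then use them for the joins, thus enabling an indexed view. Has the time come for me to be reduced to that kind of hack? The despair sets in.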
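【Comments】:
I think your relation table is missing an entry for Object=O3, Parent=O1.
@stakx: You're right, yes. Fixed that.
I also ran into a bottleneck when mixing row_number-ing with a recursive CTE. Try materializing the results of windowing routines (e.g. row_number) into an actual (temporary) table before running recursive queries on them. Example here: ienablemuch.com/2012/05/…
Try posting your query (and some data) on sqlfiddle.com, so that other Stack Overflow users have something to base or benchmark their own queries on.
Just a quick question. Is this a real problem or only a theoretical one? I may be too naive to see where something like this is needed in the real world, especially as a query in a database. I would probably try to make the problem easier by taking a step back, which is why I am asking. I would also look at the amount of data and ask myself whether I could load it into memory and make it fast that way. By the way, how slow is slow, and why does it need to be faster?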
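【Answer 1】:
I have 3 possible answers.
The SQL Fiddle for your question is here: http://sqlfiddle.com/#!3/7c7a0/3/0
The SQL Fiddle for my answers is here: http://sqlfiddle.com/#!3/5d257/1
Caveats:
1. The query analyzer is not sufficient - I noticed that a number of answers were rejected because their query plans were more expensive than the original query. The analyzer is only a guide. Depending on the actual data set, hardware, and use case, a more expensive query can return results faster than a cheaper one. You have to test in your environment.
2. The query analyzer is not effective - even if you find a way to remove the "most expensive step" from a query, it often makes no difference to your query.
3. Query changes alone rarely mitigate schema/design problems - some answers were rejected because they involve schema-level changes such as triggers and additional tables. A complex query that resists optimization is a strong sign that the problem lies in the underlying design or in my expectations. You may not like it, but you may have to accept that the problem cannot be solved at the query level.
4. Indexed views cannot contain row_number()/partition clauses - working around the self-join problem by creating six copies of the Objects table is not enough to let you create the indexed view you suggested. I tried it in this sqlfiddle. If you uncomment the last "create index" statement, you get an error because your view "contains a ranking or aggregate window function".
Working answers:
1. Left join instead of row_number() - you can use a query that uses a left join to exclude results that have been overridden lower in the tree. Removing the final "order by" from this query actually removes the sort that has been plaguing you! The execution plan for this query is still more expensive than your original, but see caveat #1 above.
2. Indexed view for part of the query - using some serious query magic (based on this technique), I created an indexed view for part of the query. This view can be used to enhance either the original question query or answer #1.
3. Materialize into a well-indexed table - someone else suggested this answer, but they may not have explained it well. Unless your result set is very large or you update the source tables very frequently, materializing the results of the query and using a trigger to keep them up to date is a perfectly fine way around this kind of problem. Once you have created a view for your query, it is easy to test this option. You can reuse answer #2 to speed up the trigger, and then improve it further over time. (You are talking about creating six copies of your table, so try this first. It guarantees that the performance of the select you care about will be as good as it can be.)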
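The idea behind working answer 1 (drop a row when another row for the same child triple and property comes from a closer parent) can be sketched in Python. This is not the SQL from the fiddle: it assumes each expanded row carries the tree depths of its defining triple, and it compares those depths lexicographically, following rules (a)-(c) of the question. ROWS and anti_join are hypothetical names.

```python
# Hypothetical expanded rows: (O1, O2, O3, Property, parent_depths, Value).
# parent_depths holds the tree depth of each component of the defining triple.
ROWS = [
    ("O4", "O4", "O5", "P1", (0, 1, 1), "abc"),  # inherited from (O1, O2, O3)
    ("O4", "O4", "O5", "P1", (1, 2, 2), "098"),  # inherited from (O2, O4, O5)
    ("O3", "O4", "O5", "P1", (0, 1, 1), "abc"),  # inherited from (O1, O2, O3)
]


def anti_join(rows):
    """Keep a row only if no other row overrides it (the 'B is null' filter)."""
    kept = []
    for a in rows:
        overridden = any(
            # Same child triple and property, but a deeper (closer) parent.
            b[:4] == a[:4] and b[4] > a[4]
            for b in rows
        )
        if not overridden:
            kept.append(a)
    return kept


for row in anti_join(ROWS):
    print(row)
```

No sort is involved, but every row is compared against every other row of its group, so the cost moves into the join instead of disappearing.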
Here is the schema part of my answers from the sqlfiddle:
Create Table Objects
(
Id int not null identity primary key,
LeftIndex int not null default 0,
RightIndex int not null default 0
)
alter table Objects add ParentId int null references Objects
CREATE TABLE TP
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
insert into Objects(LeftIndex, RightIndex) values(1, 10)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 2, 5)
insert into Objects(ParentId, LeftIndex, RightIndex) values(1, 6, 9)
insert into Objects(ParentId, LeftIndex, RightIndex) values(2, 3, 4)
insert into Objects(ParentId, LeftIndex, RightIndex) values(3, 7, 8)
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P1', 'abc')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,2,3, 'P2', 'xyz')
insert into TP(Object1, Object2, Object3, Property, Value) values(1,3,4, 'P1', '123')
insert into TP(Object1, Object2, Object3, Property, Value) values(2,4,5, 'P1', '098')
create index ix_LeftIndex on Objects(LeftIndex)
create index ix_RightIndex on Objects(RightIndex)
create index ix_Objects on TP(Property, Value, Object1, Object2, Object3)
create index ix_Prop on TP(Property)
GO
---------- QUESTION ADDITIONAL SCHEMA --------
CREATE VIEW TPResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by Objects1.LeftIndex desc, Objects2.LeftIndex desc, Objects3.LeftIndex desc
)
as Idx
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
) as x
WHERE idx = 1
GO
---------- ANSWER 1 SCHEMA --------
CREATE VIEW TPIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, Objects1.LeftIndex as PL1, Objects2.LeftIndex as PL2, Objects3.LeftIndex as PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from tp
-- Select corresponding objects of the triple
inner join Objects as Objects1 on Objects1.Id = tp.Object1
inner join Objects as Objects2 on Objects2.Id = tp.Object2
inner join Objects as Objects3 on Objects3.Id = tp.Object3
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between Objects1.LeftIndex and Objects1.RightIndex
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between Objects2.LeftIndex and Objects2.RightIndex
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between Objects3.LeftIndex and Objects3.RightIndex
GO
---------- ANSWER 2 SCHEMA --------
-- Partial calculation using an indexed view
-- Circumvented the self-join limitation using a black magic technique, based on
-- http://jmkehayias.blogspot.com/2008/12/creating-indexed-view-with-self-join.html
CREATE TABLE dbo.multiplier (i INT PRIMARY KEY)
INSERT INTO dbo.multiplier VALUES (1)
INSERT INTO dbo.multiplier VALUES (2)
INSERT INTO dbo.multiplier VALUES (3)
GO
CREATE VIEW TPIndexed
WITH SCHEMABINDING
AS
SELECT tp.Object1, tp.object2, tp.object3, tp.property, tp.value,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.LeftIndex ELSE NULL END, 0)) as PL3,
SUM(ISNULL(CASE M.i WHEN 1 THEN Objects.RightIndex ELSE NULL END, 0)) as PR1,
SUM(ISNULL(CASE M.i WHEN 2 THEN Objects.RightIndex ELSE NULL END, 0)) as PR2,
SUM(ISNULL(CASE M.i WHEN 3 THEN Objects.RightIndex ELSE NULL END, 0)) as PR3,
COUNT_BIG(*) as ID
FROM dbo.tp
cross join dbo.multiplier M
inner join dbo.Objects
on (M.i = 1 AND Objects.Id = tp.Object1)
or (M.i = 2 AND Objects.Id = tp.Object2)
or (M.i = 3 AND Objects.Id = tp.Object3)
GROUP BY tp.Object1, tp.object2, tp.object3, tp.property, tp.value
GO
-- This index is mostly useless but required
create UNIQUE CLUSTERED index pk_TPIndexed on dbo.TPIndexed(property, value, object1, object2, object3)
-- Once we have the clustered index, we can create a nonclustered that actually addresses our needs
create NONCLUSTERED index ix_TPIndexed on dbo.TPIndexed(property, value, PL1, PL2, PL3, PR1, PR2, PR3)
GO
-- NOTE: this view is not indexed, but it uses the indexed view
CREATE VIEW TPIndexedResultView AS
Select O1, O2, O3, Property, Value
FROM
(
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by tp.Property, Children1.Id, Children2.Id, Children3.Id
order by tp.Property, Tp.PL1 desc, Tp.PL2 desc, Tp.PL3 desc
)
as Idx
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
) as x
WHERE idx = 1
GO
-- NOTE: this view is not indexed, but it uses the indexed view
CREATE VIEW TPIndexedIntermediate AS
select tp.Property, tp.Value
, Children1.Id as O1, Children2.Id as O2, Children3.Id as O3
, PL1, PL2, PL3
, Children1.LeftIndex as CL1, Children2.LeftIndex as CL2, Children3.LeftIndex as CL3
from TPIndexed as TP WITH (NOEXPAND)
-- Then add all possible children of all those objects
inner join Objects as Children1 WITH (INDEX(ix_LeftIndex)) on Children1.LeftIndex between TP.PL1 and TP.PR1
inner join Objects as Children2 WITH (INDEX(ix_LeftIndex)) on Children2.LeftIndex between TP.PL2 and TP.PR2
inner join Objects as Children3 WITH (INDEX(ix_LeftIndex)) on Children3.LeftIndex between TP.PL3 and TP.PR3
GO
---------- ANSWER 3 SCHEMA --------
-- You're talking about making six copies of the TP table
-- If you're going to go that far, you might as well, go the trigger route
-- The performance profile is much the same - slower on insert, faster on read
-- And instead of still recalculating on every read, you'll be recalculating
-- only when the data changes.
CREATE TABLE TPResult
(
Object1 int not null references Objects,
Object2 int not null references Objects,
Object3 int not null references Objects,
Property varchar(20) not null,
Value varchar(50) not null
)
GO
create UNIQUE index ix_Result on TPResult(Property, Value, Object1, Object2, Object3)
--You'll have to imagine this trigger, sql fiddle doesn't want to do it
--CREATE TRIGGER tr_TP
--ON TP
-- FOR INSERT, UPDATE, DELETE
--AS
-- DELETE FROM TPResult
-- -- For this example we'll just insert into the table once
INSERT INTO TPResult
SELECT O1, O2, O3, Property, Value
FROM TPResultView
Here is the query part of my answers from the sqlfiddle:
-------- QUESTION QUERY ----------
-- Original query, modified to use the view I added
SELECT O1, O2, O3, Property, Value
FROM TPResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
-- ORDER BY O1, O2, O3
GO
-------- ANSWER 1 QUERY ----------
-- A different way to get the same result.
-- Query optimizer says this is more expensive, but I've seen cases where
-- it says a query is more expensive but it returns results faster.
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIntermediate A
LEFT JOIN TPIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- However, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 2 QUERIES ----------
-- Same as above but using an indexed view to partially calculate results
SELECT O1, O2, O3, Property, Value
FROM TPIndexedResultView
WHERE property = 'P1' AND value = 'abc'
-- Your assertion is that this order by is the most expensive part.
-- Sometimes converting queries into views allows the server to
-- Optimize them better over time.
-- NOTE: removing this order by has no effect on this query.
--ORDER BY O1, O2, O3
GO
SELECT O1, O2, O3, Property, Value
FROM (
SELECT A.O1, A.O2, A.O3, A.Property, A.Value
FROM TPIndexedIntermediate A
LEFT JOIN TPIndexedIntermediate B ON A.O1 = B.O1
AND A.O2 = B.O2
AND A.O3 = B.O3
AND A.Property = B.Property
AND
(
-- Find any rows with Parent LeftIndex triplet that is greater than this one
(A.PL1 < B.PL1
AND A.PL2 < B.PL2
AND A.PL3 < B.PL3)
OR
-- Find any rows with LeftIndex triplet that is greater than this one
(A.CL1 < B.CL1
AND A.CL2 < B.CL2
AND A.CL3 < B.CL3)
)
-- If this row has any rows that match the previous two cases, exclude it
WHERE B.O1 IS NULL ) AS x
WHERE property = 'P1' AND value = 'abc'
-- NOTE: Removing this order _DOES_ reduce query cost removing the "sort" action
-- that has been the focus of your question.
-- However, it wasn't clear from your question whether this order by was required.
--ORDER BY O1, O2, O3
GO
-------- ANSWER 3 QUERY ----------
-- Returning results from a pre-calculated table is fast and easy
-- Unless you are doing many more inserts than reads, or your result
-- set is very large, this is a fine way to compensate for a poor design
-- in one area of your database.
SELECT Object1 as O1, Object2 as O2, Object3 as O3, Property, Value
FROM TPResult
WHERE property = 'P1' AND value = 'abc'
ORDER BY O1, O2, O3
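【Comments】:
Unfortunately, half of the bounty goes just for upvotes, but your answers do not work either. The "left join instead of window function" option looked like a real winner at first, and I had already opened a bottle of wine... but on real data it turned out just as bad: the hash match that implements the join is O(2N), because there cannot be any index on the probe columns. That is better than N log(N), but still prohibitive.
Local optimizations, such as indexing some parts at query time: of course I have done that already, and in more places than you suggest. Sadly, those only reduce the constant and do nothing about the complexity itself.
Finally, the "rebuild and index the whole result set on every change" approach. I tried it on real data, just for fun. Understandably, it took tens of seconds on my development machine.
As for the vague philosophy about SQL Management Studio being "not sufficient" and "not effective" (more formally known as "measure first, then optimize"): of course I know that, and of course I made sure the tools I use really reflect the actual state of things. As for "query changes alone may not be enough": did you mean to say I was not aware of that? I proposed a schema change myself, in the form of six extra tables, didn't I?
Wow, way to finish a "thank you" with an implied "useless". It is comment chains like the one above that made me include the caveats. You seem upset that anyone gets any credit for helping you, because they did not magically make your query orders of magnitude faster. Sorry. When I said "schema/design problems", I did not mean that you seem unwilling to make some changes; I meant that you seem unwilling to consider that the real problem may be your expectations, given your requirements for the "triples with inherited properties" design as a whole.
【Answer 2】:
You can speed this up by materializing the join in an indexed table, say joinedresult. This has the disadvantage of needing space and of being saved to disk, but it has the advantage of being able to use an index for the slow part.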
insert into joinedresult
select Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
Objects1.[depthInTheTree] as O1D, Objects2.[depthInTheTree] as O2D, Objects3.[depthInTheTree] as O3D
from ... (see above)
Make sure joinedresult has an index on [O1,O2,O3,Property,O1D,O2D,O3D], and clear it before running. Then:
select * from
(
select
-- joinedresult already holds the flattened columns, so no table prefixes are needed
O1, O2, O3, Property, Value,
row_number() over(
partition by O1, O2, O3, Property
order by O1D desc, O2D desc, O3D desc
)
as InheritancePriority
from joinedresult
) as x
where InheritancePriority = 1
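【Comments】:
The row_number() part is the problem. First, building the index is as slow as the sort, or even slower. Second, I need this whole thing as a single expression, not a procedure, so that it can be used as part of other expressions.
【Answer 3】:
Have you tried an index (or setting the pk) with the "Value" column first, the "Property" column second, the "Object1" column third, the "Object2" column fourth, and the "Object3" column fifth? I am assuming that "Value" is more restrictive than "Property".
I am also assuming that you have the Id column as the primary key, and a foreign key relationship between ParentId and Id.
How does this query perform?: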
with
-- First, get all combinations that match the property/value pair.
validTrip as (
select Object1, Object2, Object3
from TriplesAndProperties
where value = @value
and property = @property
),
-- Recursively flatten the inheritance hierarchy of Object1, 2 and 3.
o1 as (
select Id, 0 as InherLevel from Objects where Id in (select Object1 from validTrip)
union all
select rec.Id, InherLevel + 1 from Objects rec inner join o1 base on rec.ParentId = base.Id
),
o2 as (
select Id, 0 as InherLevel from Objects where Id in (select Object2 from validTrip)
union all
select rec.Id, InherLevel + 1 from Objects rec inner join o2 base on rec.ParentId = base.Id
),
o3 as (
select Id, 0 as InherLevel from Objects where Id in (select Object3 from validTrip)
union all
select rec.Id, InherLevel + 1 from Objects rec inner join o3 base on rec.ParentId = base.Id
)
-- select the Id triple.
select o1.Id, o2.Id, o3.Id
-- match every option in o1, with every option in o2, with every option in o3.
from o1
cross join o2
cross join o3
-- order by the inheritance level.
order by o1.InherLevel, o2.InherLevel, o3.InherLevel;
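【Comments】:
Yes, I have tried numerous key combinations, including this particular one. However, it is now obvious to me (based on the execution plan) that the bottleneck is the sort, and because of the sort terms no index on any table can help with it. Think about it.
Your query doesn't work, for two reasons. Reason one: your query doesn't produce the expected result. Here is a counterexample. Suppose the TriplesAndProperties table contains two rows, (1,2,3,P1,V1) and (1,4,5,P1,V1), with @property = 'P1' and @value = 'V1', and no object has a parent. Then your validTrip CTE will contain two rows: (1,2,3) and (1,4,5). Then your o1 CTE will have two rows: (Id=1) and (Id=1). Your o2 CTE will have two rows: (Id=2) and (Id=4). And your o3 CTE will have two rows: (Id=3) and (Id=5). InherLevel will be zero in all of them. (cont.)
Then your triple cross join will produce the following eight rows (= two times two times two): (1,2,3), (1,2,5), (1,4,3), (1,4,5), (1,2,3), (1,2,5), (1,4,3), (1,4,5). The desired result, on the other hand, should be exactly equal to your validTrip CTE, because inheritance has no effect here. In other words, you are producing all possible triples, while I only need those that are "children" of the ones in validTrip.
As a side note, I don't understand the purpose of ordering by inheritance level. I need to produce those triples, but I don't care about their order.
Reason two is that your query is very inefficient because of the way recursive CTEs work in SQL Server: there is no magic math involved, the server just blindly evaluates the whole table expression by applying the recursion. That is perfectly fine when the expression is the *** one: the server simply returns results to the client as they become available. But when you try to join the results, the server has to evaluate each expression in its entirety and then spool the results into a temporary table before performing the join. That is somewhat more involved than a simple sort. :-)
【Answer 4】: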
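Hierarchical queries, i.e. WITH RECURSIVE ... or proprietary equivalents such as CONNECT BY, are your friends in this case.
The way to solve your particular problem is: start from the leaves and ascend towards the root, aggregating and excluding anything that has already been found.
【Comments】:
I don't see how this is supposed to help. Are you sure you have read the question carefully?
Oops. OK, now I have. You want the opposite. Couldn't you just create a persistent flat view (i.e. an indexable table maintained by triggers) with the structure (Obj, FromParent, Prop, Value), and get the answer from a simple query?
Then I would have to maintain the updates of that table by hand, which would be a major source of bugs. In my book, that is even less preferable than keeping six copies of the objects table.
You can use triggers to maintain the updates of that table. I have used this strategy successfully, with a flat table mirroring a hierarchical model. In my case the properties were permissions. Since the initial implementation, based on SQL views that traversed the hierarchy dynamically, proved to be a performance bottleneck, I decided to write a table-based "cached and indexable view". Because all read queries can now use index scans, overall system performance has improved.
Thanks for the interesting link, although it would not be a suitable design choice in my case because of the maintenance overhead it requires. I am very happy with the fastest possible permission checks and good write performance. There is nothing wrong with using six tables (or however many tables the model requires). My point is: turn the problem around by 180 degrees, i.e. perform the inheritance when the model changes, cache the results in a table, and get the fastest possible read access. The overhead when writing is probably tolerable, and things become much simpler.
【Answer 5】:
I guess your table is rather big, hence the slowness. In that case I also guess that you have multiple properties (2 to many). If so, I suggest you move the "where property='P1'" inside the CTE. This filters out a good part of the data, making your query up to as many times faster as there are properties.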
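Something like: http://sqlfiddle.com/#!3/7c7a0/92/0
【Comments】:
If you look at the execution plan, you will see that the sort still accounts for 33% of the whole query. This happens because there is no index that can be used for the sort, so the server has to compute the whole intermediate set first and then sort it. The cost of this operation depends, more than linearly, on the number of rows being sorted, which in turn grows with the third power of the number of objects. In other words, you have not solved the problem: the bottleneck is still there.
As for small local optimization tricks, such as filtering by property first: I have done all of those and more. While they do reduce the complexity factor to some extent, the main dependency on the third power of the number of objects remains, which makes things progressively slower as the system grows.
【Answer 6】:
Caching is the key to speeding up queries. It reduces the amount of computation you have to do. You want to create an index because you want to CACHE and save WORK. Here are two possibilities for doing that.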
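Option 1
The SQL database sorts because of your window function, and you say that the window function is too slow.
I don't know how well this will work, but it might.
Instead of sorting by several columns, you can try sorting by a single column: "closeness".
For now, let's define closeness as some abstract integer. Instead of your window function, you can then use the following SQL: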
select * from
(
select
Children1.Id as O1, Children2.Id as O2, Children3.Id as O3, tp.Property, tp.Value,
row_number() over(
partition by Children1.Id, Children2.Id, Children3.Id, tp.Property
order by closeness DESC
)
as InheritancePriority
from
... (see above)
) as x
where InheritancePriority = 1
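closeness can be a column defined in the TriplesAndProperties table. For each object, you can define its "closeness" as its distance from the root node (O1). Then we can define closeness(tuple) = closeness(Object1)*100 + closeness(Object2)*10 + closeness(Object3).
That way, the tuple furthest from the root is the one you want.
To avoid the sort, you just have to make sure closeness is indexed.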
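A minimal Python sketch of the packed closeness value. It assumes every depth is smaller than the base (10 in the formula above); a deeper tree would need a larger base, otherwise values from different components would collide. PARENT, depth and closeness are hypothetical names, and the tree is the sample from the question.

```python
# Inheritance tree from the question.
PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}


def depth(obj):
    """Distance of obj from the root node."""
    d = 0
    while PARENT[obj] is not None:
        obj = PARENT[obj]
        d += 1
    return d


def closeness(triple, base=10):
    """Pack the three depths into one integer; every depth must be below base."""
    d1, d2, d3 = (depth(o) for o in triple)
    if max(d1, d2, d3) >= base:
        raise ValueError("tree is too deep for this base")
    return (d1 * base + d2) * base + d3


print(closeness(("O1", "O2", "O3")))  # 11
print(closeness(("O2", "O4", "O5")))  # 122
```

Comparing the packed integers gives the same order as comparing the depth tuples component by component, which is what lets a single indexed column replace the three sort keys.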
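Option 2
I am fairly sure this will work.
Define your TriplesAndProperties table to have the following columns: Object1, Object2, Object3, Property, Value, Effective_Object1, Effective_Object2, Effective_Object3, Closeness.
Note that here I am also defining closeness as a column.
When you insert/update a tuple (X,Y,Z) in your table, you want to insert: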
(X,Y,Z,Property,Value,X,Y,Z,0)
(X,Y,Z,Property,Value,X,Y,Z.child,1)
(X,Y,Z,Property,Value,X,Y,Z.grandchild,2)
(X,Y,Z,Property,Value,X,Y.child,Z,10)
(X,Y,Z,Property,Value,X,Y.child,Z.child,11)
(X,Y,Z,Property,Value,X,Y.child,Z.grandchild,12)
(X,Y,Z,Property,Value,X,Y.grandchild,Z,20)
(X,Y,Z,Property,Value,X,Y.grandchild,Z.child,21)
(X,Y,Z,Property,Value,X,Y.grandchild,Z.grandchild,22)
...
...
This means that instead of inserting/updating/deleting a single row in your table, you insert up to about 20 rows. That isn't too bad.
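The row expansion could be generated along these lines. This Python sketch is an assumption about what the elided rows look like: it packs the per-component distances with base 10 as in the list above, it also expands the descendants of X, and it uses the hypothetical names PARENT, descendants_with_distance and effective_rows.

```python
from itertools import product

# Inheritance tree from the question.
PARENT = {"O1": None, "O2": "O1", "O3": "O1", "O4": "O2", "O5": "O3"}


def descendants_with_distance(obj):
    """(descendant, distance) pairs for obj, including obj itself at distance 0."""
    pairs = []
    for candidate in PARENT:
        distance, cur = 0, candidate
        while cur is not None and cur != obj:
            cur = PARENT[cur]
            distance += 1
        if cur == obj:
            pairs.append((candidate, distance))
    return pairs


def effective_rows(x, y, z, prop, value, base=10):
    """Rows to store when (x, y, z, prop, value) is inserted."""
    expansions = product(
        descendants_with_distance(x),
        descendants_with_distance(y),
        descendants_with_distance(z),
    )
    for (e1, d1), (e2, d2), (e3, d3) in expansions:
        # Closeness here is a distance: 0 means the property is explicit.
        yield (x, y, z, prop, value, e1, e2, e3, (d1 * base + d2) * base + d3)


rows = list(effective_rows("O1", "O2", "O3", "P1", "abc"))
print(len(rows))  # 20
```

With this definition a smaller Closeness means a nearer defining triple, so a query that picks one row per effective triple has to take the minimum rather than the maximum.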
Then your query is very simple.
You just say:
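SELECT * FROM
(
SELECT Effective_Object1, Effective_Object2, Effective_Object3, Property, Value,
row_number() over(
partition by Effective_Object1, Effective_Object2, Effective_Object3, Property
-- In this option Closeness is a distance from the defining triple (0 = explicit),
-- so the smallest value must win; the original post sorted DESC here.
order by Closeness ASC
) AS InheritancePriority FROM TriplesAndProperties
) AS x WHERE InheritancePriority = 1;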
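With this option you have to make sure closeness is indexed; you can simply index by the tuple (Effective_Object1, Effective_Object2, Effective_Object3, Property, Closeness).
In both cases you keep a certain amount of cache, i.e. data that adds no extra information but caches a certain amount of computation or work.
【Comments】: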