Fill gaps with set based on priority
Posted: 2021-02-02 15:29:13
Question:
Goal
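For each Foo, the time range should be filled with records from FooBar wherever possible. Where FooBar has no records, the time range may stay empty.
The records in FooBar behave as a set: if FooId and the time range are (exactly) the same, all of those BarId values are valid for that time range.
Gaps should be filled according to the priority of the set. In the sample data below, a lower Priority value takes precedence.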
Table structure
CREATE TABLE Foo(
FooId INT NOT NULL,
ValidFrom DATETIME NOT NULL,
ValidUntil DATETIME NOT NULL,
)
CREATE TABLE FooBar (
FooId INT NOT NULL,
BarId INT NOT NULL,
ValidFrom DATETIME NOT NULL,
ValidUntil DATETIME NOT NULL,
Priority TINYINT NOT NULL,
)
Sample data
INSERT INTO Foo(FooId, ValidFrom, ValidUntil)
VALUES
(1, '2020-01-01', '2021-12-31')
, (2, '2020-01-01', '2021-06-30')
INSERT INTO FooBar(FooId, BarId, ValidFrom, ValidUntil, Priority)
VALUES
-- First set for FooId = 1 with prio 1
(1, 1, '2021-01-01', '2021-03-01', 1)
, (1, 2, '2021-01-01', '2021-03-01', 1)
, (1, 3, '2021-01-01', '2021-03-01', 1)
-- Second set for FooId = 1 with prio 2
, (1, 1, '2021-02-01', '2021-06-01', 2)
, (1, 2, '2021-02-01', '2021-06-01', 2)
-- Third set for FooId = 1 with prio 3
, (1, 1, '2021-01-01', '2021-12-31', 3)
, (1, 2, '2021-01-01', '2021-12-31', 3)
, (1, 3, '2021-01-01', '2021-12-31', 3)
-- Fourth set for FooId = 1 with Prio 1
, (1, 4, '2021-04-01', '2021-04-02', 1)
, (1, 5, '2021-04-01', '2021-04-02', 1)
-- First set for FooId = 2 with prio 3
, (2, 6, '2021-01-01', '2021-04-02', 3)
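To see which sets this sample data contains, the rows can be grouped by the columns that define a set. This is only a minimal sketch: it assumes a set is identified by FooId, ValidFrom, ValidUntil and Priority together, and BarCount is an illustrative column name that does not appear in the question.

```sql
-- One row per set: rows sharing FooId, time range and priority form a set.
-- Assumption: Priority is constant within a set, so it is safe to group by it.
SELECT fb.FooId,
       fb.ValidFrom,
       fb.ValidUntil,
       fb.Priority,
       COUNT(*) AS BarCount   -- number of BarId values in the set (illustrative name)
FROM FooBar fb
GROUP BY fb.FooId,
         fb.ValidFrom,
         fb.ValidUntil,
         fb.Priority
ORDER BY fb.FooId,
         fb.ValidFrom,
         fb.ValidUntil;
```

On the data above this should list four sets for FooId = 1 and one set for FooId = 2.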
Expected result
The Origin column is included for clarification only and should not be part of the generated result set.
FooId | BarId | ValidFrom | ValidUntil | Origin |
---|---|---|---|---|
1 | 1 | 2021-01-01 | 2021-03-01 | First set |
1 | 2 | 2021-01-01 | 2021-03-01 | First set |
1 | 3 | 2021-01-01 | 2021-03-01 | First set |
1 | 1 | 2021-03-02 | 2021-03-31 | Second set |
1 | 2 | 2021-03-02 | 2021-03-31 | Second set |
1 | 4 | 2021-04-01 | 2021-04-02 | Fourth set |
1 | 5 | 2021-04-01 | 2021-04-02 | Fourth set |
1 | 1 | 2021-04-03 | 2021-06-01 | Second set |
1 | 2 | 2021-04-03 | 2021-06-01 | Second set |
1 | 1 | 2021-06-02 | 2021-12-31 | Third set |
1 | 2 | 2021-06-02 | 2021-12-31 | Third set |
1 | 3 | 2021-06-02 | 2021-12-31 | Third set |
2 | 6 | 2021-01-01 | 2021-04-02 | First set (FooId = 2) |
I know this can be done with a cursor or a while loop, but I am looking for a more performant/elegant solution.
Compatibility level: 130
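Comments:
I'm voting to close this question because the selected code does not answer the question, and the input was modified to add a column that is not in the original question.
Answer 1:
"There can be empty time ranges when no records exist in FooBar."
Does this mean that a solution without the empty ranges is also acceptable?
If so, note that the third set for FooId = 1 also defines BarId = 3 for the period 2021-03-02 -> 2021-03-31.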
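Sample data
The data model is adjusted slightly (DATETIME becomes DATE) so that the results do not carry a time component (00:00:00.000).
A set identifier (FooBar.SetId) is also added to make the origin of each row easier to trace.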
CREATE TABLE Foo(
FooId INT NOT NULL,
ValidFrom DATE/*TIME*/ NOT NULL,
ValidUntil DATE/*TIME*/ NOT NULL
)
CREATE TABLE FooBar (
FooId INT NOT NULL,
BarId INT NOT NULL,
ValidFrom DATE/*TIME*/ NOT NULL,
ValidUntil DATE/*TIME*/ NOT NULL,
Priority TINYINT NOT NULL,
SetId nvarchar(5)
)
INSERT INTO Foo(FooId, ValidFrom, ValidUntil)
VALUES
(1, '2020-01-01', '2021-12-31')
, (2, '2020-01-01', '2021-06-30')
INSERT INTO FooBar(FooId, BarId, ValidFrom, ValidUntil, Priority, SetId)
VALUES
-- First set for FooId = 1 with prio 1
(1, 1, '2021-01-01', '2021-03-01', 1, 'Set 1')
, (1, 2, '2021-01-01', '2021-03-01', 1, 'Set 1')
, (1, 3, '2021-01-01', '2021-03-01', 1, 'Set 1')
-- Second set for FooId = 1 with prio 2
, (1, 1, '2021-02-01', '2021-06-01', 2, 'Set 2')
, (1, 2, '2021-02-01', '2021-06-01', 2, 'Set 2')
-- Third set for FooId = 1 with prio 3
, (1, 1, '2021-01-01', '2021-12-31', 3, 'Set 3')
, (1, 2, '2021-01-01', '2021-12-31', 3, 'Set 3')
, (1, 3, '2021-01-01', '2021-12-31', 3, 'Set 3')
-- Fourth set for FooId = 1 with Prio 1
, (1, 4, '2021-04-01', '2021-04-02', 1, 'Set 4')
, (1, 5, '2021-04-01', '2021-04-02', 1, 'Set 4')
-- First set for FooId = 2 with prio 3
, (2, 6, '2021-01-01', '2021-04-02', 3, 'Set 1')
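If adding a column to FooBar is not an option, a set number can be derived from the original columns instead. The following is a minimal sketch under the assumption that a set is identified by FooId, ValidFrom, ValidUntil and Priority; SetNo is a hypothetical name, and because the numbering follows the sort order it will not necessarily match the hand-assigned 'Set 1' to 'Set 4' labels above.

```sql
-- Derive a per-FooId set number without changing the table.
-- Rows with the same time range and priority get the same SetNo.
SELECT fb.FooId,
       fb.BarId,
       fb.ValidFrom,
       fb.ValidUntil,
       fb.Priority,
       DENSE_RANK() OVER (PARTITION BY fb.FooId
                          ORDER BY fb.ValidFrom,
                                   fb.ValidUntil,
                                   fb.Priority) AS SetNo   -- hypothetical column name
FROM FooBar fb
ORDER BY fb.FooId,
         SetNo,
         fb.BarId;
```

DENSE_RANK is used rather than ROW_NUMBER so that every member of a set shares one number.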
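Solution
The common table expressions (CTEs) ValidFrom and ValidPeriod cut all period information from Foo and FooBar into the smallest individual periods.
This step also produces an extra trailing, incomplete period for each FooId, which is removed by the exists clause.
Then, for each individual period, the FooBar records with the best priority are fetched (that is, no overlapping record for the same FooId and BarId with a lower Priority value may exist: not exists ... fb2.Priority < fb.Priority).
This gives: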
with ValidFrom as
(
select f.FooId,
f.ValidFrom
from Foo f
union
select f.FooId,
dateadd(day, 1, f.ValidUntil)
from Foo f
union
select fb.FooId,
fb.ValidFrom
from FooBar fb
union
select fb.FooId,
dateadd(day, 1, fb.ValidUntil)
from FooBar fb
),
ValidPeriod as
(
select vf.FooId,
vf.ValidFrom,
dateadd(day, -1, lead(vf.ValidFrom) over(partition by vf.FooId order by vf.ValidFrom)) as ValidUntil
from ValidFrom vf
)
select vp.FooId,
fb.BarId,
vp.ValidFrom,
vp.ValidUntil,
--fb.ValidFrom,
--fb.ValidUntil,
--fb.Priority,
fb.SetId
from ValidPeriod vp
left join FooBar fb
on fb.FooId = vp.FooId
and fb.ValidFrom <= vp.ValidUntil
and fb.ValidUntil >= vp.ValidFrom
and not exists ( select 'x'
from FooBar fb2
where fb2.FooId = fb.FooId
and fb2.BarId = fb.BarId
and fb2.ValidFrom <= vp.ValidUntil
and fb2.ValidUntil >= vp.ValidFrom
and fb2.Priority < fb.Priority )
where exists ( select 'x'
from ValidPeriod vp2
where vp2.FooId = vp.FooId
and vp2.ValidFrom > vp.ValidFrom )
order by vp.FooId,
vp.ValidFrom,
fb.BarId;
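Result
This result contains more period information than the expected result in the question asks for. Removing the unions with Foo from the first CTE removes the null rows and limits the periods to the period information available in FooBar (in effect, this removes Foo from the solution entirely).
With vp.ValidFrom and vp.ValidUntil as the result period: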
FooId BarId ValidFrom ValidUntil SetId
----- ----- ---------- ---------- -----
1 null 2020-01-01 2020-12-31 null -- extra row
1 1 2021-01-01 2021-01-31 Set 1
1 2 2021-01-01 2021-01-31 Set 1
1 3 2021-01-01 2021-01-31 Set 1
1 1 2021-02-01 2021-03-01 Set 1 -- extra row
1 2 2021-02-01 2021-03-01 Set 1 -- extra row
1 3 2021-02-01 2021-03-01 Set 1 -- extra row
1 1 2021-03-02 2021-03-31 Set 2
1 2 2021-03-02 2021-03-31 Set 2
1 3 2021-03-02 2021-03-31 Set 3 -- extra row
1 1 2021-04-01 2021-04-02 Set 2 -- extra row
1 2 2021-04-01 2021-04-02 Set 2 -- extra row
1 3 2021-04-01 2021-04-02 Set 3 -- extra row
1 4 2021-04-01 2021-04-02 Set 4
1 5 2021-04-01 2021-04-02 Set 4
1 1 2021-04-03 2021-06-01 Set 2
1 2 2021-04-03 2021-06-01 Set 2
1 3 2021-04-03 2021-06-01 Set 3 -- extra row
1 1 2021-06-02 2021-12-31 Set 3
1 2 2021-06-02 2021-12-31 Set 3
1 3 2021-06-02 2021-12-31 Set 3
2 null 2020-01-01 2020-12-31 null -- extra row
2 6 2021-01-01 2021-04-02 Set 1
2 null 2021-04-03 2021-06-30 null -- extra row
With fb.ValidFrom and fb.ValidUntil as the result period:
FooId BarId ValidFrom ValidUntil SetId
----- ----- ---------- ---------- -----
1 null null null null -- extra row
1 1 2021-01-01 2021-03-01 Set 1
1 2 2021-01-01 2021-03-01 Set 1
1 3 2021-01-01 2021-03-01 Set 1
1 1 2021-01-01 2021-03-01 Set 1 -- extra row
1 2 2021-01-01 2021-03-01 Set 1 -- extra row
1 3 2021-01-01 2021-03-01 Set 1 -- extra row
1 1 2021-02-01 2021-06-01 Set 2
1 2 2021-02-01 2021-06-01 Set 2
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 1 2021-02-01 2021-06-01 Set 2 -- extra row
1 2 2021-02-01 2021-06-01 Set 2 -- extra row
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 4 2021-04-01 2021-04-02 Set 4
1 5 2021-04-01 2021-04-02 Set 4
1 1 2021-02-01 2021-06-01 Set 2
1 2 2021-02-01 2021-06-01 Set 2
1 3 2021-01-01 2021-12-31 Set 3 -- extra row
1 1 2021-01-01 2021-12-31 Set 3
1 2 2021-01-01 2021-12-31 Set 3
1 3 2021-01-01 2021-12-31 Set 3
2 null null null null -- extra row
2 6 2021-01-01 2021-04-02 Set 1
2 null null null null -- extra row
See the Fiddle for a working example.
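Most of the rows marked "extra row" above come from two things: priority is compared per BarId rather than per set, and a set that spans several cut periods is reported once per period. The variant below is a sketch of one possible way to address both, not the method from the answer. It assumes day granularity, that a set is identified by FooId, ValidFrom, ValidUntil and Priority, and that empty ranges may be omitted (so Foo is not used and periods are not clipped to Foo's validity). The names Boundary, Slice, Winner, Flagged, Island, IsIslandStart and IslandNo are illustrative.

```sql
WITH Boundary AS
(   -- every date at which a new slice can start
    SELECT fb.FooId, fb.ValidFrom AS StartDate FROM FooBar fb
    UNION
    SELECT fb.FooId, DATEADD(day, 1, fb.ValidUntil) FROM FooBar fb
),
Slice AS
(   -- smallest individual periods; the last one per FooId has EndDate = NULL
    SELECT b.FooId,
           b.StartDate,
           DATEADD(day, -1, LEAD(b.StartDate) OVER (PARTITION BY b.FooId
                                                    ORDER BY b.StartDate)) AS EndDate
    FROM Boundary b
),
Winner AS
(   -- rows of the best-priority set per slice; the inner join drops the
    -- open-ended last slice because comparisons with NULL never match
    SELECT s.FooId, fb.BarId, s.StartDate, s.EndDate,
           fb.ValidFrom AS SetFrom, fb.ValidUntil AS SetUntil, fb.Priority
    FROM Slice s
    JOIN FooBar fb
      ON fb.FooId = s.FooId
     AND fb.ValidFrom <= s.EndDate
     AND fb.ValidUntil >= s.StartDate
    WHERE NOT EXISTS ( SELECT 1
                       FROM FooBar fb2
                       WHERE fb2.FooId = fb.FooId          -- no BarId condition:
                         AND fb2.ValidFrom <= s.EndDate    -- priority is per set
                         AND fb2.ValidUntil >= s.StartDate
                         AND fb2.Priority < fb.Priority )
),
Flagged AS
(   -- mark slices that do not directly continue the previous slice of the same set
    SELECT w.*,
           CASE WHEN LAG(w.EndDate) OVER (PARTITION BY w.FooId, w.BarId, w.SetFrom, w.SetUntil, w.Priority
                                          ORDER BY w.StartDate) = DATEADD(day, -1, w.StartDate)
                THEN 0 ELSE 1 END AS IsIslandStart
    FROM Winner w
),
Island AS
(   -- running total of the start flags numbers each run of adjacent slices
    SELECT f.*,
           SUM(f.IsIslandStart) OVER (PARTITION BY f.FooId, f.BarId, f.SetFrom, f.SetUntil, f.Priority
                                      ORDER BY f.StartDate
                                      ROWS UNBOUNDED PRECEDING) AS IslandNo
    FROM Flagged f
)
SELECT i.FooId,
       i.BarId,
       MIN(i.StartDate) AS ValidFrom,
       MAX(i.EndDate)   AS ValidUntil
FROM Island i
GROUP BY i.FooId, i.BarId, i.SetFrom, i.SetUntil, i.Priority, i.IslandNo
ORDER BY i.FooId, ValidFrom, i.BarId;
```

For FooId = 1 this should line up with the expected result in the question. If two overlapping sets share the same Priority, both are returned for the overlapping period, so a tie-breaker would have to be added where that can occur.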
Comments:
This is a very good starting point, and a fast solution as well. Thanks!