Create sequence partitioned by multiple columns and contiguous date

Posted: 2015-07-23 07:19:49

Question:

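I'm trying to work out how to create a sequence partitioned by multiple columns, where the sequence must reset whenever another (date-based) column is not contiguous.

Problem: hospital ADT (Admit/Discharge/Transfer) events happen at a specific point in time, but we want to convert them into activities that have a duration (a time span). That is, we have the start date but not the end date, which is determined by the next appropriate ADT event. We already do this in code, but want to do it in SQL as well to improve performance, for example to find patients who stayed in the ICU for more than 48 hours.

There are six levels of site location we want to record: facility, point of care, building, floor, room and bed.

Example: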

Stream  Event  Started           Facility    PointOfCare  ...
1       1      2015-01-01 09:05  Hospital-A  ICU           
1       2      2015-01-02 13:10  Hospital-A  WARD-1
2       3      2015-02-10 12:00  Hospital-A  ICU           
2       4      2015-02-11 12:00  Hospital-A  ICU
2       5      2015-02-12 04:30  Hospital-A  WARD-2

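So for each event we want to know how long the patient spent at each particular site location. The end date of the last activity in each stream is either null (still an inpatient) or the patient's discharge date.

This is my current solution: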

-- Create a sequence for each site location
INSERT INTO ADT_Activity_Sequence
SELECT 
  [Stream], 
  [Event],
  [Started],
  [Facility], 
  ROW_NUMBER() OVER (PARTITION BY [Stream], 
    ISNULL([Facility], [Event]) 
    ORDER BY [Started]) AS [FacilitySequence], 
  [PointOfCare], 
  ROW_NUMBER() OVER (PARTITION BY [Stream], 
    ISNULL([Facility], [Event]), 
    ISNULL([PointOfCare], [Event]) 
    ORDER BY [Started]) AS [PointOfCareSequence]
  -- and so on for all site locations
FROM ADT_Event
INNER JOIN ADT_Stream ON ADT_Event.Stream = ADT_Stream.Id

Example:

Stream  Event  Started           Facility    FacilitySequence  PointOfCare  PointOfCareSequence ...
1       1      2015-01-01 09:05  Hospital-A  1                 ICU          1 
1       2      2015-01-02 13:10  Hospital-A  2                 WARD-1       1
2       3      2015-02-10 12:00  Hospital-A  1                 ICU          1
2       4      2015-02-11 12:00  Hospital-A  2                 ICU          2
2       5      2015-02-12 04:30  Hospital-A  3                 WARD-2       1

Then create the durations from the sequences:

INSERT INTO ADT_Activity_Duration
SELECT 
    [Stream], 
    [Event],
    [Started],
    [Facility], 
    [Sequence].[FacilitySequence],
    (
        -- Find most recent activity which is the first in current sequence
        SELECT TOP 1 [FacilitySequence].[Started] 
        FROM [ADT_Activity_Sequence] [FacilitySequence]
        WHERE [FacilitySequence].[Stream] = [Event].[Stream] AND [FacilitySequence].[FacilitySequence] = 1 AND [FacilitySequence].[Started] <= [Event].[Started]
        ORDER BY [FacilitySequence].[Started] DESC
    ) AS [FacilityStarted],
    (
        -- Find first activity in next sequence as this activities end date
        -- Last activity returns null, so activity uses stream end date if set
        ISNULL((                
            SELECT TOP 1 [FacilitySequence].[Started]
            FROM [ADT_Activity_Sequence] [FacilitySequence]
            WHERE [FacilitySequence].[Stream] = [Event].[Stream] AND [FacilitySequence].[FacilitySequence] = 1 AND [FacilitySequence].[Started] > [Event].[Started]
            ORDER BY [FacilitySequence].[Started]), [Stream].[Ended])
    ) AS [FacilityEnded],
    [PointOfCare], 
    [Sequence].[PointOfCareSequence],
    (
        SELECT TOP 1 [PointOfCareSequence].[Started] 
        FROM [ADT_Activity_Sequence] [PointOfCareSequence]
        WHERE [PointOfCareSequence].[Stream] = [Event].[Stream] AND [PointOfCareSequence].[PointOfCareSequence] = 1 AND [PointOfCareSequence].[Started] <= [Event].[Started]
        ORDER BY [PointOfCareSequence].[Started] DESC
    ) AS [PointOfCareStarted],
    (
        ISNULL((
            SELECT TOP 1 [PointOfCareSequence].[Started]
            FROM [ADT_Activity_Sequence] [PointOfCareSequence]
            WHERE [PointOfCareSequence].[Stream] = [Event].[Stream] AND [PointOfCareSequence].[PointOfCareSequence] = 1 AND [PointOfCareSequence].[Started] > [Event].[Started]
            ORDER BY [PointOfCareSequence].[Started]), [Stream].[Ended])
    ) AS [PointOfCareEnded]
    -- and so on for all site locations
FROM ADT_Event AS [Event]
INNER JOIN [ADT_Stream] AS [Stream] ON [Event].[Stream] = [Stream].[Id]
INNER JOIN [ADT_Activity_Sequence] [Sequence] ON [Event].[Id] = [Sequence].[Event]

Example:

Stream  Event  Started           Facility    FacilitySequence  FacilityStarted  FacilityEnded     PointOfCare  PointOfCareSequence  PointOfCareStarted  PointOfCareEnded  ...
1       1      2015-01-01 09:05  Hospital-A  1                 2015-01-01 09:05 2015-01-03 12:00  ICU          1                    2015-01-01 09:05    2015-01-02 13:10  
1       2      2015-01-02 13:10  Hospital-A  2                 2015-01-01 09:05 2015-01-03 12:00  WARD-1       1                    2015-01-02 13:10    2015-01-03 12:00
2       3      2015-02-10 12:00  Hospital-A  1                 2015-02-10 12:00 <NULL>            ICU          1                    2015-02-10 12:00    2015-02-12 04:30
2       4      2015-02-11 12:00  Hospital-A  2                 2015-02-10 12:00 <NULL>            ICU          2                    2015-02-10 12:00    2015-02-12 04:30
2       5      2015-02-12 04:30  Hospital-A  3                 2015-02-10 12:00 <NULL>            WARD-2       1                    2015-02-12 04:30    <NULL>

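My problem is that the contiguous date sequence gets broken. This happens whenever a patient is transferred away from any site location and then transferred back again, all within the same stream: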

Stream  Event  Started           Facility    PointOfCare  ...
3       1      2015-03-01 09:05  Hospital-A  ICU           
3       2      2015-03-02 13:10  Hospital-A  WARD-1
3       3      2015-03-02 10:00  Hospital-A  ICU           

Example:

Stream  Event  Started           Facility    FacilitySequence  PointOfCare  PointOfCareSequence ...
3       1      2015-03-01 09:05  Hospital-A  1                 ICU          1 
3       2      2015-03-02 13:10  Hospital-A  2                 WARD-1       1
3       3      2015-03-02 10:00  Hospital-A  3                 ICU          2

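Note that event #3 has a point-of-care sequence of 2, which is incorrect: because event #2 is at a different location, the sequence needs to reset back to 1.

I've been going round in circles :) so any help is appreciated, thanks!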

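Comments:

Can you clarify which of the tables shown are source data and which are the expected result? Also, please add a tag with your SQL Server version.

I didn't add a version tag because it's SQL Server >= 2005, but I've now added that as a baseline. Sorry if it wasn't clear, but ADT_Event and ADT_Stream are the source tables, and the ADT_Activity tables are generated from them. I left out a bunch of other code for brevity, but it mostly deals with working out which events and streams need updating.

OK, let me rephrase my request. Can you show clearly what your sample data looks like, and what the final result should be based on that sample data? Right now there are 5 tables in your question, some with Stream=3 and some without, which is confusing. Regarding the version: SQL Server 2012+ has functions such as LEAD and LAG that can help a lot when calculating durations.

No problem, I'll create a SQLFiddle with the schema and some test data.

ughai beat me to it :)

Answer 1: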

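If I understand your problem correctly, you need a contiguous ROW_NUMBER(). You can take the difference between the stream-wide ROW_NUMBER() and the row number within each individual location to produce a group, then partition by that group when numbering the rows for facility and point of care.

Because the rows are grouped by runs of consecutive events rather than directly by the Facility and PointOfCare values, the sequence resets when the patient switches back to the same facility or point of care.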
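To see why the difference works, here is a minimal sketch in plain Python rather than SQL, so it can be stepped through by hand. The function names `island_keys` and `sequences` are made up for this illustration, and the input list is the point-of-care column of stream 2 from the sample data, already ordered by Started.

```python
from collections import defaultdict

# Point-of-care values for one stream, already ordered by Started.
points_of_care = ["ICU", "ICU", "WARD-2", "ICU"]


def island_keys(values):
    """Return (value, stream_rn - value_rn) for each row.

    stream_rn plays the role of ROW_NUMBER() over the whole stream,
    value_rn the role of ROW_NUMBER() partitioned by the value.
    The difference stays constant while the same value repeats
    on consecutive rows, and changes as soon as the run is broken.
    """
    seen = defaultdict(int)  # how many rows with this value so far
    keys = []
    for stream_rn, value in enumerate(values, start=1):
        seen[value] += 1
        keys.append((value, stream_rn - seen[value]))
    return keys


def sequences(values):
    """Number the rows inside each island, restarting at 1 per island."""
    counts = defaultdict(int)
    result = []
    for key in island_keys(values):
        counts[key] += 1
        result.append(counts[key])
    return result


print(island_keys(points_of_care))
print(sequences(points_of_care))
```

The value is kept in the key next to the difference because two different values can end up with the same difference: for ICU, WARD-1, ICU, WARD-1 the second and third rows both have a difference of 1.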

Use something like this (SQL Fiddle):

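;WITH CTE AS
(
    SELECT *,
    ROW_NUMBER() OVER(PARTITION BY Stream ORDER BY [Started]) AS StreamSequence,
    -- Stream-wide row number minus per-location row number is constant within a run of consecutive rows.
    -- Both row numbers are partitioned by Stream so that overlapping streams cannot disturb each other.
    ROW_NUMBER() OVER(PARTITION BY Stream ORDER BY [Started]) - ROW_NUMBER() OVER(PARTITION BY Stream, Facility ORDER BY [Started]) AS FacilityGroup,
    ROW_NUMBER() OVER(PARTITION BY Stream ORDER BY [Started]) - ROW_NUMBER() OVER(PARTITION BY Stream, PointOfCare ORDER BY [Started]) AS PointOfCareGroup
    FROM Stream
)
SELECT
Stream, Event, Started, Facility, PointOfCare, StreamSequence,
-- The location column is part of the partition because two different locations can share a group number
ROW_NUMBER() OVER(PARTITION BY Stream, Facility, FacilityGroup ORDER BY [Started]) AS FacilitySequence,
ROW_NUMBER() OVER(PARTITION BY Stream, PointOfCare, PointOfCareGroup ORDER BY [Started]) AS PointOfCareSequence
FROM CTE
ORDER BY Event;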

You can generate date ranges from these sequences as required.
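One way the date ranges could be derived is sketched below. It is wrapped in Python's built-in sqlite3 module only so that it runs without a SQL Server instance; the SQL sticks to ROW_NUMBER(), GROUP BY and a correlated subquery, which SQL Server 2005 also has. The table name `adt_event`, the lower-case column names and the `island_*` aliases are assumptions made for this sketch, and the rows are stream 2 of the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adt_event (stream INTEGER, event INTEGER, started TEXT, point_of_care TEXT);
INSERT INTO adt_event VALUES
  (2, 3, '2015-02-10 12:00', 'ICU'),
  (2, 4, '2015-02-11 12:00', 'ICU'),
  (2, 5, '2015-02-12 04:30', 'WARD-2'),
  (2, 6, '2015-02-12 05:30', 'ICU');
""")

query = """
WITH grouped AS (
    SELECT stream, started, point_of_care,
           ROW_NUMBER() OVER (PARTITION BY stream ORDER BY started)
         - ROW_NUMBER() OVER (PARTITION BY stream, point_of_care ORDER BY started) AS grp
    FROM adt_event
),
islands AS (
    -- One row per run of consecutive events at the same point of care
    SELECT stream, point_of_care, MIN(started) AS island_started
    FROM grouped
    GROUP BY stream, point_of_care, grp
)
SELECT a.stream, a.point_of_care, a.island_started,
       -- An island ends when the next island in the same stream starts;
       -- the last island of a stream has no successor, so this is NULL
       (SELECT MIN(b.island_started)
        FROM islands b
        WHERE b.stream = a.stream
          AND b.island_started > a.island_started) AS island_ended
FROM islands a
ORDER BY a.stream, a.island_started
"""

date_ranges = conn.execute(query).fetchall()
for row in date_ranges:
    print(row)
```

The NULL end date on the last island is where the stream's end date would be substituted, for example with ISNULL(..., [Stream].[Ended]) as in the question. On SQL Server 2012+ the correlated subquery could presumably be replaced by LEAD().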

Output:

| Stream | Event |                    Started |   Facility | PointOfCare | StreamSequence | FacilitySequence | PointOfCareSequence |
|--------|-------|----------------------------|------------|-------------|----------------|------------------|---------------------|
|      1 |     1 |  January, 01 2015 09:05:00 | Hospital-A |         ICU |              1 |                1 |                   1 |
|      1 |     2 |  January, 02 2015 13:10:00 | Hospital-A |      WARD-1 |              2 |                2 |                   1 |
|      2 |     3 | February, 10 2015 12:00:00 | Hospital-A |         ICU |              1 |                1 |                   1 |
|      2 |     4 | February, 11 2015 12:00:00 | Hospital-A |         ICU |              2 |                2 |                   2 |
|      2 |     5 | February, 12 2015 04:30:00 | Hospital-A |      WARD-2 |              3 |                3 |                   1 |
|      2 |     6 | February, 12 2015 05:30:00 | Hospital-A |         ICU |              4 |                4 |                   1 |

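Comments:

Works a treat, thanks! The only thing I had to change was adding an ISNULL check to each site location in the CTE partitions, falling back to the activity PK when the location is null, because a location may not always be set and the sequence must reset on null.

FWIW, further testing turned up another issue. I probably didn't explain this well, but the site locations form a hierarchy, so I had to partition each lower-level location in the CTE by its higher-level locations as well, because lower-level keys can be shared. For example, the same ICU key might exist in two different facilities, so point of care has to be partitioned by both facility and point of care, and so on.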
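The two adjustments described in these comments might look roughly like the sketch below. It again uses Python's built-in sqlite3 module purely as a runnable harness, with COALESCE standing in for ISNULL. The table name `adt_event`, the aliases `poc_key`, `poc_group` and `poc_sequence`, and all six rows are hypothetical and exist only to exercise a shared ICU key across two facilities and a point of care that is not set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE adt_event (stream INTEGER, event INTEGER, started TEXT,
                        facility TEXT, point_of_care TEXT);
-- Hypothetical rows: the ICU key exists in both facilities,
-- and events 3 and 4 have no point of care set
INSERT INTO adt_event VALUES
  (1, 1, '2015-04-01 09:00', 'Hospital-A', 'ICU'),
  (1, 2, '2015-04-01 10:00', 'Hospital-B', 'ICU'),
  (1, 3, '2015-04-01 11:00', 'Hospital-B', NULL),
  (1, 4, '2015-04-01 12:00', 'Hospital-B', NULL),
  (1, 5, '2015-04-01 13:00', 'Hospital-B', 'ICU'),
  (1, 6, '2015-04-01 14:00', 'Hospital-B', 'ICU');
""")

query = """
WITH keyed AS (
    -- A missing location gets a key unique to its event, so it never joins a run
    SELECT stream, event, started, facility,
           COALESCE(point_of_care, 'event-' || event) AS poc_key
    FROM adt_event
),
grouped AS (
    -- The lower level is partitioned by its parent level as well as by itself
    SELECT stream, event, started, facility, poc_key,
           ROW_NUMBER() OVER (PARTITION BY stream ORDER BY started)
         - ROW_NUMBER() OVER (PARTITION BY stream, facility, poc_key ORDER BY started) AS poc_group
    FROM keyed
)
SELECT event,
       ROW_NUMBER() OVER (PARTITION BY stream, facility, poc_key, poc_group
                          ORDER BY started) AS poc_sequence
FROM grouped
ORDER BY event
"""

poc_sequences = conn.execute(query).fetchall()
for row in poc_sequences:
    print(row)
```

Event 2 restarts at 1 even though its key is also ICU, because it belongs to a different facility, and each event without a point of care forms its own run of length one.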
