Calculating a running total of when a value changes over a partition

Posted: 2020-06-18 17:05:07

[Question]:

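I can't figure out how to write a window function that solves my problem. I'm new to window functions, but I think one can be written to do what I need.

Problem statement: I want to calculate a transfer sequence that shows when people changed location over time, based on the corresponding location ID.

Sample data (Table 1)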

+----------+------------+-----------+---------+
| PersonID | LocationID | Date      | Time    |
+----------+------------+-----------+---------+
| 12       | A          | 6/17/2020 | 12:00PM |
+----------+------------+-----------+---------+
| 12       | A          | 6/18/2020 | 1:00PM  |
+----------+------------+-----------+---------+
| 12       | B          | 6/18/2020 | 6:00AM  |
+----------+------------+-----------+---------+
| 12       | C          | 6/19/2020 | 3:00PM  |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 8:00AM  |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 11:00AM |
+----------+------------+-----------+---------+
| 13       | A          | 6/16/2020 | 12:00AM |
+----------+------------+-----------+---------+
| 13       | B          | 6/16/2020 | 4:00PM  |
+----------+------------+-----------+---------+

Expected result

+----------+------------+-----------+---------+-------------------+
| PersonID | LocationID | Date      | Time    | Transfer Sequence |
+----------+------------+-----------+---------+-------------------+
| 12       | A          | 6/17/2020 | 12:00PM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 12       | A          | 6/18/2020 | 1:00PM  | 1                 |
+----------+------------+-----------+---------+-------------------+
| 12       | B          | 6/18/2020 | 6:00AM  | 2                 |
+----------+------------+-----------+---------+-------------------+
| 12       | C          | 6/19/2020 | 3:00PM  | 3                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 8:00AM  | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 11:00AM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | A          | 6/16/2020 | 12:00AM | 1                 |
+----------+------------+-----------+---------+-------------------+
| 13       | B          | 6/16/2020 | 4:00PM  | 2                 |
+----------+------------+-----------+---------+-------------------+

What I tried

SELECT 
     [t1].[PersonID]
    ,[t1].[LocationID]
    ,[t1].[Date]
    ,[t1].[Time]
    ,DENSE_RANK() 
         OVER( 
           partition BY [t1].[PersonID], [t1].[LocationID] 
           ORDER BY [t1].[Date] ASC, [t1].[Time] ASC) AS 
       [Transfer Sequence]


FROM Table1 [t1]

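Unfortunately, I believe DENSE_RANK() is assigning a rank regardless of whether the value of LocationID has changed. I need a function that adds one to the sequence only when LocationID changes.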
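To make the symptom concrete, here is a minimal sketch that runs the same DENSE_RANK() shape against an in-memory SQLite database from Python. The table name `visits`, the column names, and the rows are hypothetical (they are not the data from Table 1), and a single sortable `ts` column stands in for the separate Date and Time columns. It assumes a SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical rows: (person_id, location_id, timestamp). Not the question's data.
ROWS = [
    (12, "A", "2020-06-17 12:00"),
    (12, "A", "2020-06-18 06:00"),
    (12, "B", "2020-06-18 13:00"),
    (12, "A", "2020-06-19 15:00"),
]


def dense_rank_per_location(rows):
    """Run the attempted DENSE_RANK query and return the rows in time order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (person_id INTEGER, location_id TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT person_id, location_id, ts,
               DENSE_RANK() OVER (
                   PARTITION BY person_id, location_id
                   ORDER BY ts
               ) AS transfer_sequence
        FROM visits
        ORDER BY person_id, ts
        """
    ).fetchall()
    conn.close()
    return result


for row in dense_rank_per_location(ROWS):
    print(row)
```

Because LocationID is part of the partition, the rank restarts inside each (person, location) group and simply numbers the rows within it. On these rows the last column comes out as 1, 2, 1, 3 rather than the desired 1, 1, 2, 3.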

Any help would be greatly appreciated.

Thanks!

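[Question comments]:

[Answer 1]:

You want to put "adjacent" rows in the same group. Plain window functions cannot do that for you on their own; we need to use a gaps-and-islands technique: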

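select 
    t.*, 
    -- running count of location changes; the first row (NULL lag) counts as 1
    sum(case when locationID = lagLocationID then 0 else 1 end) 
        over(partition by personID order by [Date], [Time]) 
        as transfer_sequence
from (
    select 
        t.*, 
        -- previous location of the same person, NULL on that person's first row
        lag(locationID) 
            over(partition by personID order by [Date], [Time]) 
            as lagLocationID
    from mytable t
) t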

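The idea is to compute a window sum that increments every time locationID changes.

Note that this correctly handles the case where a person returns to a location they have visited before.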
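For anyone who wants to try the lag-plus-running-sum idea without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table `visits`, its columns, and the rows are hypothetical, a single sortable `ts` column replaces the separate date and time columns, and it assumes SQLite 3.25 or later for window functions. Person 12 deliberately returns to location A to exercise the "back to a previous location" case.

```python
import sqlite3

# Hypothetical rows: (person_id, location_id, timestamp). Not the question's data.
ROWS = [
    (12, "A", "2020-06-17 12:00"),
    (12, "A", "2020-06-18 06:00"),
    (12, "B", "2020-06-18 13:00"),
    (12, "A", "2020-06-19 15:00"),  # back to a location visited earlier
    (13, "A", "2020-06-16 08:00"),
    (13, "B", "2020-06-16 16:00"),
]


def transfer_sequences(rows):
    """Return (person_id, location_id, ts, transfer_sequence) in time order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE visits (person_id INTEGER, location_id TEXT, ts TEXT)"
    )
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT person_id, location_id, ts,
               -- add 1 whenever the location differs from the previous row
               SUM(CASE WHEN location_id = prev_location THEN 0 ELSE 1 END)
                   OVER (PARTITION BY person_id ORDER BY ts)
                   AS transfer_sequence
        FROM (
            SELECT v.*,
                   LAG(location_id)
                       OVER (PARTITION BY person_id ORDER BY ts)
                       AS prev_location
            FROM visits v
        ) AS s
        ORDER BY person_id, ts
        """
    ).fetchall()
    conn.close()
    return result


for row in transfer_sequences(ROWS):
    print(row)
```

On these rows the sequence column is 1, 1, 2, 3 for person 12 and 1, 2 for person 13. The first row of each person gets 1 because the lagged value is NULL, so the equality test is not true and the ELSE branch applies.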

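[Comments]:

[Answer 2]:

What I did (and I'm sure it's not the best way) was create a second table with PersonID, LocationID, date, time, and an empty field for the transfer sequence (sequence), and then a cursor: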

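-- Variable types are assumptions; match them to the actual column types.
DECLARE @person INT, @person_saved INT = NULL;
DECLARE @location VARCHAR(10), @location_saved VARCHAR(10) = NULL;
DECLARE @date DATE, @time TIME;
DECLARE @count INT = 0;

-- TRANSACTION is a reserved word, so the cursor is named transfer_cursor.
-- ORDER BY makes each person's rows arrive in chronological order.
DECLARE transfer_cursor CURSOR FOR
    SELECT PersonID, LocationID, [Date], [Time]
    FROM table1
    ORDER BY PersonID, [Date], [Time];

Then the loop:

OPEN transfer_cursor;
FETCH NEXT FROM transfer_cursor INTO @person, @location, @date, @time;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- New person: reset the count and forget the previous location
    IF @person_saved IS NULL OR @person_saved <> @person
    BEGIN
        SET @count = 0;
        SET @person_saved = @person;
        SET @location_saved = NULL;
    END

    -- Location changed: increment the count
    IF @location_saved IS NULL OR @location_saved <> @location
    BEGIN
        SET @count = @count + 1;
        SET @location_saved = @location;
    END

    UPDATE table1
    SET [sequence] = @count
    WHERE PersonID = @person AND LocationID = @location
      AND [Date] = @date AND [Time] = @time;

    FETCH NEXT FROM transfer_cursor INTO @person, @location, @date, @time;
END

CLOSE transfer_cursor;
DEALLOCATE transfer_cursor;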

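[Comments]:

Is the cursor approach more efficient than using a subquery for this kind of problem?

Performance-wise, the time depends on how many rows the table has. The advantage of cursors, which I usually use, is debugging: a stored procedure with a cursor is easier to debug than plain SQL. Of course, if the SQL from @GMB works, it's a lot less to write :)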
