Delete rows in a single table in SQL Server where the timestamp column differs

Posted: 2017-05-13 03:47:11

Question:

I have an employee punch table that looks like this:

| EmployeeID | PunchDate  | PunchTime | PunchType | Sequence |
|------------|------------|-----------|-----------|----------|
|       5386 | 12/27/2016 | 03:57:42  | On Duty   |      552 |
|       5386 | 12/27/2016 | 09:30:00  | Off Duty  |      563 |
|       5386 | 12/27/2016 | 10:02:00  | On Duty   |      564 |
|       5386 | 12/27/2016 | 12:10:00  | Off Duty  |      570 |
|       5386 | 12/27/2016 | 12:22:00  | On Duty   |      571 |
|       5386 | 12/27/2016 | 05:13:32  | Off Duty  |      578 |  

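What I need to do is delete an Off Duty punch and the On Duty punch that follows it whenever the two are less than 25 minutes apart. In the example above, I want to delete sequences 570 and 571.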
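
The rule can be checked by hand against the sample rows. The following is a minimal Python sketch, not SQL Server code: it assumes the dates are rewritten in ISO form and, like the question's own query, pairs punches in Sequence order. The function name `sequences_to_delete` is hypothetical.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (EmployeeID, PunchDate, PunchTime, PunchType, Sequence).
# Dates are rewritten in ISO form here for easy parsing (an assumption of this sketch).
PUNCHES = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]

GAP = timedelta(minutes=25)


def to_datetime(date_str, time_str):
    """Combine the separate date and time columns into one datetime."""
    return datetime.strptime(f"{date_str} {time_str}", "%Y-%m-%d %H:%M:%S")


def sequences_to_delete(rows, gap=GAP):
    """Return the Sequence numbers of Off Duty -> next On Duty pairs closer than `gap`."""
    doomed = set()
    # Order each employee's punches by Sequence, as the question's query does.
    ordered = sorted(rows, key=lambda r: (r[0], r[4]))
    for cur, nxt in zip(ordered, ordered[1:]):
        if cur[0] == nxt[0] and cur[3] == "Off Duty" and nxt[3] == "On Duty":
            if to_datetime(nxt[1], nxt[2]) - to_datetime(cur[1], cur[2]) < gap:
                doomed.update({cur[4], nxt[4]})
    return doomed


print(sorted(sequences_to_delete(PUNCHES)))  # [570, 571]
```

The 09:30 to 10:02 break (563/564) is 32 minutes and survives; only the 12-minute break (570/571) is flagged.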

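I built this table by pulling all the Off Duty punches from another table, and then pulling every On Duty punch that directly follows an Off Duty punch with this query: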

INSERT INTO [dbo].[UpdatePunches] (EmployeeID, PunchDate, PunchTime, PunchType, Sequence)
SELECT EmployeeID, PunchDate, PunchTime, PunchType, Sequence
FROM [dbo].[Punches]
WHERE Sequence IN (
    -- the punch that comes right after each Off Duty punch
    SELECT Sequence + 1
    FROM [dbo].[Punches]
    WHERE PunchType LIKE 'Off Duty%')
  AND PunchType LIKE 'On Duty%'
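
One way to bolt the time check onto the Sequence + 1 pairing is a self-join. This is a minimal sketch run with Python's built-in sqlite3 module instead of SQL Server. SQLite's julianday() difference multiplied by 1440 stands in for DATEDIFF(minute, ...). It also assumes that dates are stored in ISO form and that Sequence is unique across the table, as the question's query implies; the function name is hypothetical. The pairs are collected first and deleted afterwards, so removing one row cannot change how another row is paired.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table; dates are stored in ISO
# form (an assumption) so that SQLite's julianday() can parse them.
ROWS = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]

# Pair each Off Duty punch with the punch whose Sequence is one higher (the
# question's own pairing rule) and keep pairs less than 25 minutes apart.
PAIRS_SQL = """
    SELECT o.Sequence, n.Sequence
    FROM Punches AS o
    JOIN Punches AS n
      ON n.EmployeeID = o.EmployeeID
     AND n.Sequence = o.Sequence + 1
    WHERE o.PunchType LIKE 'Off Duty%'
      AND n.PunchType LIKE 'On Duty%'
      AND (julianday(n.PunchDate || ' ' || n.PunchTime)
           - julianday(o.PunchDate || ' ' || o.PunchTime)) * 24 * 60 < 25
"""


def delete_short_breaks(conn):
    """Delete both punches of every short Off Duty -> On Duty pair."""
    doomed = [seq for pair in conn.execute(PAIRS_SQL) for seq in pair]
    conn.executemany("DELETE FROM Punches WHERE Sequence = ?", [(s,) for s in doomed])
    return sorted(doomed)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Punches (EmployeeID INTEGER, PunchDate TEXT, PunchTime TEXT,"
    " PunchType TEXT, Sequence INTEGER)"
)
conn.executemany("INSERT INTO Punches VALUES (?, ?, ?, ?, ?)", ROWS)
print(delete_short_breaks(conn))  # [570, 571]
```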

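I have been trying to add some kind of DATEDIFF to this code, or to do the cleanup as a separate step, but without any luck. I can't use specific sequence numbers, because they change whenever the punch data changes.

I am using SQL Server 2008.

Any suggestions would be greatly appreciated.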

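Comments:

I have tried using some kind of DATEDIFF, but my problem is that I don't know how best to reference the individual rows that meet the DATEDIFF condition. I will update the main post with more ideas/examples.

Why are punch date and punch time in separate columns? Can they be null? If so, how should records with a null date or time be handled? What is the sequence number for (other than the order that date/time already provides)? What if an "On Duty" is immediately followed by another "On Duty"? Same question for "Off Duty".

@ThorstenKettner, punch date and punch time are separate because that is how they come from the source (I pull this information through web services). They should never be null. If I remember correctly, the sequence number is something I added so that I could use the SQL query shown to pull the On Duty punch directly after an Off Duty punch. The SQL query should handle consecutive On or Off Duty punches, but in this case the software I pull these punches from won't let someone who is already on duty mark themselves on duty again. They can switch to a different status, but not the same one.

Answer 1: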

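Something like this may be easy to drop in. It uses a subquery to find the next "On Duty" punch and compares it with the "Off Duty" punch in the main query.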

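-- Delete an Off Duty punch when the next On Duty punch (by Sequence, same day)
-- comes 25 minutes or less after it.
DELETE p
FROM [dbo].[Punches] p
WHERE p.PunchType = 'Off Duty'
  AND p.PunchTime >= DATEADD(minute, -25,
        (SELECT TOP 1 p2.PunchTime
         FROM [dbo].[Punches] p2
         WHERE p2.EmployeeID = p.EmployeeID
           AND p2.PunchType = 'On Duty'
           AND p.Sequence < p2.Sequence
           AND p2.PunchDate = p.PunchDate
         ORDER BY p2.Sequence ASC))
-- If there is no later On Duty punch, the subquery returns NULL,
-- the comparison is not true, and the row is kept.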
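
To see what this answer actually removes, here is a rough SQLite translation run through Python's sqlite3 module. It is an assumption-heavy sketch, not the answer's T-SQL. julianday(..., '-25 minutes') plays the role of DATEADD(minute, -25, ...), LIMIT 1 replaces TOP 1, dates are assumed to be ISO strings, and Sequence is assumed to be unique. On the sample data only the Off Duty punch 570 is deleted. Its partner 571 survives, so removing both rows as the question asks would need a second pass or a query like the other answers'.

```python
import sqlite3

ROWS = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]

# Rough SQLite rendering of Answer 1: an Off Duty punch is deleted when the next
# On Duty punch of the same employee and day (by Sequence) starts 25 minutes or
# less after it. julianday(NULL, ...) is NULL, so a punch with no later On Duty
# punch fails the comparison and is kept.
ANSWER1_SQL = """
    DELETE FROM Punches
    WHERE Sequence IN (
        SELECT p.Sequence
        FROM Punches AS p
        WHERE p.PunchType = 'Off Duty'
          AND julianday(p.PunchDate || ' ' || p.PunchTime) >= julianday((
                SELECT p2.PunchDate || ' ' || p2.PunchTime
                FROM Punches AS p2
                WHERE p2.EmployeeID = p.EmployeeID
                  AND p2.PunchType = 'On Duty'
                  AND p.Sequence < p2.Sequence
                  AND p2.PunchDate = p.PunchDate
                ORDER BY p2.Sequence
                LIMIT 1
              ), '-25 minutes'))
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Punches (EmployeeID INTEGER, PunchDate TEXT, PunchTime TEXT,"
    " PunchType TEXT, Sequence INTEGER)"
)
conn.executemany("INSERT INTO Punches VALUES (?, ?, ?, ?, ?)", ROWS)
conn.execute(ANSWER1_SQL)
print([r[0] for r in conn.execute("SELECT Sequence FROM Punches ORDER BY Sequence")])
# [552, 563, 564, 571, 578]: 570 is gone, 571 is still there
```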

Answer 2:

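You can number each employee's punches by punch date and punch time, and join each row to the next one in ascending date/time order.

Then collect the row numbers of the rows that are less than 25 minutes apart, and finally delete those rows.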

with rownums as 
(select t.*,row_number() over(partition by employeeid 
                              order by cast(punchdate +' '+punchtime as datetime) ) as rn
 from t)
,rownums_to_delete as 
(
 select r1.rn,r1.employeeid
 from rownums r1
 join rownums r2 on r1.employeeid=r2.employeeid and r1.rn=r2.rn+1
 where dateadd(minute,25,cast(r2.punchdate +' '+r2.punchtime as datetime)) > cast(r1.punchdate +' '+r1.punchtime as datetime)
 and r1.punchtype <> r2.punchtype
 union all
 select r2.rn, r2.employeeid
 from rownums r1
 join rownums r2 on r1.employeeid=r2.employeeid and r1.rn=r2.rn+1
 where dateadd(minute,25,cast(r2.punchdate +' '+r2.punchtime as datetime)) > cast(r1.punchdate +' '+r1.punchtime as datetime)
 and r1.punchtype <> r2.punchtype
)
delete r
from rownums_to_delete rd
join rownums r on rd.employeeid=r.employeeid and r.rn=rd.rn
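
The core idea here can be mimicked in a few lines of Python: number each employee's punches by date and time, then compare row rn with row rn + 1. This is a minimal sketch that assumes ISO dates and tuples in the question's column order; the function names are hypothetical. Note that the condition `r1.punchtype <> r2.punchtype` also matches an On Duty punch followed shortly by an Off Duty punch, which the question did not ask to delete. The hypothetical employee 9999 at the end shows that case.

```python
from datetime import datetime, timedelta
from itertools import groupby

PUNCHES = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]
GAP = timedelta(minutes=25)


def at(row):
    """Combine PunchDate and PunchTime (assumed ISO strings) into one datetime."""
    return datetime.strptime(row[1] + " " + row[2], "%Y-%m-%d %H:%M:%S")


def row_numbered(rows):
    """Mimic ROW_NUMBER() OVER (PARTITION BY EmployeeID ORDER BY date and time)."""
    numbered = {}
    ordered = sorted(rows, key=lambda r: (r[0], at(r)))
    for emp, group in groupby(ordered, key=lambda r: r[0]):
        for rn, row in enumerate(group, start=1):
            numbered[(emp, rn)] = row
    return numbered


def rows_to_delete(rows, gap=GAP):
    """Flag both rows when r1 directly follows r2 (r1.rn = r2.rn + 1),
    their types differ, and r1 comes less than `gap` after r2."""
    numbered = row_numbered(rows)
    doomed = set()
    for (emp, rn), r2 in numbered.items():
        r1 = numbered.get((emp, rn + 1))
        if r1 is not None and r1[3] != r2[3] and at(r2) + gap > at(r1):
            doomed.update({r1[4], r2[4]})
    return doomed


# Hypothetical extra employee: On Duty followed 10 minutes later by Off Duty.
EXTRA = [(9999, "2016-12-27", "08:00:00", "On Duty", 1),
         (9999, "2016-12-27", "08:10:00", "Off Duty", 2)]
print(sorted(rows_to_delete(PUNCHES)))          # [570, 571]
print(sorted(rows_to_delete(PUNCHES + EXTRA)))  # [1, 2, 570, 571]
```

Adding `r2.punchtype = 'Off Duty'` to both halves of the UNION would restrict the query to the Off Duty to On Duty case.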

If the date and time columns are actual datetime values rather than varchar, use punchdate+punchtime in the query instead of the string concatenation and cast.

Edit: a simpler version of the query is:

with todelete as (
select t1.employeeid,cast(t2.punchdate+' '+t2.punchtime as datetime) as punchtime,
t2.punchtype,t2.sequence,
cast(t1.punchdate+' '+t1.punchtime as datetime) next_punchtime, 
t1.punchtype as next_punchtype,t1.sequence as next_sequence
from t t1
join t t2 on t1.employeeid=t2.employeeid 
and cast(t2.punchdate+' '+t2.punchtime as datetime) between dateadd(minute,-25,cast(t1.punchdate+' '+t1.punchtime as datetime)) and cast(t1.punchdate+' '+t1.punchtime as datetime) 
where t2.punchtype <> t1.punchtype
    )
delete t
from t 
join todelete td on t.employeeid = td.employeeid 
and cast(t.punchdate+' '+t.punchtime as datetime) in (td.punchtime,td.next_punchtime)
;
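
The edited query differs from the first one in two ways. It compares every pair of differently typed punches rather than only neighbouring rows. Also, BETWEEN includes both ends, so two punches exactly 25 minutes apart are deleted here but kept by the first query's strict `>`. Below is a minimal Python sketch of the BETWEEN rule, assuming ISO dates; the boundary rows and function names are hypothetical.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=25)


def at(row):
    """Combine PunchDate and PunchTime (assumed ISO strings) into a datetime."""
    return datetime.strptime(row[1] + " " + row[2], "%Y-%m-%d %H:%M:%S")


def flagged_between(rows, gap=GAP):
    """Sequences of punches caught by the edited query's BETWEEN condition."""
    doomed = set()
    for t1 in rows:
        for t2 in rows:
            # Same employee, different punch types (t2.punchtype <> t1.punchtype).
            if t1[0] != t2[0] or t1[3] == t2[3]:
                continue
            # t2's time BETWEEN (t1's time - gap) AND t1's time, both ends inclusive.
            if at(t1) - gap <= at(t2) <= at(t1):
                doomed.update({t1[4], t2[4]})
    return doomed


# Hypothetical pair exactly 25 minutes apart.
BOUNDARY = [(1, "2016-12-27", "12:00:00", "Off Duty", 10),
            (1, "2016-12-27", "12:25:00", "On Duty", 11)]
print(sorted(flagged_between(BOUNDARY)))  # [10, 11]
```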

Comments:

Well, your first example did the trick. I'll see whether I can adapt the second one as well, but either way: thanks!

Answer 3:

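SQL Server has a nice feature called updatable CTEs. With lead() and lag() (available from SQL Server 2012 onward), you can do what you want. The following assumes the date and time are actually stored as datetime. That is only so the two can conveniently be added together; you could also convert them explicitly: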

with todelete as (
      select tcp.*,
             (punchdate + punchtime) as punchdatetime,
             lead(punchtype) over (partition by employeeid order by punchdate, punchtime) as next_punchtype,
             lag(punchtype) over (partition by employeeid order by punchdate, punchtime) as prev_punchtype,
             lead(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as next_punchdatetime,
             lag(punchdate + punchtime) over (partition by employeeid order by punchdate, punchtime) as prev_punchdatetime
      from timeclockpunches tcp
     )
delete from todelete
    where (punchtype = 'Off Duty' and
           next_punchtype = 'On Duty' and
           punchdatetime > dateadd(minute, -25, next_punchdatetime)
          ) or
          (punchtype = 'On Duty' and
           prev_punchtype = 'Off Duty' and
           prev_punchdatetime > dateadd(minute, -25, punchdatetime)
          );
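
This is a plain-Python sketch of what LAG() and LEAD() supply here: each punch together with the previous and next punch of the same employee, ordered by date and time. It assumes ISO date strings and zero-padded times, so string order equals time order; the helper names are hypothetical. It applies the same two-sided WHERE clause and, on the sample rows, selects both 570 and 571.

```python
from datetime import datetime, timedelta

PUNCHES = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]
GAP = timedelta(minutes=25)


def at(row):
    """Combine PunchDate and PunchTime (assumed ISO strings) into a datetime."""
    return datetime.strptime(row[1] + " " + row[2], "%Y-%m-%d %H:%M:%S")


def with_neighbours(rows):
    """Yield (row, previous, next) per employee, like LAG()/LEAD() OVER
    (PARTITION BY EmployeeID ORDER BY PunchDate, PunchTime)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    for i, row in enumerate(ordered):
        prev_row = ordered[i - 1] if i > 0 and ordered[i - 1][0] == row[0] else None
        next_row = (ordered[i + 1]
                    if i + 1 < len(ordered) and ordered[i + 1][0] == row[0] else None)
        yield row, prev_row, next_row


def to_delete(rows, gap=GAP):
    doomed = set()
    for row, prev_row, next_row in with_neighbours(rows):
        # Off Duty followed by On Duty less than `gap` later.
        if (row[3] == "Off Duty" and next_row is not None
                and next_row[3] == "On Duty" and at(row) > at(next_row) - gap):
            doomed.add(row[4])
        # On Duty preceded by Off Duty less than `gap` earlier.
        if (row[3] == "On Duty" and prev_row is not None
                and prev_row[3] == "Off Duty" and at(prev_row) > at(row) - gap):
            doomed.add(row[4])
    return doomed


print(sorted(to_delete(PUNCHES)))  # [570, 571]
```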

Edit:

In SQL Server 2008 you can use the same idea; it is just less efficient:

-- Assumes punchdate and punchtime are datetime, as in the query above,
-- so that punchdate + punchtime gives the full punch timestamp.
delete t
    from t outer apply
         (select top 1 tprev.*
          from t tprev
          where tprev.employeeid = t.employeeid and
                (tprev.punchdate < t.punchdate or
                 (tprev.punchdate = t.punchdate and tprev.punchtime < t.punchtime)
                )
          order by tprev.punchdate desc, tprev.punchtime desc  -- nearest earlier punch
         ) tprev outer apply
         (select top 1 tnext.*
          from t tnext
          where tnext.employeeid = t.employeeid and
                (t.punchdate < tnext.punchdate or
                 (t.punchdate = tnext.punchdate and t.punchtime < tnext.punchtime)
                )
          order by tnext.punchdate asc, tnext.punchtime asc    -- nearest later punch
         ) tnext
where (t.punchtype = 'Off Duty' and
       tnext.punchtype = 'On Duty' and
       t.punchdate + t.punchtime > dateadd(minute, -25, tnext.punchdate + tnext.punchtime)
      ) or
      (t.punchtype = 'On Duty' and
       tprev.punchtype = 'Off Duty' and
       tprev.punchdate + tprev.punchtime > dateadd(minute, -25, t.punchdate + t.punchtime)
      );
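
In the OUTER APPLY version, the subquery that fetches the next punch has to sort ascending. With a descending sort, TOP 1 returns the employee's latest later punch rather than the nearest one. A small Python sketch of that TOP 1 lookup makes the difference visible for punch 563. It assumes ISO dates, and `top1_after` is a hypothetical helper.

```python
PUNCHES = [
    (5386, "2016-12-27", "03:57:42", "On Duty", 552),
    (5386, "2016-12-27", "09:30:00", "Off Duty", 563),
    (5386, "2016-12-27", "10:02:00", "On Duty", 564),
    (5386, "2016-12-27", "12:10:00", "Off Duty", 570),
    (5386, "2016-12-27", "12:22:00", "On Duty", 571),
    (5386, "2016-12-27", "05:13:32", "Off Duty", 578),
]


def top1_after(rows, row, descending=False):
    """Mimic OUTER APPLY (SELECT TOP 1 ... FROM t WHERE same employee AND later
    than row ORDER BY punchdate, punchtime [DESC]); None plays the role of NULL."""
    later = [r for r in rows
             if r[0] == row[0] and (r[1], r[2]) > (row[1], row[2])]
    later.sort(key=lambda r: (r[1], r[2]), reverse=descending)
    return later[0] if later else None


by_seq = {r[4]: r for r in PUNCHES}
print(top1_after(PUNCHES, by_seq[563])[4])                   # 564, the next punch
print(top1_after(PUNCHES, by_seq[563], descending=True)[4])  # 571, the last punch
```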

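Comments:

I really like this, but I'm on SQL Server 2008, and when I try it I get a message saying the Parallel Data Warehouse functionality is not enabled. I assume 2008 doesn't have lead() and lag()...

Answer 4: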

You can build a datetime from the date and time fields in a CTE, and then look up the next On Duty time after each Off Duty time, like this:

;
WITH OnDutyDateTime AS
(
    SELECT 
    EmployeeID,
    Sequence,
    DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate)) 
    FROM
    #TempEmployeeData 
    where PunchType = 'On Duty'
),
OffDutyDateTime As
(
    SELECT 
    EmployeeID,
    Sequence,
    DutyDateTime = DATEADD(ms, DATEDIFF(ms, '00:00:00', PunchTime), CONVERT(DATETIME, PunchDate)) 
    FROM
    #TempEmployeeData 
    where PunchType = 'Off Duty'
)

SELECT 
    OffDutyDateTime = DutyDateTime,
    OnDutyDateTime = (SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ),
    DiffInMinutes = DATEDIFF(minute,DutyDateTime,(SELECT TOP 1 DutyDateTime FROM OnDutyDateTime WHERE EmployeeID = A.EmployeeID AND Sequence > A.Sequence ORDER BY Sequence ASC ))

FROM
OffDutyDateTime A


OffDutyDateTime         OnDutyDateTime          DiffInMinutes
----------------------- ----------------------- -------------
2016-12-27 09:30:00.000 2016-12-27 10:02:00.000 32
2016-12-27 12:10:00.000 2016-12-27 12:22:00.000 12
2016-12-28 05:13:32.000 NULL                    NULL

(3 row(s) affected)
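
This query only reports the gaps. The cleanup itself would keep the pairs whose DiffInMinutes is below 25 and treat NULL (no later On Duty punch) as "keep". One detail matters at the threshold: DATEDIFF(minute, ...) counts minute boundaries crossed rather than elapsed time, so 12:10:59 to 12:11:00 counts as 1 minute. Below is a minimal Python sketch of both points, using the pairs from the output above; the function name is hypothetical.

```python
from datetime import datetime


def datediff_minutes(start, end):
    """Mimic DATEDIFF(minute, start, end): count minute boundaries crossed,
    which equals dropping seconds from both values before subtracting."""
    start = start.replace(second=0, microsecond=0)
    end = end.replace(second=0, microsecond=0)
    return int((end - start).total_seconds() // 60)


# (OffDutyDateTime, OnDutyDateTime) pairs from the output above; None = NULL.
PAIRS = [
    (datetime(2016, 12, 27, 9, 30, 0), datetime(2016, 12, 27, 10, 2, 0)),
    (datetime(2016, 12, 27, 12, 10, 0), datetime(2016, 12, 27, 12, 22, 0)),
    (datetime(2016, 12, 28, 5, 13, 32), None),
]

# Keep only the short breaks; a NULL partner means there is nothing to delete.
short = [(off, on) for off, on in PAIRS
         if on is not None and datediff_minutes(off, on) < 25]
print(short)  # one pair: 12:10 Off Duty -> 12:22 On Duty
```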
