Trying to delete duplicate rows in SQL Server where the difference is the date or batch number

Posted: 2021-12-25 07:22:26

I have this query:

SELECT 
    T1.ID_NUMBER,                                                                    
    T1.INCEPTION_DATE,
    T1.OCCURRENCE,
    T1.TRANSACTION_DATE,
    T1.FILE_LOAD_DATE,
    T1.BATCH_NUM
FROM 
    mastertable T1
INNER JOIN 
    (SELECT 
         ID_NUMBER, INCEPTION_DATE, OCCURRENCE, 
         COUNT(*) AS DUPL_COUNT
     FROM 
         mastertable
     WHERE 
         SOURCE_SYSTEM ='LEGACY'
     GROUP BY 
         ID_NUMBER, INCEPTION_DATE, OCCURRENCE
     HAVING 
         COUNT(*) > 1) t2 ON T2.ID_NUMBER = T1.ID_NUMBER 
                          AND T2.INCEPTION_DATE = T1.INCEPTION_DATE 
                          AND T2.OCCURRENCE= T1.OCCURRENCE
ORDER BY 
    1, 2, 3, 4, 5

which returns the following results:

ID_NUMBER  INCEPTION_DATE  OCCURRENCE  TRANSACTION_DATE  FILE_LOAD_DATE       BATCH_NUM
112897732  2008-09-15      4           2008-07-03        2008-07-07 17:57:19  06341
112897732  2008-09-15      4           2008-07-13        2008-07-18 03:35:55  06753
828194721  2008-11-11      1           2008-09-06        2008-09-17 02:50:44  97334
828194721  2008-11-11      1           2008-09-23        2008-09-24 02:55:27  98331
456457422  2008-09-28      1           2008-12-03        2008-07-13 08:08:39  00734
456457422  2008-09-28      1           2008-12-03        2008-07-18 13:35:55  00991
999272910  2008-05-07      3           2008-05-03        2008-10-13 08:08:38  11432
999272910  2008-05-07      3           2008-05-28        2008-10-18 03:35:55  13342
875328642  2008-03-01      3           2008-04-28        2008-01-23 08:08:38  74542
875328642  2008-03-01      3           2008-04-30        2008-01-25 12:55:11  77536
011028734  2008-07-12      2           2008-12-03        2008-08-07 11:57:03  23422
011028734  2008-07-12      2           2008-12-03        2008-08-11 17:23:29  25748
018264981  2008-07-09      0           2008-12-03        2008-12-07 02:18:12  00432
018264981  2008-07-09      0           2008-12-03        2008-12-11 17:44:19  00773

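For each ID_NUMBER, the record with the earlier FILE_LOAD_DATE or the smaller BATCH_NUM is the one I want to keep.

Is there a way to write a query that deletes the other records, perhaps using a CTE with ROW_NUMBER()?

I'd like something DRY (reusable) in case this problem comes up again. Thanks!

(If it's not too much trouble, please explain how the solution works.)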

Answer 1:

You can use a deletable CTE here:

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                                 ORDER BY FILE_LOAD_DATE, BATCH_NUM) rn
    FROM mastertable
    WHERE SOURCE_SYSTEM = 'LEGACY'
)

DELETE
FROM cte
WHERE rn > 1;

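The logic is to assign a row number within each group of records that share the same ID_NUMBER, INCEPTION_DATE, and OCCURRENCE values. Row number 1 goes to the record with the earliest FILE_LOAD_DATE. If two or more records tie on the earliest FILE_LOAD_DATE, the tie is broken by the smallest BATCH_NUM.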
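To see the numbering in isolation, here is a minimal sketch that loads four rows from the question into a table variable and runs only the ROW_NUMBER() part. The table variable @sample and all column types are assumptions made for illustration; the real schema of mastertable is not shown in the question. BATCH_NUM is assumed to be a zero-padded string, so its sort order is the string order.

-- Hypothetical scratch table; the column types are assumptions, not the real schema.
DECLARE @sample TABLE (
    ID_NUMBER      VARCHAR(9),
    INCEPTION_DATE DATE,
    OCCURRENCE     INT,
    FILE_LOAD_DATE DATETIME2(0),
    BATCH_NUM      VARCHAR(5)
);

-- Two duplicate groups copied from the question's result set.
INSERT INTO @sample (ID_NUMBER, INCEPTION_DATE, OCCURRENCE, FILE_LOAD_DATE, BATCH_NUM)
VALUES
    ('112897732', '2008-09-15', 4, '2008-07-07 17:57:19', '06341'),
    ('112897732', '2008-09-15', 4, '2008-07-18 03:35:55', '06753'),
    ('456457422', '2008-09-28', 1, '2008-07-13 08:08:39', '00734'),
    ('456457422', '2008-09-28', 1, '2008-07-18 13:35:55', '00991');

-- Numbering restarts at 1 for every (ID_NUMBER, INCEPTION_DATE, OCCURRENCE) group.
-- Within a group, the earliest FILE_LOAD_DATE gets rn = 1; BATCH_NUM only breaks ties.
SELECT *,
       ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                          ORDER BY FILE_LOAD_DATE, BATCH_NUM) AS rn
FROM @sample
ORDER BY ID_NUMBER, rn;

In this sample, batches 06341 and 00734 get rn = 1 and would be kept, while 06753 and 00991 get rn = 2 and would be deleted.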

The DELETE statement then removes every record in each group except the earliest one.
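Because the delete is destructive, one cautious way to run it (a sketch, not part of the original answer) is to preview the rows with rn > 1 first and then perform the delete inside a transaction that you commit only after checking the row count. The table and column names below come from the question; the preview step and the transaction wrapper are additions.

-- Step 1: preview the rows that would be removed (nothing is modified here).
WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                                 ORDER BY FILE_LOAD_DATE, BATCH_NUM) AS rn
    FROM mastertable
    WHERE SOURCE_SYSTEM = 'LEGACY'
)
SELECT ID_NUMBER, INCEPTION_DATE, OCCURRENCE, FILE_LOAD_DATE, BATCH_NUM
FROM cte
WHERE rn > 1;

-- Step 2: run the same delete inside an explicit transaction.
BEGIN TRANSACTION;

WITH cte AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY ID_NUMBER, INCEPTION_DATE, OCCURRENCE
                                 ORDER BY FILE_LOAD_DATE, BATCH_NUM) AS rn
    FROM mastertable
    WHERE SOURCE_SYSTEM = 'LEGACY'
)
DELETE
FROM cte
WHERE rn > 1;

-- Compare this count with the number of rows returned by the preview.
SELECT @@ROWCOUNT AS rows_deleted;

-- Undo while testing; replace with COMMIT TRANSACTION once the count looks right.
ROLLBACK TRANSACTION;

Deleting from the CTE removes the underlying rows in mastertable, since the CTE reads from that single table. If the same clean-up is needed regularly, the two statements could be wrapped in a stored procedure, with the SOURCE_SYSTEM value passed as a parameter.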
