How do I delete duplicate rows in SQL Server?
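How can I delete duplicate rows when no unique row id exists?
My table is:
col1 col2 col3 col4 col5 col6 col7
john 1 1 1 1 1 1
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
sally 2 2 2 2 2 2
I want to be left with the following after the duplicates are removed:
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
I have tried a few queries, but I think they depend on having a row id, because I don't get the desired result. For example: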
DELETE
FROM table
WHERE col1 IN (
SELECT id
FROM table
GROUP BY id
HAVING (COUNT(col1) > 1)
)
I like CTEs and ROW_NUMBER because the two combined let us see which rows will be deleted (or updated): just change DELETE FROM CTE... to SELECT * FROM CTE:
WITH CTE AS(
SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1
DEMO (the result is different; I assume that is due to a typo on your part)
COL1 COL2 COL3 COL4 COL5 COL6 COL7
john 1 1 1 1 1 1
sally 2 2 2 2 2 2
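This example determines duplicates by the single column col1, because of the PARTITION BY col1. If you want to include multiple columns, simply add them to the PARTITION BY: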
ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
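As a rough illustration of both points (previewing with SELECT, then partitioning by several columns), here is a minimal sketch against the question's sample data. The table variable @t is a hypothetical stand-in for the real table, and partitioning by all seven columns assumes a row counts as a duplicate only when every column matches.

```sql
-- Sketch only: @t is a hypothetical table variable holding the question's sample rows.
DECLARE @t TABLE (col1 varchar(10), col2 int, col3 int, col4 int,
                  col5 int, col6 int, col7 int);

INSERT INTO @t VALUES
    ('john',  1, 1, 1, 1, 1, 1),
    ('john',  1, 1, 1, 1, 1, 1),
    ('sally', 2, 2, 2, 2, 2, 2),
    ('sally', 2, 2, 2, 2, 2, 2);

-- Preview: the rows with RN > 1 are the ones the DELETE below would remove.
WITH CTE AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4, col5, col6, col7
                                   ORDER BY col1)
    FROM @t
)
SELECT * FROM CTE WHERE RN > 1;

-- Same CTE, now deleting instead of selecting.
WITH CTE AS (
    SELECT *,
           RN = ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, col4, col5, col6, col7
                                   ORDER BY col1)
    FROM @t
)
DELETE FROM CTE WHERE RN > 1;

-- One john row and one sally row remain.
SELECT * FROM @t;
```

Running the SELECT form first is a cheap safety check: if it returns rows you want to keep, the PARTITION BY list is too narrow.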
Try using:
-- this query will keep only one instance of a duplicate record.
;WITH cte
AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3-- based on what? --can be multiple columns
ORDER BY ( SELECT 0)) RN
FROM Mytable)
delete FROM cte
WHERE RN > 1
SELECT linkorder
,Row_Number() OVER (
PARTITION BY linkorder ORDER BY linkorder DESC
) AS RowNum
FROM u_links
with myCTE
as
(
select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
)
Delete from myCTE where Duplicate>1
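The idea of removing duplicates involves
- a) protecting the rows that are not duplicates
- b) retaining one of the many rows that together qualify as duplicates.
Step by step
- 1) First identify the rows that satisfy the definition of a duplicate and insert them into a temp table, say #tableAll.
- 2) Select the non-duplicate (single) or distinct rows into a temp table, say #tableUnique.
- 3) Delete from the source table, joining to #tableAll, to remove the duplicates.
- 4) Insert all the rows from #tableUnique into the source table.
- 5) Drop #tableAll and #tableUnique.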
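The steps above can be sketched roughly as follows. This is one possible reading of them, not the answerer's own script: #source is a hypothetical stand-in for the source table, its three columns are assumed to be NOT NULL (an equality join would not match NULLs), and the transaction is an added precaution so the delete and re-insert succeed or fail together.

```sql
-- Hypothetical source table with sample data; 'bob' is not duplicated.
CREATE TABLE #source (col1 varchar(10) NOT NULL, col2 int NOT NULL, col3 int NOT NULL);
INSERT INTO #source VALUES
    ('john', 1, 1), ('john', 1, 1),
    ('sally', 2, 2), ('sally', 2, 2),
    ('bob', 3, 3);

BEGIN TRANSACTION;

-- 1) Every copy of every duplicated row goes into #tableAll.
SELECT s.col1, s.col2, s.col3
INTO #tableAll
FROM #source AS s
JOIN (SELECT col1, col2, col3
      FROM #source
      GROUP BY col1, col2, col3
      HAVING COUNT(*) > 1) AS d
  ON s.col1 = d.col1 AND s.col2 = d.col2 AND s.col3 = d.col3;

-- 2) One distinct row per duplicate group goes into #tableUnique.
SELECT DISTINCT col1, col2, col3
INTO #tableUnique
FROM #tableAll;

-- 3) Remove all copies of the duplicated rows from the source table.
DELETE s
FROM #source AS s
JOIN #tableAll AS a
  ON s.col1 = a.col1 AND s.col2 = a.col2 AND s.col3 = a.col3;

-- 4) Put a single copy of each back.
INSERT INTO #source (col1, col2, col3)
SELECT col1, col2, col3 FROM #tableUnique;

COMMIT TRANSACTION;

-- 5) Clean up the temp tables.
DROP TABLE #tableAll;
DROP TABLE #tableUnique;

-- Rows that were never duplicated (bob) are untouched throughout.
SELECT * FROM #source ORDER BY col1;
```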
If you are able to add a column to the table temporarily, this is a solution that worked for me:
With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server
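ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)
Then perform a DELETE using a combination of MIN and GROUP BY:
DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
SELECT MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
);
Verify that the DELETE performed correctly:
SELECT a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
a.CHARACTERISTIC,
a.INTVALUE,
a.FLOATVALUE,
a.STRINGVALUE
ORDER BY COUNT(*) DESC
The result should contain no rows with a count greater than 1. Finally, remove the RowID column:
ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;
You need to group the duplicate records by the relevant field(s), then keep one record from each group and delete the rest. For example:
DELETE prg.Person WHERE Id IN (
SELECT dublicateRow.Id FROM
(
select MIN(Id) MinId, NationalCode
from prg.Person group by NationalCode having count(NationalCode ) > 1
) GroupSelect
JOIN prg.Person dublicateRow ON dublicateRow.NationalCode = GroupSelect.NationalCode
WHERE dublicateRow.Id <> GroupSelect.MinId)
Another way of removing duplicated rows in one step without losing information is as follows:
-- T-SQL join-delete form: DELETE alias FROM ... JOIN ...
-- The NOLOCK hint is omitted on the delete target, where it does not apply.
delete t1
from dublicated_table t1
join (
select t2.dublicated_field
, min(len(t2.field_kept)) as min_field_kept
from dublicated_table t2 (nolock)
group by t2.dublicated_field having COUNT(*)>1
) t3
on t1.dublicated_field=t3.dublicated_field
and len(t1.field_kept)=t3.min_field_kept
Wow, reading all these answers made me feel stupid; they read like expert answers, with all the CTEs, temp tables and so on.
All I did to get it working was aggregate the ID column using MAX.
-- Assumes the table has an id column; [table] is bracketed because TABLE is a reserved word.
DELETE FROM [table] WHERE id IN (
SELECT MAX(id) FROM [table] GROUP BY col1 HAVING ( COUNT(*) > 1 )
)
Note: you may need to run it several times to remove all the duplicates, because each run deletes only one row from each group of duplicates.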
This will delete all the duplicate rows and leave you only the distinct values (rows).
I prefer a CTE for deleting duplicate rows from a SQL Server table.
I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/
Keeping the original:
WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN<>1
Without keeping the original:
WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)
Without using a CTE and ROW_NUMBER(), you can delete the records using just GROUP BY with the MAX function. Here is an example:
DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)
DELETE from search
where id not in (
select min(id) from search
group by url
having count(*)=1
union
SELECT min(id) FROM search
group by url
having count(*) > 1
)
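Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444
In short, this is the easiest way to delete duplicates when you have only a few rows to delete:
SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;
myprimarykey is the identifier for the row.
I set rowcount to 1 because I had only two duplicated rows. If I had had 3 duplicated rows, I would have set rowcount to 2 so that it deleted the first two it found and left only one in table t1.
Hope it helps anyone.
Please see the way of deleting below.
Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)
I created a sample table named @table and loaded it with the given data.
Delete aliasName from (
Select *,
ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From @table) aliasName
Where rowNumber > 1
Select * from @table
Note: if you list all the columns in the Partition by part, then the order by does not have much significance, because every row within a partition is identical.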
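I know the question was asked three years ago, and my answer is another version of what Tim has posted, but I am posting it just in case it is helpful for anyone.
If there are no references, such as foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.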
SELECT DISTINCT [col1],[col2],[col3],[co