Deleting Duplicate Data in Oracle

Posted 2020-12-04 zengchenri


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers how to delete duplicate data in Oracle; we hope you find it useful.

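Background: there are two databases, a source and a target, and every day the source data is synchronized into the target. For various reasons, and to guard against data loss, each run re-synchronizes roughly the last 8 days of data, so consecutive runs overlap. (The table has a primary key, so the overlap was not supposed to create duplicates; it receives a hundred thousand-odd rows per day and already holds 60 million rows.) At some unknown point, however, a colleague dropped the primary key by mistake.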

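As a result, the figures in the BI reports were absurdly inflated. After some trial and error the problem was solved; the approaches are summarized below: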

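1) Flashback: Oracle's flashback features, such as the recycle bin (recyclebin), can be used to look up the dropped primary key. Before the key can be restored, however, the duplicate rows must be deleted, because a primary key cannot be enabled on columns that contain duplicate values.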
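Whatever route is used to get the key back, re-creating it only succeeds once every key value is unique again. The sketch below illustrates this under stated assumptions: Python's built-in sqlite3 module serves as a self-contained stand-in for Oracle (SQLite also has rowid), a unique index on (id, seq) plays the part of the dropped primary key, and the table t1 and its rows are hypothetical. In Oracle the final step would instead be something like alter table 表1 add constraint pk_name primary key (Id, seq), where pk_name is a name of your choosing.

```python
import sqlite3


def make_table(conn):
    """Hypothetical t1 in which (id, seq) should be unique but is not."""
    conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (3, 2, "c"), (3, 2, "c")],
    )


def try_restore_key(conn):
    """Try to re-impose uniqueness on (id, seq); True if it succeeded."""
    try:
        # A unique index stands in for Oracle's primary key constraint.
        conn.execute("CREATE UNIQUE INDEX t1_key ON t1 (id, seq)")
        return True
    except sqlite3.IntegrityError:
        # Existing duplicate (id, seq) values make the index creation fail.
        return False


def delete_duplicates(conn):
    """Keep only the row with the smallest rowid for each (id, seq)."""
    conn.execute(
        "DELETE FROM t1 WHERE rowid NOT IN "
        "(SELECT MIN(rowid) FROM t1 GROUP BY id, seq)"
    )


conn = sqlite3.connect(":memory:")
make_table(conn)
print(try_restore_key(conn))  # False: duplicates are still present
delete_duplicates(conn)
print(try_restore_key(conn))  # True
```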

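2) Use rowid to find the duplicate rows and delete every copy except the one with the smallest rowid (表 / 表1 in the statements below is a placeholder for your table name, and Id, seq are the key columns). The statement: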

delete from 表 a where (a.Id,a.seq) in(select Id,seq from 表 group by Id,seq having count(*)> 1) and rowid not in (select min(rowid) from 表 group by Id,seq having count(*)>1);

This DML statement is a nightmare on large tables because of the NOT IN; if your data volume is large, use it with caution.
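To compare the statement with a variant that avoids NOT IN, here is a sketch that uses Python's built-in sqlite3 module purely as a self-contained stand-in for Oracle (SQLite also has rowid and accepts the (a, b) IN (subquery) form); the table t1 and its rows are hypothetical. The correlated form deletes a row whenever another row with the same key has a smaller rowid; in Oracle it would read delete from 表1 a where a.rowid > (select min(b.rowid) from 表1 b where b.Id = a.Id and b.seq = a.seq). Whether it is actually faster on a 60-million-row table depends on the execution plan, so check the plan before relying on it.

```python
import sqlite3

# Hypothetical rows: key (1, 1) appears three times, (3, 2) twice.
ROWS = [(1, 1, "a"), (1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (3, 2, "c"), (3, 2, "c")]

# Method 2 as written in the article (table renamed to t1).
METHOD_2 = """
DELETE FROM t1
 WHERE (id, seq) IN (SELECT id, seq FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1)
   AND rowid NOT IN (SELECT MIN(rowid) FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1)
"""

# Variant without NOT IN: delete a row when a copy with a smaller rowid exists.
CORRELATED = """
DELETE FROM t1
 WHERE rowid > (SELECT MIN(b.rowid) FROM t1 b WHERE b.id = t1.id AND b.seq = t1.seq)
"""


def run(statement):
    """Load ROWS into a fresh in-memory t1, apply statement, return survivors."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER, payload TEXT)")
    conn.executemany("INSERT INTO t1 VALUES (?, ?, ?)", ROWS)
    conn.execute(statement)
    return conn.execute("SELECT rowid, id, seq FROM t1 ORDER BY rowid").fetchall()


print(run(METHOD_2))    # [(1, 1, 1), (4, 2, 1), (5, 3, 2)]
print(run(CORRELATED))  # [(1, 1, 1), (4, 2, 1), (5, 3, 2)]
```

Both statements keep the first copy of each key (rowids 1, 4 and 5) and leave keys that were never duplicated untouched.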

3) The method I actually used in practice. Its performance is acceptable: the deletion finished in about 5 minutes. The steps are:

1. Query the duplicate rows in the table

select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1)

(a.Id and a.seq are the former primary-key columns, which now contain duplicates.)

2. Create a staging table

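create table lsb as select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1); commit;

This gives lsb the same column structure as 表1. (In Oracle, CREATE TABLE ... AS SELECT is DDL and commits implicitly, so the commit is harmless but not strictly needed.)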

3. Delete the duplicate rows from 表1 (every copy is deleted; all of them are preserved in lsb)

delete from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1) ;

commit;

4. Query the row with the smallest rowid for each key in lsb

select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1)

5. Insert the rows found in step 4 back into 表1

insert into 表1 select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1) ;

commit;

6. Drop the staging table

drop table lsb;
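The six steps can be replayed end to end on a toy table. This is a sketch, not the author's code: it uses Python's built-in sqlite3 module as a stand-in for Oracle (SQLite also has rowid and row-value IN), and the table t1, its columns and its rows are hypothetical. SQLite's commit and DDL rules differ from Oracle's, so only the SQL logic carries over.

```python
import sqlite3

# Keys that occur more than once (the GROUP BY ... HAVING subquery from the article).
DUP_KEYS = "SELECT id, seq FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1"


def build_demo_table():
    """Hypothetical t1 whose (id, seq) key has lost its primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER, payload TEXT)")
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (3, 2, "c"), (3, 2, "c"), (3, 2, "c")],
    )
    conn.commit()
    return conn


def dedupe_via_staging(conn):
    """Steps 1-6; returns how many rows step 1 reported as duplicated."""
    # Step 1: every copy of every repeated key.
    dups = conn.execute(f"SELECT * FROM t1 WHERE (id, seq) IN ({DUP_KEYS})").fetchall()
    # Step 2: save those copies in the staging table lsb.
    conn.execute(f"CREATE TABLE lsb AS SELECT * FROM t1 WHERE (id, seq) IN ({DUP_KEYS})")
    # Step 3: remove all copies of the repeated keys from t1.
    conn.execute(f"DELETE FROM t1 WHERE (id, seq) IN ({DUP_KEYS})")
    # Steps 4-5: re-insert one row per key, the copy with the smallest rowid in lsb.
    conn.execute(
        "INSERT INTO t1 SELECT * FROM lsb WHERE rowid IN "
        "(SELECT MIN(rowid) FROM lsb GROUP BY id, seq HAVING COUNT(*) > 1)"
    )
    conn.commit()
    # Step 6: drop the staging table.
    conn.execute("DROP TABLE lsb")
    return len(dups)


conn = build_demo_table()
print(dedupe_via_staging(conn))  # 5
print(conn.execute("SELECT id, seq, payload FROM t1 ORDER BY id, seq").fetchall())
# [(1, 1, 'a'), (2, 1, 'b'), (3, 2, 'c')]
```

Keeping the smallest rowid is only safe when the copies are identical, as with re-synced rows here; if duplicates differ in non-key columns, decide first which version should survive.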

4) The complete script for method 3

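create table lsb as select * from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1); -- a global temporary table (less redo, so possibly faster) also works, but create it with ON COMMIT PRESERVE ROWS: with the default ON COMMIT DELETE ROWS, lsb is emptied at the next commit before its rows are re-inserted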

commit ;

delete from 表1 a where (a.Id,a.seq) in(select Id,seq from 表1 group by Id,seq having count(*)> 1) ;

commit;

insert into 表1 select * from lsb a where a.rowid in(select min(rowid) from lsb group by Id,seq having count(*)> 1) ;

commit;

drop table lsb;
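The one-shot script can be wrapped the same way and followed by a sanity check that the step 1 query now returns nothing. A sketch under assumptions: Python's sqlite3 module again stands in for Oracle, a SQLite TEMP table takes the place of lsb (a rough counterpart to the temporary-table option in the comment above, not an Oracle global temporary table), and t1 is hypothetical. The DELETE and INSERT run in one transaction, so a failure part-way rolls both back.

```python
import sqlite3

DUP_KEYS = "SELECT id, seq FROM t1 GROUP BY id, seq HAVING COUNT(*) > 1"


def dedupe_with_temp_table(conn):
    """Run the whole script against t1; return the number of rows removed."""
    before = conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]
    # Connection-private temp table standing in for lsb.
    conn.execute("DROP TABLE IF EXISTS temp.lsb")
    conn.execute(
        f"CREATE TEMP TABLE lsb AS SELECT * FROM t1 WHERE (id, seq) IN ({DUP_KEYS})"
    )
    # DELETE and INSERT commit together; an exception rolls both back.
    with conn:
        conn.execute(f"DELETE FROM t1 WHERE (id, seq) IN ({DUP_KEYS})")
        conn.execute(
            "INSERT INTO t1 SELECT * FROM lsb "
            "WHERE rowid IN (SELECT MIN(rowid) FROM lsb GROUP BY id, seq)"
        )
    conn.execute("DROP TABLE lsb")
    return before - conn.execute("SELECT COUNT(*) FROM t1").fetchone()[0]


def verify(conn):
    """True if no (id, seq) key occurs more than once in t1."""
    return conn.execute(f"SELECT COUNT(*) FROM ({DUP_KEYS})").fetchone()[0] == 0


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, seq INTEGER, payload TEXT)")
with conn:
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?)",
        [(1, 1, "a"), (1, 1, "a"), (2, 1, "b"), (2, 2, "c"), (2, 2, "c"), (2, 2, "c")],
    )
print(verify(conn))                  # False
print(dedupe_with_temp_table(conn))  # 3
print(verify(conn))                  # True
```

Running the function a second time removes nothing, which makes it reasonably safe to re-run after an interrupted attempt.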
