How to delete duplicate records when one column has a unique value

Posted: 2020-10-04 18:22:14

Question:

Running the following query against the table:

select * 
from hourly_report_table 
where API_HOUR = 9 
  and API_DATE = date '2020-09-30' 
  and total_trans = 72506;

returns the duplicate records shown below. How can I delete them?

ID  APPLICATION API_DATE    API_HOUR    SO  APP API ACTUAL_API  AVG_RUN TOTAL_TRANS GOOD_TRANS  FAIL_TRANS  FAIL_PERC   COUNTS_TO1  PERC_TO1    COUNTS_TO15 PERC_TO15   COUNTS_OVER15   PERC_OVER15 COUNTS_1TO5 PERC_1TO5   COUNTS_5TO10    PERC_5TO10  COUNTS_10TO15   PERC_10TO15 COUNTS_15TO30   PERC_15TO30 COUNTS_30TO60   PERC_30TO60 COUNTS_OVER60   PERC_OVER60 CREATED_USER_ID CREATED_TIME_STAMP  METRIC  AVG_RUN_GOOD    AVG_RUN_FAIL
225344087   LS  30-Sep-20   9       G2  GetCustomerSnapshot GetCustomerSnapshot 0.176920834 72506   72505   1   1.3792E-05  72007   0.993117811 72477   0.999600033 29  0.000399967 403 0.005558161 52  0.000717182 15  0.000206879 12  0.000165504 13  0.000179296 4   5.51678E-05 UFOSODRPT   4-Oct-20    A   0.176561258 20.256
225278469   LS  30-Sep-20   9       G2  GetCustomerSnapshot GetCustomerSnapshot 0.176920834 72506   72505   1   1.3792E-05  72007   0.993117811 72477   0.999600033 29  0.000399967 403 0.005558161 52  0.000717182 15  0.000206879 12  0.000165504 13  0.000179296 4   5.51678E-05 UFOSODRPT   4-Oct-20    A   0.176561258 20.256
224980737   LS  30-Sep-20   9       G2  GetCustomerSnapshot GetCustomerSnapshot 0.176920834 72506   72505   1   1.3792E-05  72007   0.993117811 72477   0.999600033 29  0.000399967 403 0.005558161 52  0.000717182 15  0.000206879 12  0.000165504 13  0.000179296 4   5.51678E-05 UFOSODRPT   4-Oct-20    A   0.176561258 20.256
225548611   LS  30-Sep-20   9       G2  GetCustomerSnapshot GetCustomerSnapshot 0.176920834 72506   72505   1   1.3792E-05  72007   0.993117811 72477   0.999600033 29  0.000399967 403 0.005558161 52  0.000717182 15  0.000206879 12  0.000165504 13  0.000179296 4   5.51678E-05 UFOSODRPT   4-Oct-20    A   0.176561258 20.256
225452770   LS  30-Sep-20   9       G2  GetCustomerSnapshot GetCustomerSnapshot 0.176920834 72506   72505   1   1.3792E-05  72007   0.993117811 72477   0.999600033 29  0.000399967 403 0.005558161 52  0.000717182 15  0.000206879 12  0.000165504 13  0.000179296 4   5.51678E-05 UFOSODRPT   4-Oct-20    A   0.176561258 20.256

Thanks. I tried the requested change, but it somehow deleted all 5 records from the table:

delete  from hourly_report_table
    where id not in (select id
                     from (select max(id) id, application, api_date, api_hour, so
                           from test
                           group by application, api_date, api_hour, so
                          )
                     ) and API_HOUR=9 and API_DATE=date '2020-09-30' and total_trans=72506;
                     
           5 rows deleted.   
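
A likely reason for losing all five rows is that the subquery in the statement above reads from test rather than from hourly_report_table, so none of the ids it returns match and NOT IN holds for every filtered row. The sketch below reproduces that effect with Python's sqlite3 module standing in for Oracle. The cut-down tables, the helper name rows_deleted, and the sample values are assumptions made for illustration only.

```python
import sqlite3

def rows_deleted(subquery_table: str) -> int:
    """Run the dedup DELETE with the keep-list subquery reading from `subquery_table`."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hourly_report_table (id INTEGER, application TEXT, api_hour INTEGER);
        CREATE TABLE test (id INTEGER, application TEXT, api_hour INTEGER);
        -- three rows that differ only in id (cut-down, illustrative data)
        INSERT INTO hourly_report_table VALUES
            (225344087, 'LS', 9), (225278469, 'LS', 9), (224980737, 'LS', 9);
        -- an unrelated table whose ids do not occur in hourly_report_table
        INSERT INTO test VALUES (4087, 'LS', 9), (8469, 'LS', 9);
    """)
    cur = conn.execute(f"""
        DELETE FROM hourly_report_table
        WHERE id NOT IN (SELECT MAX(id)
                         FROM {subquery_table}
                         GROUP BY application, api_hour)
          AND api_hour = 9
    """)
    deleted = cur.rowcount
    conn.close()
    return deleted

print(rows_deleted("test"))                 # keep-list built from the wrong table
print(rows_deleted("hourly_report_table"))  # keep-list built from the table being cleaned
```

In this sketch the first call removes all three rows, while the second removes two and keeps the row with the highest id.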

Thanks. I tried the requested command again, but the output again includes rows dated 23-MAR-17:

   select *  from hourly_report_table 
Where Id Not in
          (Select max(Id)
            from hourly_report_table  where API_HOUR=9 and API_DATE=date '2020-09-30' and total_trans=72506
            group by APPLICATION, API_DATE, API_HOUR, SO, APP, API, ACTUAL_API, AVG_RUN, AVG_RUN_GOOD, AVG_RUN_FAIL, 
TOTAL_TRANS, GOOD_TRANS, FAIL_TRANS, FAIL_PERC, COUNTS_TO1, PERC_TO1, COUNTS_TO15, PERC_TO15, COUNTS_OVER15, PERC_OVER15, 
COUNTS_1TO5, PERC_1TO5, COUNTS_5TO10, PERC_5TO10, COUNTS_10TO15, PERC_10TO15, COUNTS_15TO30, PERC_15TO30, COUNTS_30TO60, 
PERC_30TO60, COUNTS_OVER60, PERC_OVER60, CREATED_USER_ID, CREATED_TIME_STAMP, METRIC, AVG_RUN_GOOD, AVG_RUN_FAIL);
        
        
        
24134557    TSNR    23-MAR-17   3       CSI InquireWirelineServiceMaintenanceDetails_POTSWtn    InquireWirelineServiceMaintenanceDetails_POTSWtn    1.344
24134558    TSNR    23-MAR-17   3       RTTP    STB_SEND_MESSAGE    RTTPSendMessageToSTB    1.099
24134559    TSNR    23-MAR-17   3       CSI InquireFiberServiceOrderDetail_Detail   InquireFiberServiceOrderDetail_Detail   0.976820512820513
24134560    TSNR    23-MAR-17   3       CMS GetLiveData_5031NV-030  GetLiveData_5031NV-030  20.828

The records in hourly_report_table are the five rows listed at the top of the question, with the same column names. Their values are identical in every column except ID, so we need to keep one row and remove all the other duplicates.

Decimal values were found in the following columns:

FAIL_PERC: 0.0000137919620445205, PERC_TO1: 0.993117810939784, PERC_TO15: 0.999600033100, PERC_OVER15: 0.0003999, AVG_RUN_GOOD: 0.17656, AVG_RUN_FAIL: 20.256
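
The table output earlier in the question shows FAIL_PERC as 1.3792E-05, while the value listed here carries more digits, so a client may display a rounded form of what is stored. One way to check whether rows that look identical really are identical is to compare the stored values with their rounded form. The sketch below does this with Python's sqlite3 module standing in for Oracle; the table t and the third value are hypothetical and exist only to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, fail_perc REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [
    (1, 0.0000137919620445205),
    (2, 0.0000137919620445205),
    (3, 0.000013792),  # hypothetical: equal to the others only after rounding
])

# Distinct values as stored: rows that merely look alike stay separate
raw_distinct = conn.execute(
    "SELECT COUNT(DISTINCT fail_perc) FROM t").fetchone()[0]

# Distinct values after rounding to 9 decimal places, as a display might show them
rounded_distinct = conn.execute(
    "SELECT COUNT(DISTINCT ROUND(fail_perc, 9)) FROM t").fetchone()[0]

print(raw_distinct, rounded_distinct)
```

If the two counts differ on real data, some rows are only apparent duplicates, and a GROUP BY on the raw columns will not put them in the same group.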

Table name: hourly_report_table

Comments:

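It's hard to tell, especially on a phone, which column has the non-duplicated value. What is it, what is its data type, and which row do you want to keep? (For example, if it's a date and you want to keep the latest one...)

This might be a solution, but you would have to put all columns except ID in the partition by clause.

Could you format the text data you entered in the question? Fewer records would be fine. Without formatting it is hard to understand.

You already asked the same question (***.com/questions/64167057/…), and I told you how to delete the duplicates. You told me duplicates still existed, and I told you that I think they only look like duplicates: your tool may be suppressing decimal places in numbers or the time part of datetimes. Have you checked this?

Yes Thorsten, I checked the query, and I have updated and edited the response.

Answer 1: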

Since you want to keep only one row (which one? let's say the one with the MAX ID among those "duplicate" values), this might be one option:

Sample data:

SQL> select * From test;

        ID AP API_DATE     API_HOUR SO
---------- -- ---------- ---------- --
      4087 LS 2020-09-30          9 G2
      8469 LS 2020-09-30          9 G2
       737 LS 2020-09-30          9 G2
      8611 XX 2020-05-30          2 G1
      2770 XX 2020-05-30          2 G1

Delete the duplicates:

SQL> delete from test
  2  where id not in (select max(id)
  3                   from test
  4                   group by application, api_date, api_hour, so
  5                  );

3 rows deleted.

What's left?

SQL> select * From test;

        ID AP API_DATE     API_HOUR SO
---------- -- ---------- ---------- --
      8469 LS 2020-09-30          9 G2
      8611 XX 2020-05-30          2 G1

SQL>
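
The session above can be replayed outside Oracle as a quick check. This is a minimal sketch using Python's sqlite3 module as a stand-in database, with the same five sample rows; storing api_date as text is an assumption made to keep the example small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE test (
        id INTEGER, application TEXT, api_date TEXT, api_hour INTEGER, so TEXT
    )
""")
conn.executemany("INSERT INTO test VALUES (?, ?, ?, ?, ?)", [
    (4087, "LS", "2020-09-30", 9, "G2"),
    (8469, "LS", "2020-09-30", 9, "G2"),
    (737,  "LS", "2020-09-30", 9, "G2"),
    (8611, "XX", "2020-05-30", 2, "G1"),
    (2770, "XX", "2020-05-30", 2, "G1"),
])

# Keep the highest id in each group of otherwise identical rows, delete the rest
deleted = conn.execute("""
    DELETE FROM test
    WHERE id NOT IN (SELECT MAX(id)
                     FROM test
                     GROUP BY application, api_date, api_hour, so)
""").rowcount

remaining = [row[0] for row in conn.execute("SELECT id FROM test ORDER BY id")]
print(deleted, remaining)
```

As in the SQL*Plus session, three rows are deleted and ids 8469 and 8611 remain.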

Comments:

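Thanks Littlefoot. If there are millions of records, how do I delete them?

The same way. It will just take longer.

Why are there two subqueries? You can simply write where id not in (select max(id) from test group by application, api_date, api_hour, so).

I edited in the requested change. The output shows data from 23-03-2017, even though the condition selects data for 30-Sep-2020.

That's because you selected the rows whose ID is NOT IN the rows dated 30-09-2020.

Answer 2: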

Assuming the unique column is "Id" and the duplicated columns are col1, col2 and col3 (table: myTable), you can simply do:

Delete from myTable
Where Id Not in
          (Select max(Id)
            from myTable
            group by col1, col2, col3);

Edit: this also works for a large number of records.

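Update: you should also specify your WHERE conditions in the outer query. Otherwise it matches every row whose id is not returned by the subquery. Repeating the conditions ensures that Id NOT IN is applied only to the subset of records that meets the other criteria, rather than to the whole table. Please see below.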

 select *  from hourly_report_table 
 Where Id Not in
      (Select max(Id)
        from hourly_report_table  where API_HOUR=9 and API_DATE=date '2020-09-30' and total_trans=72506
        group by APPLICATION, API_DATE, API_HOUR, SO, APP, API, ACTUAL_API, AVG_RUN, AVG_RUN_GOOD, AVG_RUN_FAIL, 
TOTAL_TRANS, GOOD_TRANS, FAIL_TRANS, FAIL_PERC, COUNTS_TO1, PERC_TO1, COUNTS_TO15, PERC_TO15, COUNTS_OVER15, PERC_OVER15, 
COUNTS_1TO5, PERC_1TO5, COUNTS_5TO10, PERC_5TO10, COUNTS_10TO15, PERC_10TO15, COUNTS_15TO30, PERC_15TO30, COUNTS_30TO60, 
PERC_30TO60, COUNTS_OVER60, PERC_OVER60, CREATED_USER_ID, CREATED_TIME_STAMP, METRIC, AVG_RUN_GOOD, AVG_RUN_FAIL)
and API_HOUR=9 and API_DATE=date '2020-09-30' and total_trans=72506;
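
To see why the filter has to appear in both places, the sketch below runs the query with and without the outer conditions. It uses Python's sqlite3 module as a stand-in for Oracle; the cut-down column list and the text dates are assumptions for illustration, with ids borrowed from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE hourly_report_table (
        id INTEGER, application TEXT, api_date TEXT, api_hour INTEGER
    )
""")
conn.executemany("INSERT INTO hourly_report_table VALUES (?, ?, ?, ?)", [
    (24134557,  "TSNR", "2017-03-23", 3),   # older rows that must not be touched
    (24134558,  "TSNR", "2017-03-23", 3),
    (224980737, "LS",   "2020-09-30", 9),   # three copies of the same 2020 record
    (225278469, "LS",   "2020-09-30", 9),
    (225344087, "LS",   "2020-09-30", 9),
])

# Ids to keep: the highest id per group, computed only for the filtered day and hour
KEEP = """
    SELECT MAX(id) FROM hourly_report_table
    WHERE api_hour = 9 AND api_date = '2020-09-30'
    GROUP BY application, api_date, api_hour
"""

# Filter only inside the subquery: every row outside the keep-list matches
unscoped = [r[0] for r in conn.execute(
    f"SELECT id FROM hourly_report_table WHERE id NOT IN ({KEEP}) ORDER BY id")]

# Filter repeated in the outer query: only the extra duplicates match
scoped = [r[0] for r in conn.execute(
    f"""SELECT id FROM hourly_report_table
        WHERE id NOT IN ({KEEP})
          AND api_hour = 9 AND api_date = '2020-09-30'
        ORDER BY id""")]

print(unscoped)
print(scoped)
```

Here the unscoped query also returns the 2017 rows, while the scoped one returns only the two surplus 2020 duplicates. Running the SELECT first and checking its output before switching to DELETE is a cheap safeguard.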

Comments:

@singh_dba - did you look at the update section of my answer above?

Yes, we ran the query above but did not receive any output.
