Match a column and delete the duplicates in Shell


Original title: match column and delete the Duplicates in Shell
Posted: 2022-01-18 03:29:41

Question:

Input file

Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System  
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System

Expected output

Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System

I only want the clients that never succeeded and for which no on-demand backup was ever run.

Code I tried

awk -F ',' '
  $1 ~ /Failed/    { fail[$4] = $0 }
  $1 ~ /Completed/ { delete fail[$4] }
  $3 ~ /Demand/    { delete fail[$4] }
  END { for (i in fail) print fail[i] }
' test

Answer 1:

Here is a Ruby version that handles multiple entries per client (if there are any) and CSV quirks such as embedded commas:

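ruby -r csv -e '
BEGIN{ hsh  = Hash.new { |hash,key| hash[key] = [] }   # client => statuses and backup types seen
       data = Hash.new { |hash,key| hash[key] = [] }   # client => original CSV lines
}
CSV.parse($<.read).each { |r| hsh[r[3]] << r[0]; hsh[r[3]] << r[2]
                              data[r[3]] << r.to_csv
}
END{ hsh.each { |k,v| s = v.join("\t")
       puts data[k].join() if !s[/Completed|Demand/] }   # skip clients that ever completed or ran on demand
}
' file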

Prints:

Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System

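Answer 2:

With your shown samples, please try the following awk program, which reads Input_file in a single pass. It prints only the Failed records whose client, going by the shown samples, never has an On-Demand entry.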

awk '
BEGIN          { FS=OFS="," }
$1=="Failed"   { arr1[$4]=$0 }
$3~/On-Demand/ { arr2[$4]    }
END{
  for(key in arr1){
    if(!(key in arr2)){
      print arr1[key]
    }
  }
}
' Input_file

Explanation: a detailed, line-by-line explanation of the above.

awk '                             ##Starting awk program from here.
BEGIN          { FS=OFS="," }     ##Starting BEGIN section and setting FS and OFS to , here.
$1=="Failed"   { arr1[$4]=$0 }    ##If 1st field is Failed, store the whole line in arr1, indexed by the 4th field.
$3~/On-Demand/ { arr2[$4]    }    ##If 3rd field matches On-Demand, create an arr2 entry indexed by the 4th field.
END{                              ##Starting END block of this program from here.
  for(key in arr1){               ##Traversing through arr1 here.
    if(!(key in arr2)){           ##If key is NOT present in arr2 then do the following.
      print arr1[key]             ##Printing arr1 value with index of key here.
    }
  }
}
' Input_file                      ##Mentioning Input_file here.
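The program above excludes a client only when it has an On-Demand entry; the question also asks to skip clients that ever completed. A minimal single-pass sketch that tracks both conditions in one set could look like the following. The helper name `filter_never_ok`, the array name `excluded`, and the temporary file are assumptions made for illustration, and the sample rows are copied from the question.

```shell
#!/bin/sh
# Sample rows copied from the question; the temp file is only for this demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
EOF

# filter_never_ok (hypothetical name): print the last Failed line of every
# client that never Completed and never had an On-Demand backup.
filter_never_ok() {
  awk -F, '
    # Referencing excluded[$4] is enough to create the key.
    $1 == "Completed" || $3 ~ /On-Demand/ { excluded[$4] }
    # Keep the most recent Failed line seen for each client.
    $1 == "Failed"                        { fail[$4] = $0 }
    END {
      for (client in fail)
        if (!(client in excluded))
          print fail[client]
    }
  ' "$1"
}

result=$(filter_never_ok "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

Because nothing is deleted while reading, the result does not depend on the order of the lines in the file; the decision is made only in the END block.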

Answer 3:

You can use this awk command, which reads the file twice:

awk -F, 'NR==FNR {if ($1~/Failed/) fail[$4] = $0; next}
$1 ~ /Completed/ || $3 ~ /Demand/ {delete fail[$4]}
END {for (i in fail) print fail[i]}' file file

Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
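To see why reading the file twice helps, the sketch below runs the single-pass attempt from the question and the two-pass command side by side on the sample data and counts the lines each one prints. In the single-pass version a later Failed line re-adds a client that an earlier Completed or On-Demand line had already removed. The variable names `one_pass` and `two_pass` and the temporary file are assumptions made for this demo.

```shell
#!/bin/sh
# Sample rows copied from the question; the temp file is only for this demo.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
EOF

# One pass, deleting while reading (the approach tried in the question).
one_pass=$(awk -F, '
  $1 ~ /Failed/    { fail[$4] = $0 }
  $1 ~ /Completed/ { delete fail[$4] }
  $3 ~ /Demand/    { delete fail[$4] }
  END { for (i in fail) print fail[i] }
' "$tmp" | wc -l | tr -d ' ')

# Two passes: collect every Failed line first, then remove disqualified clients.
two_pass=$(awk -F, '
  NR == FNR { if ($1 ~ /Failed/) fail[$4] = $0; next }
  $1 ~ /Completed/ || $3 ~ /Demand/ { delete fail[$4] }
  END { for (i in fail) print fail[i] }
' "$tmp" "$tmp" | wc -l | tr -d ' ')

echo "one pass: $one_pass line(s), two passes: $two_pass line(s)"
rm -f "$tmp"
```

On the sample data the single-pass attempt leaves three clients in the array (abc, def and ghi), while the two-pass version leaves only ghi.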
