Match a column and delete the duplicates in Shell
【Title】match column and delete the Duplicates in Shell 【Posted】2022-01-18 03:29:41 【Question】Input file
Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Expected output
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
I only want the clients that never completed successfully and never had an On-Demand backup run for them.
Code I tried
awk -F ',' '
$1~/Failed/ { fail[$4]=$0 }
$1~/Completed/ { delete fail[$4] }
$3 ~ /Demand/ { delete fail[$4] }
END { for (i in fail) print fail[i] }
' test
【Answer 1】: Here is a Ruby solution that handles multiple entries per client (if any) and CSV quirks such as embedded commas:
ruby -r csv -e '
BEGIN{ hsh  = Hash.new { |hash,key| hash[key] = [] }
       data = Hash.new { |hash,key| hash[key] = [] }
}
CSV.parse($<.read).each{ |r| hsh[r[3]] << r[0]; hsh[r[3]] << r[2]
                             data[r[3]] << r.to_csv
}
END{ hsh.each{ |k,v| s=v.join("\t")
         puts data[k].join() if !s[/Completed|Demand/]
     }
}
' file
Prints:
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
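【Answer 2】: With your shown samples, please try the following awk program. It works in a single pass over Input_file and prints only the Failed entries whose client, per the shown samples, never has an On-Demand entry.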
awk '
BEGIN{ FS=OFS="," }
$1=="Failed"{ arr1[$4]=$0 }
$3~/On-Demand/{ arr2[$4] }
END{
  for(key in arr1){
    if(!(key in arr2)){
      print arr1[key]
    }
  }
}
' Input_file
Explanation: a detailed, commented version of the above.
awk '                          ##Starting awk program from here.
BEGIN{ FS=OFS="," }            ##Starting BEGIN section and setting FS and OFS to , here.
$1=="Failed"{ arr1[$4]=$0 }    ##Checking if 1st field is Failed then create arr1 with 4th field as an index and value of whole line.
$3~/On-Demand/{ arr2[$4] }     ##Checking if 3rd field is On-Demand then create arr2 array with index of 4th field.
END{                           ##Starting END block of this program from here.
  for(key in arr1){            ##Traversing through arr1 here.
    if(!(key in arr2)){        ##Checking condition if key is NOT present in arr2 then do following.
      print arr1[key]          ##Printing arr1 value with index of key here.
    }
  }
}
' Input_file                   ##Mentioning Input_file here.
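
The program above only checks for On-Demand rows. In the sample data the single Completed row is also an On-Demand row, so the result is the same, but the question also asks to drop clients that ever completed. A minimal single-pass sketch that covers both conditions might look like the following; the file name `sample_backups.csv` and the array names `excluded` and `failed` are assumptions made for illustration, not part of the original answer.

```shell
# Recreate the sample input from the question (hypothetical file name).
cat > sample_backups.csv <<'EOF'
Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
EOF

# excluded: clients with any Completed row or any On-Demand row.
# failed:   the last Failed row seen for each client.
# Both arrays are filled in one pass and only compared in END,
# so the order of the input rows does not matter.
result=$(awk -F',' '
$1 == "Completed" || $3 ~ /On-Demand/ { excluded[$4] }
$1 == "Failed"                        { failed[$4] = $0 }
END {
  for (client in failed)
    if (!(client in excluded))
      print failed[client]
}
' sample_backups.csv)

printf '%s\n' "$result"
```

On the sample input this leaves only the `ghi` row. Like the answer above, it keeps just one Failed row per client.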
【Answer 3】: You can use this awk command:
awk -F, 'NR==FNR { if ($1~/Failed/) fail[$4] = $0; next }
$1 ~ /Completed/ || $3 ~ /Demand/ { delete fail[$4] }
END { for (i in fail) print fail[i] }' file file

Output:
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
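
The file is named twice so that awk reads it in two passes: `NR==FNR` holds only while the first copy is being read, so every Failed row is recorded before any `delete` runs. The sketch below, under the assumption that the sample rows are saved as `sample_backups.csv` (a hypothetical name), counts the surviving clients for a one-pass version and for the two-pass version to show why the order matters.

```shell
# Recreate the sample input from the question (hypothetical file name).
cat > sample_backups.csv <<'EOF'
Failed,2021-12-14 05:47 EST,On-Demand Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Completed,2021-12-14 05:47 EST,On-Demand Backup,def,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-13 19:33 EST,Scheduled Backup,def,/clients/FORD_730PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,abc,/clients/FORD_1130PM_EST_Windows2008,Windows File System
Failed,2021-12-14 00:09 EST,Scheduled Backup,ghi,/clients/FORD_1130PM_EST_Windows2008,Windows File System
EOF

# One pass: a Failed row that appears after the delete puts the client back.
one_pass=$(awk -F, '
$1 ~ /Failed/ { fail[$4] = $0 }
$1 ~ /Completed/ || $3 ~ /Demand/ { delete fail[$4] }
END { n = 0; for (i in fail) n++; print n }' sample_backups.csv)

# Two passes: record every Failed row first, then delete on the second read.
two_pass=$(awk -F, '
NR == FNR { if ($1 ~ /Failed/) fail[$4] = $0; next }
$1 ~ /Completed/ || $3 ~ /Demand/ { delete fail[$4] }
END { n = 0; for (i in fail) n++; print n }' sample_backups.csv sample_backups.csv)

echo "one pass keeps $one_pass client(s), two passes keep $two_pass client(s)"
```

With the sample rows, the one-pass version keeps `abc` and `def` as well, because their later Failed rows arrive after the delete, while the two-pass version keeps only `ghi`.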