Compare multiple columns and only replace if matching
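【Title】: compare multiple columns and only replace if matching 【Posted】: 2018-04-18 01:21:24
【Question】: I have two files (FILE1 and FILE2). I am trying to compare the strings in columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to this match, column 6 of FILE2 must match a specific string, SO or CO (these are the headers of columns 3 and 4 of FILE1). When all of these match, column 7 of FILE2 should be replaced with the corresponding FILE1 value (column 3 for SO, column 4 for CO); otherwise the line should stay unchanged.
I tried adapting solutions posted on the forum for similar problems, but without success.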
FILE1
type code SO CO other
7757 1 6941.958 138.922 149.17
7757 2 8666.123 198.908 225.67
7757 4 2795.885 334.875 378.68
7759 GT3 222.104 13.5 734.62
7768 CT2 0 0 0
7805 6 3796.677 75.175 79.09
FILE2
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"
Required output:
"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"
The solution I tried (handles CO only):
tr -d '"' < FILE2 > temp # to remove double quote
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO" {$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out
【Comments on the question】:
Thank you so much for helping edit my code!
【Answer 1】: A more complex awk solution:
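awk 'function unquote(f) {
         return substr(f, 2, length(f)-2)    # drop the first and last character
     }
     NR==FNR {                               # first file (file1, whitespace-separated)
         if (NR==1) { f3=$3; f4=$4 }         # header row: remember the names SO and CO
         else if (NF) { a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
         next
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }   # key for the file2 line
     k in a { $7=a[k] }                      # replace column 7 only when the key exists
     1' file1 FS=',' OFS=',' file2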
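function unquote(f) ...
- unquotes the field, i.e. extracts the value between the double quotes (strictly speaking, everything between the first and last characters of the string)
a[$1,$2,f3]=$3; a[$1,$2,f4]=$4
- stores each FILE1 value under a composite key of type, code and column name (SO or CO)
Output of the solution on the posted samples: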
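The two points above can be tried in isolation. The following is only a small sketch with hard-coded values taken from the posted samples (the `result` variable and the printed labels are illustrative, not part of the answer). It shows that `a[x,y,z]` is stored under the single string key `x SUBSEP y SUBSEP z`, which is why the answer can build the key by hand from unquoted fields and test it with `in`:

```shell
result=$(awk '
function unquote(f) { return substr(f, 2, length(f)-2) }
BEGIN {
    # a[x,y,z] is stored under the single string key x SUBSEP y SUBSEP z
    a["7757", "1", "SO"] = "6941.958"

    # Build the same key from quoted CSV-style fields, as the solution does
    k = unquote("\"7757\"") SUBSEP unquote("\"1\"") SUBSEP unquote("\"SO\"")
    if (k in a) print "SO -> " a[k]

    # A combination that was never stored is simply not in the array
    k = unquote("\"7757\"") SUBSEP unquote("\"4\"") SUBSEP unquote("\"MO\"")
    if (!(k in a)) print "MO -> no replacement"
}')
printf '%s\n' "$result"
```

Because the lookup uses `k in a` rather than the truth value of `a[k]`, a stored value of 0 (as in the 7768 CT2 row of FILE1) would still count as a match.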
"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"
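【Comments】:
Hi RomanPerekhrest, thanks for your help. Your script looks great to me. However, I keep getting the same output as "file2", which means nothing is replaced in column 7 of the output. Any hints?
@kelly, hint: make sure you have posted your actual input samples, because the posted ones are what was copied and tested. The solution works on the samples as currently posted.
RomanPerekhrest, the problem was on my side; your code works fine. Thank you very much for your help and time.
@kelly, you're welcome.
@RomanPerekhrest's solution works perfectly with the test data. But the actual data in FILE2 has a problem: column 2 looks like "abc,45" or "abc23", meaning some values have a comma inside the double quotes and some do not. Since I cannot use the double quote as the delimiter here, how can this be handled? Thanks for your help.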
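The last comment was left unanswered in the thread. One possible approach, given here only as a sketch and not as the answerer's method, is to stop relying on `FS=','` for FILE2 and split each line with a small quote-aware function instead. The helper `csvsplit`, the array `fld`, the scratch directory, and the shortened sample data are all assumptions made for illustration. The sketch further assumes that fields 4 to 6 are always quoted and that quotes are balanced on every line. It also wraps the replacement in double quotes, so the result matches the quoting shown in the required output.

```shell
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
type code SO CO other
7757 1 6941.958 138.922 149.17
7757 4 2795.885 334.875 378.68
EOF
cat > "$tmp/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
"US","abc,45",,"7757","4","MO","10","299"
EOF

awk '
function unquote(f) { return substr(f, 2, length(f)-2) }

# Split a CSV line into arr[1..n]; a comma inside double quotes does not
# end the field. Each field keeps its surrounding quotes. Returns n.
function csvsplit(line, arr,    n, i, c, inq, cur) {
    split("", arr)                      # clear leftovers from earlier lines
    n = 0; cur = ""; inq = 0
    for (i = 1; i <= length(line); i++) {
        c = substr(line, i, 1)
        if (c == "\"") inq = !inq       # toggle the inside-quotes state
        if (c == "," && !inq) { arr[++n] = cur; cur = "" }
        else cur = cur c
    }
    arr[++n] = cur
    return n
}

NR==FNR {                               # first file: build the lookup table
    if (NR==1) { f3=$3; f4=$4 }
    else if (NF) { a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
    next
}
{                                       # second file: parse, replace, print
    n = csvsplit($0, fld)
    k = unquote(fld[4]) SUBSEP unquote(fld[5]) SUBSEP unquote(fld[6])
    if (k in a) {
        fld[7] = "\"" a[k] "\""         # re-quote the replacement value
        out = fld[1]
        for (i = 2; i <= n; i++) out = out "," fld[i]
        $0 = out
    }
    print
}' "$tmp/file1" "$tmp/file2" > "$tmp/out.csv"
cat "$tmp/out.csv"
```

The character-by-character loop is slower than field splitting but works in any POSIX awk. With GNU awk specifically, the `FPAT` variable is another way to describe fields that may contain the separator.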