Compare multiple columns and only replace if matching

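【Posted】: 2018-04-18 01:21:24 【Question】:

I have two files (FILE1 and FILE2).

I'm trying to compare the strings in columns 1 and 2 of FILE1 with columns 4 and 5 of FILE2. In addition to that match, column 6 of FILE2 must match a specific string, SO or CO (columns 3 and 4 of FILE1 are named SO and CO respectively). When everything matches, column 7 of FILE2 should be replaced with the FILE1 value from the column named in column 6 (column 3 for SO, column 4 for CO); otherwise the line should be left unchanged.

I tried adapting solutions to similar problems from the forum, without success.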

FILE1
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7757    2       8666.123        198.908 225.67
7757    4       2795.885        334.875 378.68
7759    GT3     222.104    13.5    734.62
7768    CT2     0       0       0
7805    6       3796.677        75.175  79.09 

FILE2
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","10","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","10","299"

Required output:
"US","01073",,"7757","1","SO","6941.958","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO","138.922","299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO","75.175","299"

The solution I tried (for CO only):

tr -d '"' < FILE2 > temp  # remove the double quotes
awk 'NR==FNR{A[$1,$2]=$3;next} A[$4,$5] && $6=="CO"{$7=A[$1,$2]; print}' FS=" " OFS="," FILE1 temp > out

【Comments】:

Thank you very much for helping edit my code!

【Answer 1】:

A more involved awk solution:

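awk 'function unquote(f) {
         # drop the first and last character (the surrounding double quotes)
         return substr(f, 2, length(f)-2)
     }
     NR==FNR {                                  # first file (file1) only
         if (NR==1) { f3=$3; f4=$4 }            # header row: f3="SO", f4="CO"
         else if (NF) { a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }   # skip blank lines
         next
     }
     { k=unquote($4) SUBSEP unquote($5) SUBSEP unquote($6) }
     k in a { $7=a[k] } 1' file1 FS=',' OFS=',' file2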

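function unquote(f) ... - strips the quoting: it returns the value between the double quotes (strictly speaking, everything between the first and last character of the string)

a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 - stores each FILE1 value under a composite key (type, code, column name), so the SO and CO values of one row get separate entries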


Output:

"US","01073",,"7757","1","SO",6941.958,"299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7757","1","NO","10","299"
"US","01073",,"7757","1","CO",138.922,"299"
"US","01073",,"7757","4","MO","10","299"
"US","01073",,"7757","1","GO","10","299"
"US","01073",,"7805","6","CO",75.175,"299"
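
Note that the output above leaves the replaced column 7 unquoted, whereas the required output in the question keeps the double quotes. A minimal sketch of a variant that puts the quotes back, assuming the same file layout as the posted samples; `requote_merge` is a hypothetical helper name and the scratch files are trimmed copies of the sample data:

```shell
#!/bin/sh
# Trimmed copies of the sample inputs from the question.
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
7805    6       3796.677        75.175  79.09
EOF
cat > "$tmp/file2" <<'EOF'
"US","01073",,"7757","1","SO","10","299"
"US","01073",,"7758","1","SO","10","299"
"US","01073",,"7805","6","CO","10","299"
EOF

# requote_merge FILE1 FILE2 (hypothetical name, not part of the answer):
# same lookup as the answer, but the new value is wrapped in double quotes.
requote_merge() {
    awk 'function unquote(f) { return substr(f, 2, length(f)-2) }
         NR==FNR {
             if (NR==1) { f3=$3; f4=$4 }
             else if (NF) { a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
             next
         }
         {
             k = unquote($4) SUBSEP unquote($5) SUBSEP unquote($6)
             if (k in a) $7 = "\"" a[k] "\""   # put the double quotes back
             print
         }' "$1" FS=',' OFS=',' "$2"
}

out=$(requote_merge "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out"
rm -r "$tmp"
```

Lines whose key is not found are printed untouched, because awk only rebuilds a record when one of its fields is assigned.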

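【Comments】:

Hi RomanPerekhrest, thanks for your help. Your script looks great to me. However, I keep getting the same output as "file2", which means nothing in column 7 is being replaced. Any hints?

@kelly, a hint: make sure you have posted your actual input samples, since those are what was copied and tested. The solution works on the samples as currently posted.

RomanPerekhrest, it was my mistake; your code works fine. Thank you very much for your help and time.

@kelly, no problem.

@RomanPerekhrest's solution works perfectly with the test data. But the real data in FILE2 has a problem: column 2 looks like "abc,45" or "abc23", meaning some values have a comma inside the double quotes and some do not. Since I can't use the double quote as the delimiter here, how can this be handled? Thanks for your help.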
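
The last comment has no reply in the thread. One possible approach, sketched here under a loud assumption: only column 2 can contain embedded commas, and every line still ends with the same six columns. In that case columns 4 to 7 can be addressed by counting back from the last field instead of forward from the first. `merge_from_end` is a hypothetical helper name, and the `"abc,45"` / `"abc23"` rows are made-up data modelled on the comment:

```shell
#!/bin/sh
tmp=$(mktemp -d)
cat > "$tmp/file1" <<'EOF'
type  code     SO  CO other

7757    1       6941.958        138.922 149.17
EOF
# Made-up FILE2 rows: column 2 with and without an embedded comma.
cat > "$tmp/file2" <<'EOF'
"US","abc,45",,"7757","1","SO","10","299"
"US","abc23",,"7757","1","CO","10","299"
EOF

# merge_from_end FILE1 FILE2 (hypothetical name).
# Assumption: commas inside quotes occur only in column 2.
merge_from_end() {
    awk 'function unquote(f) { return substr(f, 2, length(f)-2) }
         NR==FNR {
             if (NR==1) { f3=$3; f4=$4 }
             else if (NF) { a[$1,$2,f3]=$3; a[$1,$2,f4]=$4 }
             next
         }
         NF >= 8 {
             # Index from the end: extra commas in column 2 change NF,
             # but not how far columns 4-7 sit from the last field.
             k = unquote($(NF-4)) SUBSEP unquote($(NF-3)) SUBSEP unquote($(NF-2))
             if (k in a) $(NF-1) = "\"" a[k] "\""
         }
         { print }' "$1" FS=',' OFS=',' "$2"
}

out2=$(merge_from_end "$tmp/file1" "$tmp/file2")
printf '%s\n' "$out2"
rm -r "$tmp"
```

Because FS and OFS are both a comma, rebuilding the record rejoins the pieces of `"abc,45"` exactly as they were. If commas can appear inside other quoted columns as well, this trick no longer holds and a CSV-aware parser would be the safer choice.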
