Merge two files by one column AWK

Posted 2023-03-24


[Title] Merge two files by one column AWK [Posted]: 2020-06-05 16:36:39 [Question]:

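I want to merge file1 and file2 with awk by matching column 4 of file1 against column 1 of file2, and print column 2 of file1 for each match. If a key has multiple matches (possibly more than 100), print them separated by commas.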

file1:

alo descrip 1  PAPA
alo descrip 2  LOPA
alo descrip 3  REP
alo descrip 4  SEPO
dlo sapro   31 REP
dlo sapro   35 PAPA

file2:

PAPA klob trop
PAPA kopo topo
HOJ  sasa laso
REP  deso rez
SEPO raz  ghul
REP  kok  loko

Output:

PAPA klob trop descrip,sapro
PAPA kopo topo descrip,sapro
HOJ  sasa laso NA
REP  deso rez  descrip,sapro
SEPO raz  ghul descrip
REP  kok  loko descrip,sapro

I tried:

awk -v FILE_A="FILE1" -v OFS="\t" 'BEGIN { while ( ( getline < FILE_A ) > 0 ) { VAL = $0 ; sub( /^[^ ]+ /, "", VAL ) ; DICT[ $1 ] = VAL } } { print $0, DICT[ $4 ] }' FILE2

But it doesn't work.

[Comments]:

Based on your requirements, I think this may be what you need.

Wouldn't it be a problem if there are duplicates in both files?

[Answer 1]:

Could you please try the following.

awk '
FNR==NR{
  a[$NF]=(a[$NF]?a[$NF] ",":"")$2
  next
}
{
  printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")
}
'  Input_file1  Input_file2

Explanation: a detailed, line-by-line explanation of the code above.

awk '                                          ##Start the awk program here.
FNR==NR{                                       ##FNR==NR is TRUE only while Input_file1 is being read.
  a[$NF]=(a[$NF]?a[$NF] ",":"")$2              ##Build array a indexed by $NF; append $2 of the current line to its value, comma-separated.
  next                                         ##next skips the remaining statements for the current line.
}
{
  printf("%s %s\n",$0,($1 in a)?a[$1]:"NA")    ##Print the current line, then the array value if $1 is an index of a, otherwise NA.
}
'  Input_file1 Input_file2                     ##Name the input files here.
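
To check this approach against the sample data from the question, here is a minimal sketch. It assumes a POSIX shell with awk and mktemp available; the temporary directory and the variable name result are illustrative and not part of the original answer.

```shell
#!/bin/sh
# Sketch: run the FNR==NR approach on the sample data from the question.
# The temp directory and the variable name "result" are illustrative only.
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1" <<'EOF'
alo descrip 1  PAPA
alo descrip 2  LOPA
alo descrip 3  REP
alo descrip 4  SEPO
dlo sapro   31 REP
dlo sapro   35 PAPA
EOF

cat > "$tmpdir/file2" <<'EOF'
PAPA klob trop
PAPA kopo topo
HOJ  sasa laso
REP  deso rez
SEPO raz  ghul
REP  kok  loko
EOF

result=$(awk '
FNR==NR {                                   # first file: collect column 2 per key
  a[$NF] = (a[$NF] ? a[$NF] "," : "") $2    # key is the last field (column 4)
  next                                      # skip the printing block for file1
}
{                                           # second file: append the list or NA
  printf("%s %s\n", $0, ($1 in a) ? a[$1] : "NA")
}
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because $0 is printed unchanged and followed by a single space, the appended column is not padded into alignment the way the expected output in the question is; the values themselves are the same.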

[Discussion]:

[Answer 2]:

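Essentially, the question is how to store data in an array when there are duplicate keys. @RavinderSingh13 nicely showed how to append data to an indexed array element. Another approach is to use multidimensional arrays. Here is an example of how to use them in GNU awk: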

$ gawk '                                               # using GNU awk
NR==FNR {                                              # process first file
    a[$4][++c[$4]]=$2                                  # 2d array
    next
}
{                                                      # process second file
    printf "%s%s",$0,OFS                               # print the record
    if($1 in a)                                        # if key is found in array
        for(i=1;i<=c[$1];i++)                          # process related dimension
            printf "%s%s",a[$1][i],(i==c[$1]?ORS:",")  # and output elements
    else                                               # if key was not in array
        print "NA"                                     # output NA
}' file1 file2
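
The a[$4][...] syntax above is a GNU awk extension. Where only a POSIX awk is available, a similar structure can be sketched with a composite subscript a[key, n], which awk joins internally with SUBSEP. This is a swapped-in variant for illustration, not the answer's own code; the variable names merged, n and out are hypothetical.

```shell
#!/bin/sh
# Sketch: emulate the 2d array with a composite key a[key, n] (POSIX awk).
# The temp directory and the names "merged", "n", "out" are illustrative only.
tmpdir=$(mktemp -d)

cat > "$tmpdir/file1" <<'EOF'
alo descrip 1  PAPA
alo descrip 2  LOPA
alo descrip 3  REP
alo descrip 4  SEPO
dlo sapro   31 REP
dlo sapro   35 PAPA
EOF

cat > "$tmpdir/file2" <<'EOF'
PAPA klob trop
PAPA kopo topo
HOJ  sasa laso
REP  deso rez
SEPO raz  ghul
REP  kok  loko
EOF

merged=$(awk '
NR==FNR {                          # first file
  n = ++c[$4]                      # number of values seen so far for this key
  a[$4, n] = $2                    # composite subscript, joined by SUBSEP
  next
}
{                                  # second file
  out = "NA"                       # default when the key was never seen
  for (i = 1; i <= c[$1]; i++)     # loop body is skipped for unknown keys
    out = (i == 1 ? "" : out ",") a[$1, i]
  print $0, out                    # record, OFS, then the joined list
}
' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Keeping each value in its own element, rather than appending to one string, makes it straightforward to post-process the values later (for example, to skip duplicates) before joining them with commas.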

[Discussion]:
