awk: print column data on one line based on a matching key

Posted 2021-05-06



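I'm trying to write an awk one-liner that prints the column data from several lines on a single line, grouped by a matching key. My file looks like this: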

$ cat 1.txt  
2016-05-10,UJ,ALL 1 7  
2016-05-10,UJ,ALL 1 10  
2016-05-10,UJ,ALL 1 9  
2016-05-10,UJ,ALL 1 8  
2016-05-10,UJ,ALL 1 14  
2016-05-10,UJ,ALL 1 8  
2016-05-10,UJ,ALL 1 12  
2016-05-10,UJ,ALL 2 11  
2016-05-10,UJ,ALL 1 10  
2016-05-10,UJ,ALL 2 12  
2016-05-10,UJ,ALL 2 9  
2016-05-10,UJ,ALL 1 13

Expected output (the unique key to match on is everything before the first space, i.e. 2016-05-10,UJ,ALL):

2016-05-10,UJ,ALL<tab>1 1 1 1 1 1 1 2 1 2 2 1<tab>7 10 9 8 14 8 12 11 10 12 9 13

I'm using the following awk pattern match:

awk '$1 != prev{printf "%s%s",ors,$1; ors=ORS; ofs="	"} {printf "%s%s",ofs,$2; ofs=OFS; prev=$1} END{print ""}' 1.txt

But it doesn't work for the last column. I've tried every combination I could think of without success... Any suggestions?
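The attempt above never prints $3: its second rule only ever emits $2. Below is a minimal sketch of how the same streaming, prev-key idea could collect both columns. It assumes, as in 1.txt, that lines sharing a key are adjacent; join_by_key is a hypothetical helper name, not something from the question.

```shell
#!/bin/sh
# join_by_key: hypothetical helper name.
# Streaming version of the prev-key approach: keep one running list for
# field 2 and one for field 3, and print them whenever the key changes.
# Assumes lines with the same key are adjacent (as in 1.txt).
join_by_key() {
    awk -v OFS='\t' '
        NR == 1 || $1 != prev {             # a new key starts here
            if (NR > 1) print prev, c2, c3  # flush the previous group
            prev = $1; c2 = $2; c3 = $3
            next
        }
        { c2 = c2 " " $2; c3 = c3 " " $3 }  # same key: append both fields
        END { if (NR > 0) print prev, c2, c3 }
    ' "$@"
}

# Demo on the first four lines of the sample data
printf '%s\n' '2016-05-10,UJ,ALL 1 7' '2016-05-10,UJ,ALL 1 10' \
    '2016-05-10,UJ,ALL 1 9' '2016-05-10,UJ,ALL 1 8' | join_by_key
```

Run as join_by_key 1.txt on the full file, it should print the single tab-separated line shown in the expected output. Because it only holds the current group in memory, it also works on large files, as long as the grouping assumption holds.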

Answer

I'd go with something like this:

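awk -v OFS="\t" '{
     cols[$1];                                                # remember each key
     col2[$1]=(length(col2[$1]) ? col2[$1] FS : "") $2;       # append field 2, space-separated
     col3[$1]=(length(col3[$1]) ? col3[$1] FS : "") $3        # append field 3, space-separated
     } END {for (i in cols) print i, col2[i], col3[i]}' file  # key order of "for (i in ...)" is unspecified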

See it in action:

$ awk -v OFS="\t" '{cols[$1]; col2[$1]=(length(col2[$1]) ? col2[$1] FS : "") $2; col3[$1]=(length(col3[$1]) ? col3[$1] FS : "") $3} END {for (i in cols) print i, col2[i], col3[i]}' a
2016-05-10,UJ,ALL   1 1 1 1 1 1 1 2 1 2 2 1 7 10 9 8 14 8 12 11 10 12 9 13
#                ^                         ^
#                tab                       tab
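The answer above prints keys via for (i in cols), and awk leaves the iteration order of that loop unspecified; this only matters once the file contains more than one key. Below is a sketch of an order-preserving variant, assuming keys should appear in the order they are first seen. collect_by_key is a hypothetical helper name. Unlike the prev-key approach, it does not need lines to be grouped by key.

```shell
#!/bin/sh
# collect_by_key: hypothetical helper name.
# Same per-key arrays as the answer above, but keys are printed in
# first-seen order instead of relying on "for (i in ...)" order.
collect_by_key() {
    awk -v OFS='\t' '
        !($1 in seen) {                     # first time this key appears
            seen[$1]; order[++n] = $1
            col2[$1] = $2; col3[$1] = $3
            next
        }
        { col2[$1] = col2[$1] " " $2; col3[$1] = col3[$1] " " $3 }
        END {
            for (i = 1; i <= n; i++) {      # walk keys in first-seen order
                k = order[i]
                print k, col2[k], col3[k]
            }
        }
    ' "$@"
}

# Demo with two interleaved keys
printf '%s\n' 'b,X 1 10' 'a,Y 2 20' 'b,X 3 30' | collect_by_key
```

The trade-off is memory: every key and its collected values stay in the arrays until END, whereas the prev-key version only keeps the current group.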

Another answer

$ head -n1 1.txt | cut -d' ' -f1
2016-05-10,UJ,ALL
$ # transform multiple lines to single line with space as separator
$ cut -d' ' -f2 1.txt | paste -sd' '
1 1 1 1 1 1 1 2 1 2 2 1
$ cut -d' ' -f3 1.txt | paste -sd' '
7 10 9 8 14 8 12 11 10 12 9 13

$ # finally, combine the three results
$ # by default paste uses tab as delimiter
$ paste <(head -n1 1.txt | cut -d' ' -f1) <(cut -d' ' -f2 1.txt | paste -sd' ') <(cut -d' ' -f3 1.txt | paste -sd' ') 
2016-05-10,UJ,ALL   1 1 1 1 1 1 1 2 1 2 2 1 7 10 9 8 14 8 12 11 10 12 9 13

$ # to use a different delimiter
$ paste -d: <(head -n1 1.txt | cut -d' ' -f1) <(cut -d' ' -f2 1.txt | paste -sd' ') <(cut -d' ' -f3 1.txt | paste -sd' ')
2016-05-10,UJ,ALL:1 1 1 1 1 1 1 2 1 2 2 1:7 10 9 8 14 8 12 11 10 12 9 13
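The paste pipeline relies on process substitution, <(...), which bash and zsh provide but a plain POSIX sh does not. Below is a sketch of the same three steps joined with command substitution instead. Like the original, it assumes every line carries the same key; row_from_columns is a hypothetical helper name.

```shell
#!/bin/sh
# row_from_columns: hypothetical helper name.
# Same steps as the paste pipeline: key from line 1, then fields 2 and 3
# each joined with spaces, combined without process substitution.
# Assumes every line of the file has the same key, as in 1.txt.
row_from_columns() {
    file=$1
    delim=${2:-$(printf '\t')}                       # output delimiter, tab by default
    key=$(head -n1 "$file" | cut -d' ' -f1)
    c2=$(cut -d' ' -f2 "$file" | paste -s -d' ' -)   # field 2 on one line
    c3=$(cut -d' ' -f3 "$file" | paste -s -d' ' -)   # field 3 on one line
    printf '%s%s%s%s%s\n' "$key" "$delim" "$c2" "$delim" "$c3"
}

# Demo on a small temporary file
tmp=$(mktemp)
printf '%s\n' '2016-05-10,UJ,ALL 1 7' '2016-05-10,UJ,ALL 2 10' > "$tmp"
row_from_columns "$tmp"      # tab-separated
row_from_columns "$tmp" :    # colon-separated, like paste -d:
rm -f "$tmp"
```

Like the original pipeline, this reads the file three times; the awk answers read it once.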

Another option is GNU datamash, but it outputs comma-separated values:

$ datamash -t' ' -W -g1 collapse 2 -g1 collapse 3 <1.txt
2016-05-10,UJ,ALL   1,1,1,1,1,1,1,2,1,2,2,1 7,10,9,8,14,8,12,11,10,12,9,13

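-t' ': use a space as the input field delimiter
-W: treat whitespace (one or more spaces and/or tabs) as the field delimiter; output fields are then tab-separated
-g1 collapse 2: group by column 1 and collapse the column 2 values into a comma-separated list
-g1 collapse 3: group by column 1 and collapse the column 3 values into a comma-separated list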
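If spaces are wanted instead of commas, the collapsed fields can be post-processed. A plain tr ',' ' ' would also rewrite the commas inside the key 2016-05-10,UJ,ALL, so the sketch below only touches fields 2 onward. It assumes the datamash output is tab-separated, as it is with -W; commas_to_spaces is a hypothetical helper name.

```shell
#!/bin/sh
# commas_to_spaces: hypothetical helper name.
# Replaces commas with spaces in every tab-separated field except the
# first, because the key itself contains commas.
# Assumes tab-separated input (datamash output with -W).
commas_to_spaces() {
    awk 'BEGIN { FS = OFS = "\t" }
         { for (i = 2; i <= NF; i++) gsub(/,/, " ", $i); print }' "$@"
}

# Demo on a line shaped like the datamash output (tabs between fields)
printf '2016-05-10,UJ,ALL\t1,1,2\t7,10,9\n' | commas_to_spaces
```

With datamash installed, it would be appended to the command above:
datamash -t' ' -W -g1 collapse 2 -g1 collapse 3 <1.txt | commas_to_spaces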
