How to add a new column with a header to a CSV using awk

【Title】: How to add new column with header to csv with awk 【Posted】: 2018-03-18 13:24:17 【Question】:

I'm using some awk in a bash script that processes a CSV. The awk does this:

ORIG_FILE="score_model.csv"
NEW_FILE="updates/score_model.csv"
awk -v d="2017_01" -F"," 'BEGIN { OFS = "," } { $(NF+1)=d; print }' $ORIG_FILE > $NEW_FILE

This is what the transformation does:

# before
model_description,      type,    effective_date, end_date
Inc <= 40K,             Retired, 08/05/2016,     07/31/2017
Inc > 40K Age <= 55 V5, Retired, 04/30/2016,     07/31/2017
Inc > 40K Age > 55 V5 , Retired, 04/30/2016,     07/31/2017

# after, bad
model_description,      type,    effective_date, end_date,   2017_01  
Inc <= 40K,             Retired, 08/05/2016,     07/31/2017, 2017_01
Inc > 40K Age <= 55 V5, Retired, 04/30/2016,     07/31/2017, 2017_01
Inc > 40K Age > 55 V5 , Retired, 04/30/2016,     07/31/2017, 2017_01

I want the new column to have a header, so that the new CSV looks like this:

# after, desired
model_description,      type,    effective_date, end_date,   cmpgn_group  
Inc <= 40K,             Retired, 08/05/2016,     07/31/2017, 2017_01
Inc > 40K Age <= 55 V5, Retired, 04/30/2016,     07/31/2017, 2017_01
Inc > 40K Age > 55 V5 , Retired, 04/30/2016,     07/31/2017, 2017_01

I know there is a way to specify a separate action for the first line only, but I can't figure it out.

【Answer 1】:

Using sed

$ sed '1s/$/,\tcmpgn_group/; 2,$s/$/,\t2017_01/' file

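Line 1: append ,\tcmpgn_group. Lines 2 to $ (the last line): append ,\t2017_01. Note that \t in the replacement is understood as a tab by GNU sed; other sed implementations may not treat it that way.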

Using awk

$ awk -v d="2017_01" -F"," 'FNR==1{a="cmpgn_group"} FNR>1{a=d} {print $0",\t"a}' f1

Output:

model_description,      type,    effective_date, end_date,      cmpgn_group
Inc <= 40K,             Retired, 08/05/2016,     07/31/2017,    2017_01
Inc > 40K Age <= 55 V5, Retired, 04/30/2016,     07/31/2017,    2017_01
Inc > 40K Age > 55 V5 , Retired, 04/30/2016,     07/31/2017,    2017_01
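A self-contained way to try the awk variant above is sketched below. It assumes a throwaway sample file trimmed from the question's data, and it passes the header name in as a second awk variable h, which is an illustrative addition and not part of the original answer. A plain comma is used instead of ",\t" to keep the output easy to compare.

```shell
# Build a small sample file in a temporary directory (data trimmed from the question).
tmpdir=$(mktemp -d)
printf '%s\n' \
  'model_description,type,effective_date,end_date' \
  'Inc <= 40K,Retired,08/05/2016,07/31/2017' \
  'Inc > 40K Age <= 55 V5,Retired,04/30/2016,07/31/2017' > "$tmpdir/score_model.csv"

# h (header) and d (value) are awk variables; h is a hypothetical addition.
# FNR == 1 matches the header line, FNR > 1 matches every data line,
# and the last block (no pattern) prints each line with the chosen suffix.
result=$(awk -v h="cmpgn_group" -v d="2017_01" '
  FNR == 1 { a = h }
  FNR > 1  { a = d }
  { print $0 "," a }
' "$tmpdir/score_model.csv")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Because the line is printed as $0 plus a suffix, the existing fields are never rewritten, so any spacing inside the original rows is preserved.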

【Answer 2】:

The following awk (your solution, slightly changed) should work for you.

ORIG_FILE="score_model.csv"
NEW_FILE="updates/score_model.csv"
awk -v d="2017_01" -F"," 'BEGIN { OFS = "," } FNR==1 { $(NF+1)="cmpgn_group" } FNR>1 { $(NF+1)=d } 1' $ORIG_FILE > $NEW_FILE
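The BEGIN block matters in this approach: assigning to $(NF+1) makes awk rebuild the whole record, joining the fields with OFS, whose default is a single space. A minimal sketch of the difference, using a made-up one-line input (a,b) and a made-up new value (x):

```shell
# Without OFS: the rebuilt record is joined with the default separator, a single space.
without_ofs=$(printf 'a,b\n' | awk -F',' '{ $(NF+1) = "x"; print }')

# With OFS set in BEGIN: the rebuilt record is joined with commas.
with_ofs=$(printf 'a,b\n' | awk -F',' 'BEGIN { OFS = "," } { $(NF+1) = "x"; print }')

printf '%s\n' "$without_ofs"   # prints: a b x
printf '%s\n' "$with_ofs"      # prints: a,b,x
```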

Solution 2: alternatively, drop the $(NF+1) approach (creating a new field) and print the line directly.

awk -v d="2017_01" -F"," 'BEGIN { OFS = "," } { printf("%s%s%s",$0,OFS,FNR>1?d RS:"cmpgn_group" RS) }' $ORIG_FILE > $NEW_FILE

Explanation of the above command:

awk -v d="2017_01" -F"," '   ##Set the value of variable d to 2017_01 and set the field separator to a comma.
BEGIN {                     ##Start the BEGIN section of awk.
  OFS = ","                 ##Set the output field separator to a comma.
}                           ##Close the BEGIN block.
{
  printf("%s%s%s",$0,OFS,FNR>1?d RS:"cmpgn_group" RS) ##Use printf to print each line; %s%s%s means three strings are printed. The first is $0 (the current line) and the second is OFS (the comma that separates the new column). For the third, the condition FNR>1 is checked: when the line number is greater than 1, print variable d (the value to append) followed by RS (to end the line). Otherwise this is the very first line of the input file, so print the string "cmpgn_group" followed by RS (the record separator, whose default value is a newline).
}
' $ORIG_FILE > $NEW_FILE    ##Name the input file $ORIG_FILE and redirect the output to $NEW_FILE.

【Answer 3】:
awk -v d="2017_01" 'BEGIN { FS=OFS="," } { print $0, (NR>1?d:"cmpgn_group") }' file
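For use inside a bash script, the one-liner above could be wrapped in a small function. This is only a sketch: the function name add_column and its argument order are hypothetical, and the header is passed through an extra awk variable h that does not appear in the original answer.

```shell
# add_column FILE HEADER VALUE  (hypothetical helper, not from the original answers)
# Prints FILE with one extra comma-separated column:
# HEADER on line 1, VALUE on every other line.
add_column() {
  awk -v h="$2" -v d="$3" 'BEGIN { FS = OFS = "," } { print $0, (NR > 1 ? d : h) }' "$1"
}

# Usage with a throwaway sample file:
sample=$(mktemp)
printf '%s\n' 'end_date' '07/31/2017' > "$sample"
out=$(add_column "$sample" cmpgn_group 2017_01)
printf '%s\n' "$out"
rm -f "$sample"
```

As in the question, the output should be redirected to a different file than the input; redirecting onto the input file would truncate it before awk reads it.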
