bash: read multiple columns from a CSV with a hash key


【Title】: bash read from CSV multiple columns with hash key 【Posted】: 2022-01-07 21:24:55 【Question】:

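I am trying to read a CSV file like the one below vertically (column by column) in order to insert the data into a Graphite/Carbon database.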

"No.","time","00:00:00","00:00:01","00:00:02","00:00:03","00:00:04","00:00:05","00:00:06","00:00:07","00:00:08","00:00:09","00:00:0A"
"1","2021/09/12 02:16",235,610,345,997,446,130,129,94,555,274,4
"2","2021/09/12 02:17",364,210,371,341,294,87,179,106,425,262,3
"3","2021/09/12 02:18",297,343,860,216,275,81,73,113,566,274,3
"4","2021/09/12 02:19",305,243,448,262,387,64,63,119,633,249,3
"5","2021/09/12 02:20",276,151,164,263,315,86,92,175,591,291,1
"6","2021/09/12 02:21",264,343,287,542,312,83,72,122,630,273,4
"7","2021/09/12 02:22",373,157,266,446,246,90,173,90,442,273,2
"8","2021/09/12 02:23",265,112,241,307,329,64,71,82,515,260,3
"9","2021/09/12 02:24",285,247,240,372,176,92,67,83,609,620,1
"10","2021/09/12 02:25",289,964,277,476,356,84,74,104,560,294,1
"11","2021/09/12 02:26",279,747,227,573,569,82,77,99,589,229,5
"12","2021/09/12 02:27",338,370,315,439,653,85,165,346,367,281,2
"13","2021/09/12 02:28",269,135,372,262,307,73,86,93,512,283,4
"14","2021/09/12 02:29",281,207,688,322,233,75,69,85,663,276,2
...

For each column header 00:00:XX, I want to generate commands that combine the timestamp from column 2 with the value recorded at that time:

echo "perf.$type.$serial.$object.00:00:00.TOTAL_IOPS" "235" "epoch time (2021/09/12 02:16)" | nc "localhost" "2004"

echo "perf.$type.$serial.$object.00:00:00.TOTAL_IOPS" "364" "epoch time (2021/09/12 02:17)" | nc "localhost" "2004"

...

echo "perf.$type.$serial.$object.00:00:01.TOTAL_IOPS" "610" "epoch time (2021/09/12 02:16)" | nc "localhost" "2004"

echo "perf.$type.$serial.$object.00:00:01.TOTAL_IOPS" "210" "epoch time (2021/09/12 02:17)" | nc "localhost" "2004"

.. etc..

I don't know where to start. I tried with awk, without success:

Trial 1:  awk -F "," 'BEGIN{FS=","} NR==1{for(i=1;i<=NF;i++) header[i]=$i} {for(i=1;i<=NF;i++) { print header[i] } }' file.csv

Trial 2:  awk '{time=$2; for(i=3;i<=NF;i++){time=time" "$i}; print time}' file.csv

Thank you very much for your help.

【Comments】:

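Please post the relevant expected output. Don't post it as a comment, an image, a table, or a link to an off-site service; use text and include it in your original question. Thanks.

【Answer 1】: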

In plain bash:

#!/bin/bash

{
    # Header line: split on commas, then strip the double quotes.
    IFS=',' read -ra header
    header=("${header[@]//\"}")
    nf=${#header[@]}
    row_nr=0
    # Data lines: store the epoch time per row and append each value to its column.
    while IFS=',' read -ra flds; do
        datetime[row_nr++]=$(date -d "${flds[1]//\"}" '+%s')
        for ((i = 2; i < nf; ++i)); do
            col[i]+=" ${flds[i]}"
        done
    done
} < file

# Print the commands column by column.
for ((i = 2; i < nf; ++i)); do
    v=(${col[i]})
    for ((j = 0; j < row_nr; ++j)); do
        printf 'echo "perf.$type.$serial.$object.%s.TOTAL_IOPS" "%s" "epoch time (%s)" | nc "localhost" "2004"\n' \
            "${header[i]}" "${v[j]}" "${datetime[j]}"
    done
done
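
The answer above leans on two bash features: `read -ra` with `IFS=','` to split a line into an array, and the `${array[@]//\"}` expansion to strip double quotes from every element. A minimal sketch of just that step on one sample line follows. The function name `split_csv_line` and the global array `fields` are hypothetical names chosen for illustration, and the sketch assumes no field contains an embedded comma or newline.

```bash
#!/bin/bash
# Hypothetical helper (not part of the original answer): split one simple CSV
# line into the global array "fields" and strip the double quotes.
# Assumption: fields never contain embedded commas or newlines.
split_csv_line() {
    local line=$1
    IFS=',' read -ra fields <<< "$line"   # split on commas into an array
    fields=("${fields[@]//\"}")           # remove every double quote from each element
}

split_csv_line '"1","2021/09/12 02:16",235,610'
printf '%s\n' "${#fields[@]}" "${fields[1]}" "${fields[2]}"
```

Run on the sample line, this prints the field count (4), then the unquoted timestamp `2021/09/12 02:16`, then the first value `235`. A CSV with quoted commas would need a real CSV parser instead.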

【Comments】:

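Thank you very much, it works. But how can I convert a date such as "2021/09/12 02:25" to epoch time, as in date -d '2021/09/12 02:16' +"%s"?

@Indi59 I have edited the answer to reflect this requirement.

I just saw it. It's perfect, it works. Thank you very much, Nejat.

【Answer 2】: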

Would you please try the following:

awk -F, '
    NR==1 {                                     # process the header line
        for (i = 3; i <= NF; i++) {
            gsub(/"/, "", $i)                   # remove double quotes
            tt[i-2] = $i                        # assign time array
        }
        next
    }
    {                                           # process the body
        gsub(/"/, "", $0)
        dt[NR - 1] = $2                         # assign datetime array
        for (i = 3; i <= NF; i++) {
            key[NR-1, i-2] = $i                 # assign key values
        }
    }
    END {
        for (i = 1; i <= NF - 2; i++) {
            for (j = 1; j <= NR - 1; j++) {
                printf "echo \"perf.$type.$serial.$object.%s.TOTAL_IOPS\" \"%d\" \"epoch time (%s)\" | nc \"localhost\" \"2004\"\n", tt[i], key[j, i], dt[j]
            }
        }
    }
' file.csv
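
The awk script above prints the date string unchanged inside `epoch time (...)`. One possible way to combine the column-by-column output with an actual epoch conversion is sketched below. It is not the answerer's code: `csv_to_metrics` and its `prefix` argument (standing in for `perf.$type.$serial.$object`) are hypothetical, it assumes GNU `date`, it treats the timestamps as UTC because the question does not state a time zone, and it assumes the CSV is trusted, since the timestamp is passed to a shell command.

```bash
#!/bin/bash
# Hypothetical helper (an illustration, not the original answer): read the CSV
# on stdin and print one "<prefix>.<column>.TOTAL_IOPS <value> <epoch>" line per
# cell, column by column.
# Assumptions: GNU date, timestamps in UTC, trusted input without embedded commas.
csv_to_metrics() {
    local prefix=$1
    awk -F, -v prefix="$prefix" '
        { gsub(/"/, "") }                        # strip double quotes; fields are re-split
        NR == 1 {                                # header: remember the column names
            ncols = NF
            for (i = 3; i <= NF; i++) name[i] = $i
            next
        }
        {
            rows++
            cmd = "date -u -d \"" $2 "\" +%s"    # build the date command for this row
            cmd | getline e                      # read the epoch seconds it prints
            close(cmd)
            epoch[rows] = e
            for (i = 3; i <= NF; i++) val[rows, i] = $i
        }
        END {                                    # outer loop over columns, inner over rows
            for (i = 3; i <= ncols; i++)
                for (j = 1; j <= rows; j++)
                    printf "%s.%s.TOTAL_IOPS %s %s\n", prefix, name[i], val[j, i], epoch[j]
        }
    '
}

# Example call on a two-row excerpt of the question's data.
printf '%s\n' \
    '"No.","time","00:00:00","00:00:01"' \
    '"1","2021/09/12 02:16",235,610' \
    '"2","2021/09/12 02:17",364,210' | csv_to_metrics "perf.demo"
```

Under the UTC assumption, the first line printed is `perf.demo.00:00:00.TOTAL_IOPS 235 1631412960`. Each output line could then be piped to `nc` as in the question instead of generating `echo` commands as text.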

【Comments】:

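Thank you very much, it works too. I can't also mark your answer as the accepted solution, but the awk code helps me as well!

I am saving this code in my library. Thank you very much, Tshiono!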
