Printing the output of user-defined functions in awk gives an unexpected token error
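I want to flexibly print the output of two small awk-to-bash pipelines that use a variable (they originally worked). I first thought I could store the whole command in a variable, but for one thing it did not work, and apparently it is not a good idea (see "store awk command in a variable of bash script"). So I wrote two functions instead, but now I get an "unexpected token" error near "done", even though the format follows the link above.
Where is my mistake?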
for coverage_file in */*.cov
do
#gene_count=$(awk '{print $5}' $coverage_file |sort | uniq -c | wc -l) #this is apparently not a good idea
#contig_count=$(awk '{print $1}' $coverage_file |sort | uniq -c | wc -l) #this is apparently not a good idea
cmd_gene() { awk '{print $5}' $coverage_file |sort | uniq -c | wc -l }
cmd_contig() { awk '{print $1}' $coverage_file |sort | uniq -c | wc -l }
cmd_gene $coverage_file
cmd_contig $coverage_file
#print "we found", $gene_count, "genes on ",$contig_count" contigs
done
The .cov files look like this:
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_88684 267 5 B10 phnM_032
k141_88684 268 5 B10 phnM_032
k141_88684 269 5 B10 phnM_032
k141_88684 270 5 B10 phnM_032
k141_88684 271 5 B10 phnM_032
k141_88684 272 5 B10 phnM_032
Edit: this includes the accepted answer plus a possible way to print the result explicitly:
#!/bin/bash
#define variables
gene="phnM"
threshold="5"
#define functions
cmd_gene() { awk '{print $5}' "$1" |sort | uniq -c | wc -l ; } #semicolon is important here!
cmd_contig() { awk '{print $1}' "$1" |sort | uniq -c | wc -l ; } #semicolon is important here!
#loop over files and print results (would be prettier with printf)
for coverage_file in */*.cov
do
echo $gene" was found" $(cmd_gene "$coverage_file") "times on" $(cmd_contig "$coverage_file")" contigs with minimum coverage of" $threshold in $coverage_file
done
OUTPUT:
phnM was found 67 times on 65 contigs with minimum coverage of 5 in phnm/test.cov
phnM was found 3 times on 2 contigs with minimum coverage of 5 in test/test.cov
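The comment in the script above notes that the output would be prettier with printf. A minimal sketch of that idea, assuming bash; report_line is a hypothetical helper name that is not part of the original post, and the counts are passed in as plain arguments here rather than computed:

```shell
#!/bin/bash
gene="phnM"
threshold="5"

# report_line (hypothetical): format one result line.
# Arguments: gene count, contig count, file name.
report_line() {
    printf '%s was found %d times on %d contigs with minimum coverage of %d in %s\n' \
        "$gene" "$1" "$2" "$threshold" "$3"
}

report_line 67 65 phnm/test.cov
```

Inside the loop this could be called as report_line "$(cmd_gene "$coverage_file")" "$(cmd_contig "$coverage_file")" "$coverage_file", which keeps the format string in one place instead of spreading it across several quoted fragments.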
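The unexpected token error occurs because, when you define a function, the closing } must either be on its own line or be preceded by a ;.
Also, since the function definitions use $coverage_file directly, you do not have to pass it as an argument.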
for coverage_file in */*.cov
do
cmd_gene() { awk '{print $5}' $coverage_file |sort | uniq -c | wc -l; }
cmd_contig() { awk '{print $1}' $coverage_file |sort | uniq -c | wc -l; }
cmd_gene
cmd_contig
#print "we found", $gene_count, "genes on ",$contig_count" contigs
done
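The brace rule can be checked in isolation. A minimal sketch, assuming bash; count_a and count_b are hypothetical names used only for illustration:

```shell
#!/bin/bash
# Form 1: a semicolon before the closing brace of a one-line definition.
count_a() { printf 'x\ny\nx\n' | sort -u | wc -l; }

# Form 2: the closing brace on its own line, so the newline ends the last command.
count_b() {
    printf 'x\ny\nx\n' | sort -u | wc -l
}

# Without either form, "}" is read as one more argument to wc, the function
# body stays open, and the parser only complains at a later keyword such as "done".
count_a
count_b
```

Both calls print the number of distinct lines in the sample input. This is why the error in the question is reported near "done" rather than on the line that actually contains the mistake.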
If you want to define the functions outside the for loop, you can use $1 (not to be confused with awk's $1) and pass $coverage_file as you did before.
Edit: an example of the above
$ cat a.sh
cmd_gene() { awk '{print $5}' $1 |sort | uniq -c | wc -l; }
cmd_contig() { awk '{print $1}' $1 |sort | uniq -c | wc -l; }
for coverage_file in */*.cov
do
cmd_gene $coverage_file
cmd_contig $coverage_file
done
$ ls */*.cov
bf/a.cov
$ cat */*.cov
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 19 A5 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 28 A1 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_85332.3 4119 8 A2 phnM_031
k141_88684 267 5 B10 phnM_032
k141_88684 268 5 B10 phnM_032
k141_88684 269 5 B10 phnM_032
k141_88684 270 5 B10 phnM_032
k141_88684 271 5 B10 phnM_032
k141_88684 272 5 B10 phnM_032
$ sh a.sh
2
2
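The two meanings of $1 are easier to keep apart once the shell argument is quoted. A minimal sketch, assuming bash; the demo directory and file are hypothetical and created only for this illustration, and sort -u is swapped in for sort | uniq -c because it yields the same line count:

```shell
#!/bin/bash
# Outside the single quotes, "$1" is the shell function's first argument (the file).
# Inside the single quotes, $1 and $5 are awk fields; the shell never expands them.
cmd_gene()   { awk '{ print $5 }' "$1" | sort -u | wc -l; }
cmd_contig() { awk '{ print $1 }' "$1" | sort -u | wc -l; }

# Hypothetical demo file; the space in the directory name is why "$1" is quoted.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/my data"
printf '%s\n' 'k141_85332.3 4119 19 A5 phnM_031' \
              'k141_88684 267 5 B10 phnM_032' \
              'k141_88684 268 5 B10 phnM_032' > "$demo_dir/my data/a.cov"

cmd_gene   "$demo_dir/my data/a.cov"
cmd_contig "$demo_dir/my data/a.cov"
```

With an unquoted $1 the path would be split at the space and awk would be handed two file names that do not exist.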
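@jas answered your question, so stick with that; the following is just a generally better way to do what you are trying to do, and it is too big and needs too much formatting to fit in a comment: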
awk '
BEGIN {
gene = "phnM"
threshold = "5"
}
{
genes[$5]
contigs[$1]
}
ENDFILE {
printf "%s was found %d times on %d contigs with minimum coverage of %d in %s\n",
gene, length(genes), length(contigs), threshold, FILENAME
delete genes
delete contigs
}
' */*.cov
The above uses GNU awk for ENDFILE, but it can be adapted to work with other awks if needed:
awk '
BEGIN {
gene = "phnM"
threshold = "5"
}
FNR==1 { prt() }
{
genes[$5]
contigs[$1]
}
END { prt() }
function prt() {
if (fname != "") {
printf "%s was found %d times on %d contigs with minimum coverage of %d in %s\n",
gene, length(genes), length(contigs), threshold, fname
delete genes
delete contigs
}
fname = FILENAME
}
' */*.cov
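Both awk scripts above count distinct values by merely referencing an array element, such as genes[$5], which creates the key without assigning anything to it. A minimal sketch of just that step, assuming any awk that supports user arrays; count_keys is a hypothetical helper, and it counts with a loop in case the awk at hand does not accept an array argument to length:

```shell
#!/bin/bash
# count_keys (hypothetical): count distinct values in column 1 of standard input.
count_keys() {
    awk '
    { seen[$1] }                # referencing the element is enough to create the key
    END {
        n = 0
        for (k in seen) n++     # count keys without relying on length(array)
        print n
    }'
}

printf 'a 1\nb 2\na 3\n' | count_keys    # prints 2
```

Because repeated values map to the same key, no sort or uniq step is needed, and the file is read only once.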
For some of the reasons to avoid shell loops when manipulating text, see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice.