How to use awk to add a specific value to a column based on a numeric range

Question

I am trying to add a column to my coverage_file based on the position ranges in my bed_file. My coverage_file has positions in its second column. My bed_file has position ranges (second to third column) and a name in the fourth column. For every position in coverage_file that falls within a bed_file range on the same contig, I would like to add the corresponding name. The names should also be numbered, so that I can distinguish multiple position ranges on the same object (contig). Hopefully my example data makes this clearer:

#example data
#coverage file looks like:
#k141_xxx.xx are contigs (long sequences of DNA) on which different genes can be found.
#the second column is the current position on the individual contig
#the third column is the coverage at this position (not important here)
#the fourth column is the sample the data comes from: A1..7 and B8..10
k141_102288 298 5 A4
k141_102288 298 5 A5
k141_102288 298 5 B8
k141_102288 298 5 B9
k141_102288 299 5 A4
k141_102288 299 5 A5
k141_102288 299 5 B9
k141_102288 300 5 A5
k141_102288 301 5 A5
k141_102511.0 8226 5 A5
k141_102511.0 8227 5 A5
k141_102511.0 8228 5 A5
k141_102511.0 8229 5 A5
k141_102511.0 8230 5 A5
k141_102511.0 8231 5 A5
k141_102511.0 8232 5 A5
k141_102511.0 8233 5 A5
k141_102511.0 8234 5 A5
k141_102511.0 9129 5 A6
k141_102511.0 9207 5 A6
k141_102511.0 9275 5 A7
k141_102511.0 9276 5 A7
k141_102511.0 9277 5 A7
k141_102511.0 9278 5 A7
k141_102511.0 9279 5 A7
k141_102511.0 9280 5 A7
k141_102511.0 9281 5 A7
k141_102511.0 9282 5 A7

#bed file looks like this
# the bed file shows the start $2 and end $3 position of a gene $4 on the contigs $1
k141_102288 2   301 phnE
k141_102511.0   7890    8807    phnE
k141_102511.0   8814    10400   phnE
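The `_001`, `_002`, `_003` suffixes in the proposed output amount to a running number attached to each BED range. A minimal sketch of just that labelling step, assuming the counter runs separately for each gene name in column 4 (with a single gene, as here, that is the same as numbering the BED lines) and is zero-padded to three digits. `number_bed_ranges` is a hypothetical helper name, and the output is tab-separated:

```shell
#!/usr/bin/env bash
# Sketch: print each BED range with a numbered gene label (phnE_001, ...).
# Assumption: the counter runs separately for each gene name in column 4.
set -euo pipefail

number_bed_ranges() {
  awk '
    /^#/ { next }    # skip comment lines
    # ++seen[$4] is 1 for the first range of a gene, 2 for the second, ...
    { printf "%s\t%s\t%s\t%s_%03d\n", $1, $2, $3, $4, ++seen[$4] }
  ' "$1"
}

bed=$(mktemp)
trap 'rm -f "$bed"' EXIT
printf '%s\n' \
  'k141_102288 2 301 phnE' \
  'k141_102511.0 7890 8807 phnE' \
  'k141_102511.0 8814 10400 phnE' > "$bed"

number_bed_ranges "$bed"
```

Per-gene counting is a guess at the intended behaviour; if one global counter over all BED lines is wanted instead, `++seen[$4]` could be replaced by a plain counter such as `++n`.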

#proposed output (note the two different regions of phnE on k141_102511.0)
k141_102288 298 5 A4    phnE_001
k141_102288 298 5 A5    phnE_001
k141_102288 298 5 B8    phnE_001
k141_102288 298 5 B9    phnE_001
k141_102288 299 5 A4    phnE_001
k141_102288 299 5 A5    phnE_001
k141_102288 299 5 B9    phnE_001
k141_102288 300 5 A5    phnE_001
k141_102288 301 5 A5    phnE_001
k141_102511.0 8226 5 A5 phnE_002
k141_102511.0 8227 5 A5 phnE_002
k141_102511.0 8228 5 A5 phnE_002
k141_102511.0 8229 5 A5 phnE_002
k141_102511.0 8230 5 A5 phnE_002
k141_102511.0 8231 5 A5 phnE_002
k141_102511.0 8232 5 A5 phnE_002
k141_102511.0 8233 5 A5 phnE_002
k141_102511.0 8234 5 A5 phnE_002
k141_102511.0 9129 5 A6 phnE_003
k141_102511.0 9207 5 A6 phnE_003
k141_102511.0 9275 5 A7 phnE_003
k141_102511.0 9276 5 A7 phnE_003
k141_102511.0 9277 5 A7 phnE_003
k141_102511.0 9278 5 A7 phnE_003
k141_102511.0 9279 5 A7 phnE_003
k141_102511.0 9280 5 A7 phnE_003
k141_102511.0 9281 5 A7 phnE_003
k141_102511.0 9282 5 A7 phnE_003
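A minimal sketch of the whole lookup, under these assumptions: columns are whitespace-separated, lines starting with `#` are comments, coverage lines outside every range are dropped, and ranges are inclusive at both ends (as the proposed output implies, since position 301 maps to the range 2 to 301). Standard BED coordinates are 0-based and half-open, so the inclusive test follows the proposed output rather than the BED convention. The BED file is read first (the `NR == FNR` idiom) and its ranges are stored per contig, so each coverage line is only checked against the ranges on its own contig. `annotate_coverage` is a hypothetical function name, and the label counter is assumed to run per gene name:

```shell
#!/usr/bin/env bash
# Sketch: append the numbered gene name from a BED file to every coverage
# line whose position ($2) lies inside a BED range on the same contig ($1).
set -euo pipefail

# annotate_coverage BED_FILE COVERAGE_FILE
annotate_coverage() {
  awk '
    /^#/ { next }                     # skip comment lines in both files
    NR == FNR {                       # first file: the BED ranges
      k = ++nranges[$1]               # k-th range on this contig
      start[$1, k] = $2 + 0
      stop[$1, k]  = $3 + 0
      label[$1, k] = sprintf("%s_%03d", $4, ++seen[$4])
      next
    }
    {                                 # second file: the coverage lines
      pos = $2 + 0
      for (k = 1; k <= nranges[$1]; k++)
        if (pos >= start[$1, k] && pos <= stop[$1, k]) {
          print $0, label[$1, k]
          break                       # first matching range wins
        }
    }
  ' "$1" "$2"
}

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

cat > "$tmp/intersect.bed" <<'EOF'
k141_102288 2 301 phnE
k141_102511.0 7890 8807 phnE
k141_102511.0 8814 10400 phnE
EOF

cat > "$tmp/coverage_file.txt" <<'EOF'
k141_102288 298 5 A4
k141_102511.0 8226 5 A5
k141_102511.0 9129 5 A6
EOF

annotate_coverage "$tmp/intersect.bed" "$tmp/coverage_file.txt"
```

With these three sample lines the labels come out as `phnE_001`, `phnE_002` and `phnE_003`, matching the proposed output. Because of the `break`, overlapping ranges report only the first matching BED line.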

I tried to build on my previous, similar question, "How to use info on substring position from one file to extract substring from another file (loop, bash)", but I still can't figure out how to make it work:

#!/bin/bash
# We are reading two files: coverage_file.txt and intersect.bed
# NR is equal to FNR as long as we are reading the first file.
# Store the positions from the coverage file in an array current_position (indexed by $1)
# Go to the bed file:
# store the start and end positions and the gene names in similar arrays;
# if current_position is between start_pos and end_pos, additionally print the gene name
awk 'NR==FNR {current_position[$1]=$2}
     NR==FNR {next}
     {start_pos[$1]=$2; end_pos[$1]=$3; gene_name[$1]=$4}
     {if ((current_position[$1] >= start_pos[$1]) && (current_position[$1] <= end_pos[$1])) {print $1, $2, $3, $4, gene_name[$1]}}' coverage_file.txt intersect.bed > test.txt
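One reason this attempt cannot produce the proposed output: `current_position[$1]=$2` indexes the array by contig name alone, so each coverage line overwrites the position stored by the previous line of the same contig, and only the last position per contig is left when the BED file is read. The small demonstration below prints what that array holds after reading four coverage lines; the function name `last_position_per_contig` and the sample file are hypothetical. Storing the BED ranges first and then checking each coverage line against them avoids the overwrite.

```shell
#!/usr/bin/env bash
# Demonstration: an array indexed only by contig ($1) keeps one value per
# contig, so all but the last position of each contig are overwritten.
set -euo pipefail

last_position_per_contig() {
  awk '{ current_position[$1] = $2 }
       END { for (c in current_position) print c, current_position[c] }' "$1" |
    LC_ALL=C sort    # for-in order is unspecified, so sort for stable output
}

cov=$(mktemp)
trap 'rm -f "$cov"' EXIT
printf '%s\n' \
  'k141_102288 298 5 A4' \
  'k141_102288 301 5 A5' \
  'k141_102511.0 8226 5 A5' \
  'k141_102511.0 9282 5 A7' > "$cov"

last_position_per_contig "$cov"
```

Only `301` and `9282` survive, one per contig, so every earlier position is never compared with the BED ranges.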

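Any suggestions?

EDIT: I tried suggestion no. 2 by @Nic3500, but I can't get it to run: I get an unexpected token in the last line. This is what I have come up with so far: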

awk