Find and Extract value after specific String from a file using bash shell script?
Posted: 2020-08-04 17:46:18
Question: I have a file containing the following details: file.txt
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `dv.par_kst`( |
| `col1` string, |
| `col2` string, |
| `col3` int, |
| `col4` int, |
| `col5` string, |
| `col6` float, |
| `col7` int, |
| `col8` string, |
| `col9` string, |
| `col10` int, |
| `col11` int, |
| `col12` string, |
| `col13` float, |
| `col14` string, |
| `col15` string) |
| PARTITIONED BY ( |
| `part_col1` int, |
| `part_col2` int) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' |
| LOCATION |
| 'hdfs://nameservicets1/dv/hdfsdata/par_kst' |
| TBLPROPERTIES ( |
| 'spark.sql.create.version'='2.2 or prior', |
| 'spark.sql.sources.schema.numPartCols'='2', |
| 'spark.sql.sources.schema.numParts'='1', |
| 'spark.sql.sources.schema.part.0'='{"type":"struct","fields":[{"name":"col1","type":"string","nullable":true,"metadata":{}},{"name":"col2","type":"string","nullable":true,"metadata":{}},{"name":"col3","type":"integer","nullable":true,"metadata":{}},{"name":"col4","type":"integer","nullable":true,"metadata":{}},{"name":"col5","type":"string","nullable":true,"metadata":{}},{"name":"col6","type":"float","nullable":true,"metadata":{}},{"name":"col7","type":"integer","nullable":true,"metadata":{}},{"name":"col8","type":"string","nullable":true,"metadata":{}},{"name":"col9","type":"string","nullable":true,"metadata":{}},{"name":"col10","type":"integer","nullable":true,"metadata":{}},{"name":"col11","type":"integer","nullable":true,"metadata":{}},{"name":"col12","type":"string","nullable":true,"metadata":{}},{"name":"col13","type":"float","nullable":true,"metadata":{}},{"name":"col14","type":"string","nullable":true,"metadata":{}},{"name":"col15","type":"string","nullable":true,"metadata":{}},{"name":"part_col1","type":"integer","nullable":true,"metadata":{}},{"name":"part_col2","type":"integer","nullable":true,"metadata":{}}]}', |
| 'spark.sql.sources.schema.partCol.0'='part_col1', |
| 'spark.sql.sources.schema.partCol.1'='part_col2', |
| 'transient_lastDdlTime'='1587487456') |
+----------------------------------------------------+
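I want to extract the PARTITIONED BY details from the file above.
Desired output:
part_col1 , part_col2
The PARTITIONED BY columns are not fixed: other files may have 3 or more, so I want to extract all of them.
That is, all values between PARTITIONED BY and ROW FORMAT SERDE, with the spaces, the backquotes (`) and the data types removed.
Can you help me with this?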
Answer 1:
sed -nr '/PARTITIONED BY/,/ROW FORMAT SERDE/p' file.txt | sed -nr '/`/p' | cut -d '`' -f 2 | xargs -n 1 echo -n " "
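A sketch of the same pipeline wrapped in a reusable function, assuming the text arrives on stdin. The function name partition_cols is hypothetical, and paste plus a final sed replace xargs so the output matches the requested "part_col1 , part_col2" format:

```shell
#!/usr/bin/env bash
# Hypothetical helper: same sed/cut idea as the answer above.
# Assumption: the SHOW CREATE TABLE text arrives on stdin.
partition_cols() {
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' |  # keep only the partition block
    grep '`' |                                      # keep lines with a backquoted name
    cut -d '`' -f 2 |                               # text between 1st and 2nd backquote
    paste -sd, - |                                  # join all names with commas
    sed 's/,/ , /g'                                 # widen the separator to " , "
}

# Usage: partition_cols < file.txt
```

Joining with paste keeps the whole result on one line with a trailing newline, which is easier to capture with $(...) than the leading spaces that xargs with echo -n produces.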
Comments:
Instead of having the records in file.txt, I have to get them by running: par_col=$(beeline --silent -u "$BEELINE_URL" -e "$sql")
where sql="show create table dvs_wk.par_kst". par_col holds the result shown above, but when I run: result=$(sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' $par_col | sed -n '/`/p' | cut -d '`' -f 2 | xargs -n 1 echo -n " ")
it gives me an error.
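The error in that comment most likely comes from passing $par_col to sed as an argument: sed treats its non-option arguments as file names, so it tries to open files named after the beeline output instead of reading the text itself. A minimal sketch that feeds the variable through a bash here-string instead (extract_partitions is a hypothetical helper name; BEELINE_URL and the query are taken from the comment):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the Answer 1 pipeline on text held in a variable.
# The here-string (<<<) sends "$1" to sed on stdin, so sed reads the text
# instead of treating it as a list of file names.
extract_partitions() {
  sed -n '/PARTITIONED BY/,/ROW FORMAT SERDE/p' <<< "$1" |
    sed -n '/`/p' |
    cut -d '`' -f 2 |
    xargs -n 1 echo -n " "
}

# Usage (assumes beeline is on PATH and BEELINE_URL is set):
# sql="show create table dvs_wk.par_kst"
# par_col=$(beeline --silent -u "$BEELINE_URL" -e "$sql")
# result=$(extract_partitions "$par_col")
```

printf '%s\n' "$par_col" | sed ... works the same way if you need a shell without here-strings.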
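The first sed prints all lines between PARTITIONED BY and ROW FORMAT SERDE (inclusive), the second sed keeps only the lines containing a "`" character, cut then splits each line into columns on "`" and prints the second column (the column name), and finally xargs collects all the names and prints them separated by spaces. Probably not the best pipeline, but it works for your example.
Answer 2: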
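use strict;
use warnings;

# Reads the text that follows a __DATA__ line at the end of the script.
my $text = do { local $/; <DATA> };   # slurp the whole input
my @partitioned = ();
# Take the text inside "PARTITIONED BY ( ... )", then collect every `name` in it.
while ($text =~ /PARTITIONED BY\s*\(([^()]*)\)/g) {
    my $fulcontent = $1;
    push @partitioned, $1 while $fulcontent =~ /`([^`]+)`/g;
}
print join(", ", @partitioned), "\n";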
Output:
part_col1, part_col2
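Answer 3:
If the layout of the result doesn't matter, you can ask sed to consider only the lines between the start and end markers, and to print a line only when a field can be found between two backquotes.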
sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1/p' file.txt
The results can be combined onto one line if needed:
printf "%s , " $(sed -rn '/PARTITIONED BY/,/ROW FORMAT/s/.*`(.*)`.*/\1 /p' file.txt) |
sed 's/ , $/\n/'
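If you would rather avoid external tools, the same start/end-marker idea can be sketched in pure bash. This is an alternative, not part of the answer above: it reads stdin line by line, starts collecting after the PARTITIONED BY line, stops at ROW FORMAT, and uses the =~ operator with BASH_REMATCH to pick out the name between backquotes (partition_cols_bash is a hypothetical name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: pure-bash version of the marker-range extraction.
# Reads stdin; prints the partition column names joined with " , ".
partition_cols_bash() {
  local line out="" in_block=0
  local re='`([^`]+)`'   # a name between two backquotes
  # "|| [[ -n $line ]]" also handles a last line with no trailing newline
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == *"PARTITIONED BY"* ]]; then
      in_block=1          # start collecting from the next line
      continue
    fi
    if (( in_block )); then
      [[ $line == *"ROW FORMAT"* ]] && break   # end of the partition block
      if [[ $line =~ $re ]]; then
        out+="${out:+ , }${BASH_REMATCH[1]}"   # " , " before every name but the first
      fi
    fi
  done
  printf '%s\n' "$out"
}

# Usage: partition_cols_bash < file.txt
```

Keeping the regex in a variable avoids quoting problems with backquotes inside [[ ... =~ ... ]]; a table without a PARTITIONED BY clause simply yields an empty line.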
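Answer 4:
A small Perl script that:
- reads the whole file into the $data variable
- selects everything inside PARTITIONED BY (....)
- collects only the elements between backquotes into an array
- prints the result joined with ", "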
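use strict;
use warnings;
use feature 'say';

# Usage: perl script.pl file.txt
my $data = do { local $/; <> };            # slurp the file(s) named on the command line
my $re   = qr/PARTITIONED BY \((.*?)\)/s;  # non-greedy: stop at the first ")"
my ($inner) = $data =~ $re
    or die "No PARTITIONED BY clause found\n";
my @part = $inner =~ /`(.*?)`/g;           # every backquoted name
say join ', ', @part;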
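For comparison with the Perl scripts, a sketch of a single-process shell alternative using awk with the backquote as field separator, so the column name on each line is field $2. partition_cols_awk is a hypothetical name, and like the sed answers it assumes PARTITIONED BY appears before ROW FORMAT in the DDL:

```shell
#!/usr/bin/env bash
# Hypothetical helper: one awk process instead of a sed | sed | cut | xargs chain.
# Reads the files given as arguments, or stdin if there are none.
partition_cols_awk() {
  awk -F'`' '
    /PARTITIONED BY/ { inside = 1; next }   # start of the partition block
    /ROW FORMAT/     { inside = 0 }         # end of the partition block
    inside && NF >= 3 {                     # line holds at least one backquoted name
      out = out (out == "" ? "" : " , ") $2
    }
    END { print out }
  ' "$@"
}

# Usage: partition_cols_awk file.txt
```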