How to extract the comma-separated, backtick-quoted values between the parentheses of a "PARTITIONED BY" clause
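I have a shell script that extracts the CREATE TABLE statement for every table in a database. I loop over the statements one at a time, and inside the loop each statement is available as the variable $DATA. I need to extract the column names listed in the PARTITIONED BY clause of each CREATE TABLE statement.
For example, $DATA is the loop variable.
Input to the loop, iteration 1: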
DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"
Output for iteration 1: dataoutput = depth,permi
Input to the loop, iteration 2:
DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"
Output for iteration 2: dataoutput = depth
Input to the loop, iteration 3:
DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100), `www` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"
Output for iteration 3: dataoutput = depth,permi,www
Answer
Try this:
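use strict;
use warnings;

my @bcktik;    # one comma-separated list of partition columns per statement
while (<DATA>) {
    # Capture the text between the parentheses of PARTITIONED BY,
    # allowing one level of nested parentheses such as varchar(100).
    if (my ($clause) = /PARTITIONED BY\s*\(((?:\([^()]*\)|[^()])*)\)/i) {
        # Collect every backtick-quoted name found in the captured clause.
        push @bcktik, join(",", $clause =~ /`([^`]*)`/g);
    }
}
print "$_\n" for @bcktik;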
__DATA__
CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')
CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')
CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100), `www` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')
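Since the question's loop runs in a shell script, the same idea can also be sketched without Perl. The function name `extract_partition_cols` below is hypothetical, and the sketch rests on several assumptions that are not stated in the question: each statement sits on a single line, the keyword is written in upper case as in the samples, type parameters contain only digits, commas, and spaces (for example `(100)`), and column comments contain no parentheses or backticks. Note also that backticks inside a double-quoted assignment are treated by the shell as command substitution, so the sample is read from a quoted here-document instead.

```shell
#!/bin/sh
# Hypothetical helper: print the partition column names of one
# CREATE TABLE statement as a comma-separated list (nothing if the
# statement has no PARTITIONED BY clause).
extract_partition_cols() {
    printf '%s\n' "$1" | sed -n '/PARTITIONED BY *(/{
        # Drop numeric type parameters such as (100) so that the clause
        # no longer contains nested parentheses.
        s/([0-9][0-9, ]*)//g
        # Keep only the text between the parentheses of PARTITIONED BY.
        s/.*PARTITIONED BY *(\([^)]*\)).*/\1/
        # Reduce each backtick-quoted name to "name,".
        s/[^`]*`\([^`]*\)`[^`]*/\1,/g
        # Remove the trailing comma and print the result.
        s/,$//
        p
    }'
}

# Usage with the first sample statement from the question.
read -r DATA <<'EOF'
CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')
EOF
dataoutput=$(extract_partition_cols "$DATA")
echo "dataoutput = $dataoutput"    # prints: dataoutput = depth,permi
```

Stripping the type parameters first is a simplification that stands in for the nested-parenthesis handling of the Perl regex; if the DDL can contain other parenthesised text inside the clause, the Perl version is likely the safer choice.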