How to extract the comma-separated, backtick-quoted values between the parentheses of a "PARTITIONED BY" clause

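I have a shell script that extracts the CREATE TABLE statement for every table in a database. I loop over the statements one at a time, and inside the loop the current statement is held in the variable $DATA. From each CREATE TABLE statement I need to extract the column names listed in its PARTITIONED BY clause.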

For example, $DATA is the loop variable.

Input to the loop, iteration 1:

DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"

Expected output for iteration 1: dataoutput = depth,permi

Input to the loop, iteration 2:

DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"

Expected output for iteration 2: dataoutput = depth

Input to the loop, iteration 3:

DATA="CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100), `www` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')"

Expected output for iteration 3: dataoutput = depth,permi,www

Answer

Try this (Perl):

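use strict;
use warnings;

my @bcktik;    # one comma-separated column list per matching statement
while (<DATA>)
{
    # Capture the parenthesised list after PARTITIONED BY, allowing one
    # level of nested parentheses such as varchar(100).
    if ($_ =~ m/PARTITIONED BY\s*(\((?:\([^()]*\)|[^()])*\))/i)
    {
        # Collect every backtick-quoted name inside the captured clause.
        push(@bcktik, join ",", ($1 =~ m/`([^`]*)`/g));
    }
}
print "$_\n" for @bcktik;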

__DATA__
CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')

CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')

CREATE TABLE `xxx`( `path` varchar(200), `fsize` bigint, `usrname` varchar(100)) PARTITIONED BY ( `depth` int, `permi` varchar(100), `www` int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 'xxx' TBLPROPERTIES ( 'transient_lastDdlTime'='1519784177')
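
Since the question works inside a shell loop, the same two-step idea (isolate the PARTITIONED BY clause, then collect the backtick-quoted names) can be sketched directly in shell. This is only a sketch under assumptions: the function name `extract_partition_cols` is hypothetical, the clause is assumed to nest parentheses at most one level deep (as in `varchar(100)`), and `grep -o` is assumed to be available (it is in GNU and BSD grep).

```shell
#!/usr/bin/env bash
# Sketch only: extract_partition_cols is a hypothetical helper name.
# Prints the backtick-quoted column names found in the PARTITIONED BY clause
# of one CREATE TABLE statement, joined by commas. Prints nothing when the
# statement has no PARTITIONED BY clause.
extract_partition_cols() {
    local ddl=$1 clause
    # Step 1: isolate "PARTITIONED BY ( ... )", allowing one level of nested
    # parentheses such as varchar(100).
    clause=$(printf '%s\n' "$ddl" |
        grep -oiE 'PARTITIONED BY[[:space:]]*\((\([^()]*\)|[^()])*\)') || return 0
    # Step 2: pull out each `name`, strip the backticks, join with commas.
    printf '%s\n' "$clause" | grep -oE '`[^`]*`' | tr -d '`' | paste -sd, -
}

# Usage with a shortened sample statement. Single quotes keep the backticks
# literal; inside double quotes the shell would run them as command substitution.
DATA='CREATE TABLE `xxx`( `path` varchar(200)) PARTITIONED BY ( `depth` int, `permi` varchar(100)) STORED AS TEXTFILE'
dataoutput=$(extract_partition_cols "$DATA")
echo "dataoutput = $dataoutput"   # prints: dataoutput = depth,permi
```

Inside the real loop, the call would be `dataoutput=$(extract_partition_cols "$DATA")` on each iteration. If the statements are read from a file or from command output rather than typed as literals, the backticks arrive unmodified and need no extra quoting.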
