Hive - facing challenges with a dynamic partition error
【Posted】: 2014-08-26 19:46:10
【Question】: Can anyone point out where I am going wrong with dynamic partitioning?
-- Staging table:
create table staging_peopledata
(
firstname string,
secondname string,
salary float,
country string,
state string
)
row format delimited fields terminated by ',' lines terminated by '\n';
-- Staging table data:
John,David,30000,RUS,tnRUS
John,David,30000,RUS,tnRUS
Mary,David,5000,AUS,syAUS
Mary,David,5000,AUS,syAUS
Mary,David,5000,AUS,weAUS
Pierre,Cathey,6000,RUS,kaRUS
Pierre,Cathey,6000,RUS,kaRUS
Ahmed,Talib,11000,US,bcUS
Ahmed,Talib,11000,US,onUS
Ahmed,Talib,11000,US,onUS
kris,David,80000,UK,lnUK
kris,David,80000,UK,soUK
-- Production table:
create table Production_peopledata
(
firstname string,
lastname string,
salary float)
partitioned by (country string, state string)
row format delimited fields terminated by ',' lines terminated by '\n';
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table Production_peopledata
partition(country,state)
select firstname, secondname, salary, country, state from staging_peopledata;
When I run the commands above, I get the following error.
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode
requires atleast one static partition column. To turn this off set
hive.exec.dynamic.partition.mode=nonstrict
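Can anyone tell me where I am going wrong?
【Comments】:
The error message is quite explicit. How about searching for "Hive dynamic partition", or reading the "Dynamic partitions" section of the Hive tutorial? cwiki.apache.org/confluence/display/Hive/Tutorial
【Answer 1】: Could you run the following command in the Hive shell?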
hive>set hive.exec.dynamic.partition.mode=nonstrict;
【Comments】:
【Answer 2】: You need to set the following properties:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
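The column you partition by must not appear in the partitioned table's regular column list; it is declared only in the PARTITIONED BY clause, and its values are generated dynamically. When you populate the partitioned table, the partition column values must come from the source table.
Suppose we have two tables, EMP and EMP1. EMP1 is the partitioned table, and it gets its data from EMP. Initially the two tables are identical. So first we create a partition column, salpart, on EMP1. Then we add the same column to the source table EMP as an ordinary column. After a successful run, the partition files appear under the user/hive/warehouse location. The explanation above is implemented as follows: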
-- Source table: salpart is an ordinary column here.
CREATE TABLE IF NOT EXISTS emp ( eid int, name String,
salary String, destination String, salpart string)
COMMENT "Employee details"
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
STORED AS TEXTFILE;
-- Load the source data once the table exists.
load data local inpath '/home/cloudera/myemployeedata.txt' overwrite into table emp;
-- Partitioned table: salpart is declared only in PARTITIONED BY; its values come from emp.
CREATE TABLE IF NOT EXISTS emp1 ( eid int, name String,
salary String, destination String)
COMMENT "Employee details"
partitioned by (salpart string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY "\t"
LINES TERMINATED BY "\n"
STORED AS TEXTFILE;
Dynamic Partition:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table emp1 partition(salpart) select eid,name,salary,destination,salpart from emp;
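To confirm that the insert above produced the expected partitions, you can inspect the table from the same session. This is a minimal sketch; the value 'high' is a hypothetical salpart value used only for illustration, since the source does not show the contents of myemployeedata.txt. Note that Hive matches dynamic partition columns by position, which is why salpart is the last column in the SELECT above.

```sql
-- List the partitions Hive created for emp1 (one entry per distinct salpart value).
SHOW PARTITIONS emp1;

-- Read back a single partition; 'high' is a hypothetical salpart value.
SELECT eid, name, salary, destination
FROM emp1
WHERE salpart = 'high';
```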
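【Comments】:
【Answer 3】: Judging by the error, the mode still seems to be strict; for dynamic partitioning it needs to be set to nonstrict. Use the command below.
hive> set hive.exec.dynamic.partition.mode=nonstrict;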
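The error text also names an alternative to switching modes: strict mode is satisfied when at least one partition column is static. A minimal sketch against the tables from the question, assuming you are willing to load one country per statement (the 'RUS' filter is an illustrative choice, not something the asker wrote):

```sql
-- country gets a static value, so strict mode accepts the statement;
-- state is still resolved dynamically from the last SELECT column.
SET hive.exec.dynamic.partition=true;

INSERT OVERWRITE TABLE Production_peopledata
PARTITION (country='RUS', state)
SELECT firstname, secondname, salary, state
FROM staging_peopledata
WHERE country = 'RUS';
```

Static partition columns have to come before dynamic ones in the PARTITION clause, which works here because country is declared before state.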
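【Comments】:
【Answer 4】: Try running set hive.exec.dynamic.partition.mode=nonstrict again. Sometimes Hive still behaves as if strict mode were on even after you have set this property, so I suggest setting it again.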
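Rather than setting the property again blindly, you can check what the current session actually holds. SET values apply per session, so a value set in one Hive shell is not visible in another. A small sketch, assuming the commands are run in the same session as the insert:

```sql
-- Entering a property name without a value prints its current setting.
set hive.exec.dynamic.partition.mode;

-- If it still reports strict, set both properties in this session...
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- ...and print it again before running the insert.
set hive.exec.dynamic.partition.mode;
```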