Big Data -- Hive Dynamic Partition Tuning
1. Create an ordinary (non-partitioned) table and load data
------------------------------------------------
hive (default)> create table person(id int,name string,location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.415 seconds
-----------------------------------------------
hive (default)> load data local inpath '/root/hivetest/partition/stu' into table person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=128]
OK
Time taken: 1.036 seconds
----------------------------------------------------
hive (default)> select * from person;
OK
person.id person.name person.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu shanghai
1004 heiliu shanghai
1005 xiaoliuzi zhejiang
1006 xiaohei zhejiang
Time taken: 0.356 seconds, Fetched: 6 row(s)
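For reference, the source file /root/hivetest/partition/stu is not shown in the original session, but from the query output above (and the reported totalSize=128) it presumably contains six space-delimited lines like these:
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu shanghai
1004 heiliu shanghai
1005 xiaoliuzi zhejiang
1006 xiaohei zhejiang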
----------------------------------------------------
2. Create a partitioned table and load data
------------------------------------------------
hive (default)> create table person_partition1(id int,name string)
> partitioned by(location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.055 seconds
----------------------------------------------------
hive (default)> load data local inpath '/root/hivetest/partition/stu_par' into table person_partition1 partition(location='jiangsu');
Loading data to table default.person_partition1 partition (location=jiangsu)
Partition default.person_partition1{location=jiangsu} stats: [numFiles=1, numRows=0, totalSize=48, rawDataSize=0]
OK
Time taken: 0.719 seconds
------------------------------------------------------
hive (default)> select * from person_partition1 where location = 'jiangsu';
OK
person_partition1.id person_partition1.name person_partition1.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu jiangsu
1004 heiliu jiangsu
Time taken: 0.27 seconds, Fetched: 4 row(s)
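Two things are worth noting here. With a static-partition load, every row in the file lands in the partition named in the PARTITION clause regardless of its contents, and the file should contain only the non-partition columns (here id and name). The file /root/hivetest/partition/stu_par itself is not shown, but from the query output and the reported totalSize=48 it presumably looks like this:
1001 zhangsan
1002 lisi
1003 wangwu
1004 heiliu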
-------------------------------------------------------------
3. Create the target partitioned table
--------------------------------------------------------------
hive (default)> create table target_partition(id int,name string)
> partitioned by(location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.076 seconds
---------------------------------------------------------------
4. Configure the dynamic-partition settings
-----------------------------------------------------------
(1) Enable dynamic partitioning (default true, i.e. enabled)
hive (default)> set hive.exec.dynamic.partition;
hive.exec.dynamic.partition=true
(2) Set non-strict mode (this controls the dynamic-partition mode; the default, strict, requires at least one partition column to be specified statically, while nonstrict allows every partition column to be dynamic)
hive (default)> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
hive (default)> set hive.exec.dynamic.partition.mode=nonstrict;
hive (default)> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
(3) Maximum total number of dynamic partitions that can be created across all nodes running MR tasks (default 1000)
hive (default)> set hive.exec.max.dynamic.partitions;
hive.exec.max.dynamic.partitions=1000
(4) Maximum number of dynamic partitions that can be created on each node running MR tasks. This must be sized to the actual data. For example, if the source data spans a full year, i.e. the day column has 365 distinct values, the parameter must be set above 365; with the default of 100 the job would fail. (A combined sketch of these settings follows the list.)
hive (default)> set hive.exec.max.dynamic.partitions.pernode;
hive.exec.max.dynamic.partitions.pernode=100
(5) Maximum number of HDFS files the entire MR job may create (default 100000)
hive (default)> set hive.exec.max.created.files;
hive.exec.max.created.files=100000
(6) Whether to throw an exception when an empty partition is generated; usually there is no need to change this (default false)
hive (default)> set hive.error.on.empty.partition;
hive.error.on.empty.partition=false
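Putting the settings together: a minimal session setup before the dynamic-partition inserts in sections 5 and 6 might look like the following. Only the mode change is strictly required for this walkthrough, since the defaults already allow three partitions; the pernode value of 400 is illustrative, matching the 365-day example in item (4).
hive (default)> set hive.exec.dynamic.partition=true;
hive (default)> set hive.exec.dynamic.partition.mode=nonstrict;
hive (default)> set hive.exec.max.dynamic.partitions.pernode=400;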
-------------------------------------------------------------------------------------------------
5. Source is the partitioned table person_partition1; insert-select into the partitioned table target_partition
---------------------------------------------------------------------
hive (default)> insert overwrite table target_partition partition(location) select id,name,location from person_partition1;
Query ID = root_20191004121759_d0af4f33-c1aa-4ef8-93b7-836f260660be
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1570160651182_0001, Tracking URL = http://bigdata112:8088/proxy/application_1570160651182_0001/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1570160651182_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-04 12:18:10,599 Stage-1 map = 0%, reduce = 0%
2019-10-04 12:18:18,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.7 sec
MapReduce Total cumulative CPU time: 700 msec
Ended Job = job_1570160651182_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/user/hive/warehouse/target_partition/.hive-staging_hive_2019-10-04_12-17-59_480_7824706292755053566-1/-ext-10000
Loading data to table default.target_partition partition (location=null)
Time taken for load dynamic partitions : 128
Loading partition {location=jiangsu}
Time taken for adding to write entity : 1
Partition default.target_partition{location=jiangsu} stats: [numFiles=1, numRows=4, totalSize=48, rawDataSize=44]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.7 sec HDFS Read: 4045 HDFS Write: 145 SUCCESS
Total MapReduce CPU Time Spent: 700 msec
OK
id name location
Time taken: 20.136 seconds
hive (default)> show partitions target_partition;
OK
partition
location=jiangsu
Time taken: 0.065 seconds, Fetched: 1 row(s)
hive (default)> select * from target_partition;
OK
target_partition.id target_partition.name target_partition.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu jiangsu
1004 heiliu jiangsu
Time taken: 0.12 seconds, Fetched: 4 row(s)
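Note that Hive binds dynamic partition values by position, not by name: the dynamic partition column must be the last column in the SELECT list, matching the order declared in the PARTITION clause. The statement above, spelled out:
insert overwrite table target_partition partition(location)
select id, name, location   -- location last: bound to partition(location) by position
from person_partition1;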
--------------------------------------------------------------------------
6. Source is the ordinary table person; insert-select into the partitioned table target_partition
----------------------------------------------------------------------------------------
hive (default)> insert overwrite table target_partition partition(location) select id,name,location from person;
Query ID = root_20191004122151_2c6376a5-b764-4ffd-be69-4f981c00b951
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1570160651182_0002, Tracking URL = http://bigdata112:8088/proxy/application_1570160651182_0002/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1570160651182_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-04 12:21:59,322 Stage-1 map = 0%, reduce = 0%
2019-10-04 12:22:05,702 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
MapReduce Total cumulative CPU time: 750 msec
Ended Job = job_1570160651182_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/user/hive/warehouse/target_partition/.hive-staging_hive_2019-10-04_12-21-51_068_5031819755791838456-1/-ext-10000
Loading data to table default.target_partition partition (location=null)
Time taken for load dynamic partitions : 357
Loading partition {location=zhejiang}
Loading partition {location=shanghai}
Loading partition {location=jiangsu}
Time taken for adding to write entity : 0
Partition default.target_partition{location=jiangsu} stats: [numFiles=1, numRows=2, totalSize=24, rawDataSize=22]
Partition default.target_partition{location=shanghai} stats: [numFiles=1, numRows=2, totalSize=24, rawDataSize=22]
Partition default.target_partition{location=zhejiang} stats: [numFiles=1, numRows=2, totalSize=28, rawDataSize=26]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.75 sec HDFS Read: 3817 HDFS Write: 295 SUCCESS
Total MapReduce CPU Time Spent: 750 msec
OK
id name location
Time taken: 17.561 seconds
hive (default)> show partitions target_partition;
OK
partition
location=jiangsu
location=shanghai
location=zhejiang
Time taken: 0.046 seconds, Fetched: 3 row(s)
hive (default)> select * from target_partition;
OK
target_partition.id target_partition.name target_partition.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu shanghai
1004 heiliu shanghai
1005 xiaoliuzi zhejiang
1006 xiaohei zhejiang
Time taken: 0.112 seconds, Fetched: 6 row(s)
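To confirm the resulting directory layout, you can list the table's warehouse path from inside the Hive CLI (the path comes from the "Moving data to" line in the job output above); each partition should show up as its own location=... subdirectory:
hive (default)> dfs -ls /user/hive/warehouse/target_partition;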