Big Data: Tuning Hive Dynamic Partitioning

Posted newtest00


1. Create a regular (non-partitioned) table and load data

------------------------------------------------

hive (default)> create table person(id int,name string,location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.415 seconds

-----------------------------------------------

hive (default)> load data local inpath '/root/hivetest/partition/stu' into table person;
Loading data to table default.person
Table default.person stats: [numFiles=1, totalSize=128]
OK
Time taken: 1.036 seconds

----------------------------------------------------

hive (default)> select * from person;
OK
person.id person.name person.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu shanghai
1004 heiliu shanghai
1005 xiaoliuzi zhejiang
1006 xiaohei zhejiang
Time taken: 0.356 seconds, Fetched: 6 row(s)

----------------------------------------------------

2. Create a partitioned table and load data

------------------------------------------------

hive (default)> create table person_partition1(id int,name string)
> partitioned by(location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.055 seconds

----------------------------------------------------

hive (default)> load data local inpath '/root/hivetest/partition/stu_par' into table person_partition1 partition(location='jiangsu');
Loading data to table default.person_partition1 partition (location=jiangsu)
Partition default.person_partition1{location=jiangsu} stats: [numFiles=1, numRows=0, totalSize=48, rawDataSize=0]
OK
Time taken: 0.719 seconds

------------------------------------------------------

hive (default)> select * from person_partition1 where location = 'jiangsu';
OK
person_partition1.id person_partition1.name person_partition1.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu jiangsu
1004 heiliu jiangsu
Time taken: 0.27 seconds, Fetched: 4 row(s)
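
Because the partition is named statically in the load statement, Hive takes the value of location from the partition clause rather than from the file, which is why every row of stu_par is reported as jiangsu. A minimal sketch for checking what the load created follows. The directory /user/hive/warehouse is an assumption based on the warehouse path that appears in the job logs in step 5; adjust it for your cluster.

```sql
-- list the partitions registered in the metastore for this table
show partitions person_partition1;

-- each partition is stored as a subdirectory named <column>=<value>;
-- the path below ASSUMES the default warehouse location
dfs -ls /user/hive/warehouse/person_partition1;
```

If the load succeeded, the listing should contain a location=jiangsu subdirectory holding the loaded file.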

-------------------------------------------------------------

3. Create the target partitioned table

--------------------------------------------------------------

hive (default)> create table target_partition(id int,name string)
> partitioned by(location string)
> row format delimited fields terminated by ' ';
OK
Time taken: 0.076 seconds

---------------------------------------------------------------

4. Configure dynamic partitioning

-----------------------------------------------------------

(1) Enable dynamic partitioning (default: true, i.e. enabled)

hive (default)> set hive.exec.dynamic.partition;
hive.exec.dynamic.partition=true

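(2) Switch to nonstrict mode. hive.exec.dynamic.partition.mode sets the dynamic partition mode: the default, strict, requires at least one partition column to be static, while nonstrict allows every partition column to be dynamic.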

hive (default)> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=strict
hive (default)> set hive.exec.dynamic.partition.mode=nonstrict;
hive (default)> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
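
To illustrate the difference between the two modes, here is a sketch using a hypothetical table person_by_dt_loc with two partition columns (the table and its dt column are assumptions for illustration and are not part of this walkthrough). In strict mode the first insert should be accepted because dt is static; the second, where every partition column is dynamic, needs nonstrict mode.

```sql
-- hypothetical two-level partitioned table, for illustration only
create table person_by_dt_loc(id int, name string)
partitioned by (dt string, location string)
row format delimited fields terminated by ' ';

-- accepted in strict mode: dt is static, location is dynamic
-- (static partition columns must come before dynamic ones)
insert overwrite table person_by_dt_loc partition(dt='2019-10-04', location)
select id, name, location from person;

-- needs nonstrict mode: both partition columns are dynamic
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table person_by_dt_loc partition(dt, location)
select id, name, '2019-10-04' as dt, location from person;
```

The target table used in this post has a single partition column, so a fully dynamic insert into it is only possible in nonstrict mode.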

(3) The maximum total number of dynamic partitions that can be created across all nodes running the MR job (default: 1000).

hive (default)> set hive.exec.max.dynamic.partitions;
hive.exec.max.dynamic.partitions=1000

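(4) The maximum number of dynamic partitions that can be created on each node running the MR job (each mapper or reducer). Set this according to the actual data. For example, if the source data covers a full year, so that the day column has 365 distinct values, this parameter has to be raised to cover all 365 of them; with the default of 100 the job fails with an error.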

hive (default)> set hive.exec.max.dynamic.partitions.pernode;
hive.exec.max.dynamic.partitions.pernode=100
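
As a sketch of the one-year case: assume a hypothetical source table events_raw with a dt column holding the day, and a target events_by_day partitioned by dt (neither table exists in this post). The per-node limit could then be raised for the session before the insert. The value 400 is illustrative, not a recommendation.

```sql
-- raise the per-node limit above the expected 365 distinct days
set hive.exec.max.dynamic.partitions.pernode=400;

-- print the overall limit; it also has to cover the total number of
-- partitions, and the default of 1000 is enough for 365
set hive.exec.max.dynamic.partitions;

-- HYPOTHETICAL tables: events_raw(id int, msg string, dt string)
-- and events_by_day(id int, msg string) partitioned by (dt string)
insert overwrite table events_by_day partition(dt)
select id, msg, dt from events_raw
distribute by dt;  -- optional: routes all rows of one day to the same reducer
```

The distribute by clause is an optional design choice: it adds a reduce stage, but each reducer then writes only the days routed to it, which tends to lower both the per-node partition count and the number of output files.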

(5) The maximum number of HDFS files that can be created by the whole MR job (default: 100000).

hive (default)> set hive.exec.max.created.files;
hive.exec.max.created.files=100000

(6) Whether to throw an exception when an empty partition is generated. This usually does not need to be set (default: false).

hive (default)> set hive.error.on.empty.partition;
hive.error.on.empty.partition=false

-------------------------------------------------------------------------------------------------

5. Source is the partitioned table person_partition1: load target_partition from a query

---------------------------------------------------------------------

hive (default)> insert overwrite table target_partition partition(location) select id,name,location from person_partition1;
Query ID = root_20191004121759_d0af4f33-c1aa-4ef8-93b7-836f260660be
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1570160651182_0001, Tracking URL = http://bigdata112:8088/proxy/application_1570160651182_0001/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1570160651182_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-04 12:18:10,599 Stage-1 map = 0%, reduce = 0%
2019-10-04 12:18:18,063 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.7 sec
MapReduce Total cumulative CPU time: 700 msec
Ended Job = job_1570160651182_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/user/hive/warehouse/target_partition/.hive-staging_hive_2019-10-04_12-17-59_480_7824706292755053566-1/-ext-10000
Loading data to table default.target_partition partition (location=null)
Time taken for load dynamic partitions : 128
Loading partition {location=jiangsu}
Time taken for adding to write entity : 1
Partition default.target_partition{location=jiangsu} stats: [numFiles=1, numRows=4, totalSize=48, rawDataSize=44]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.7 sec HDFS Read: 4045 HDFS Write: 145 SUCCESS
Total MapReduce CPU Time Spent: 700 msec
OK
id name location
Time taken: 20.136 seconds

 

hive (default)> show partitions target_partition;
OK
partition
location=jiangsu
Time taken: 0.065 seconds, Fetched: 1 row(s)

 

hive (default)> select * from target_partition;
OK
target_partition.id target_partition.name target_partition.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu jiangsu
1004 heiliu jiangsu
Time taken: 0.12 seconds, Fetched: 4 row(s)
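
One detail worth making explicit: Hive matches dynamic partition columns by position, not by name. The trailing column(s) of the select list feed the partition column(s) in the order given in the partition clause. A sketch with the same tables, in which the alias loc (a name introduced only for this example) differs from the partition column name and the result should be the same as above:

```sql
-- the third select column populates the partition column location,
-- regardless of the alias it carries
insert overwrite table target_partition partition(location)
select id, name, location as loc from person_partition1;
```

For the same reason, putting the partition column anywhere but last in the select list would route the wrong values into the partition.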

--------------------------------------------------------------------------

6. Source is the regular table person: load target_partition from a query

----------------------------------------------------------------------------------------

hive (default)> insert overwrite table target_partition partition(location) select id,name,location from person;
Query ID = root_20191004122151_2c6376a5-b764-4ffd-be69-4f981c00b951
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1570160651182_0002, Tracking URL = http://bigdata112:8088/proxy/application_1570160651182_0002/
Kill Command = /opt/module/hadoop-2.8.4/bin/hadoop job -kill job_1570160651182_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2019-10-04 12:21:59,322 Stage-1 map = 0%, reduce = 0%
2019-10-04 12:22:05,702 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 0.75 sec
MapReduce Total cumulative CPU time: 750 msec
Ended Job = job_1570160651182_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://mycluster/user/hive/warehouse/target_partition/.hive-staging_hive_2019-10-04_12-21-51_068_5031819755791838456-1/-ext-10000
Loading data to table default.target_partition partition (location=null)
Time taken for load dynamic partitions : 357
Loading partition {location=zhejiang}
Loading partition {location=shanghai}
Loading partition {location=jiangsu}
Time taken for adding to write entity : 0
Partition default.target_partition{location=jiangsu} stats: [numFiles=1, numRows=2, totalSize=24, rawDataSize=22]
Partition default.target_partition{location=shanghai} stats: [numFiles=1, numRows=2, totalSize=24, rawDataSize=22]
Partition default.target_partition{location=zhejiang} stats: [numFiles=1, numRows=2, totalSize=28, rawDataSize=26]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 0.75 sec HDFS Read: 3817 HDFS Write: 295 SUCCESS
Total MapReduce CPU Time Spent: 750 msec
OK
id name location
Time taken: 17.561 seconds

 

hive (default)> show partitions target_partition;
OK
partition
location=jiangsu
location=shanghai
location=zhejiang
Time taken: 0.046 seconds, Fetched: 3 row(s)

 

hive (default)> select * from target_partition;
OK
target_partition.id target_partition.name target_partition.location
1001 zhangsan jiangsu
1002 lisi jiangsu
1003 wangwu shanghai
1004 heiliu shanghai
1005 xiaoliuzi zhejiang
1006 xiaohei zhejiang
Time taken: 0.112 seconds, Fetched: 6 row(s)
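
Comparing the two runs shows that insert overwrite with dynamic partitions replaces only the partitions the query produces: jiangsu went from 4 rows after step 5 to 2 rows here, while shanghai and zhejiang were newly created. A quick way to check row counts per partition, together with an appending variant, sketched with the same tables:

```sql
-- row count per partition; with the six rows listed above this
-- should report 2 rows for each of the three locations
select location, count(*) as cnt
from target_partition
group by location;

-- insert into appends to existing partitions instead of replacing them,
-- so running this after the overwrite would duplicate every row
insert into table target_partition partition(location)
select id, name, location from person;
```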
