Hive on Tez
Posted yjt1993
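Hive execution modes
- Hive on MapReduce: offline batch processing (the default)
- Hive on Tez: a computation framework on top of YARN that supports DAG jobs
- Hive on Spark: in-memory computation
Hive on Tez
Tez is a data processing framework built on top of YARN that supports complex DAG jobs. It was open-sourced by Hortonworks. Tez splits the MapReduce process into several sub-steps and can combine multiple MapReduce jobs into one larger DAG job. This reduces the intermediate file storage between MapReduce stages, and combining the sub-steps sensibly can substantially improve the performance of MapReduce-style jobs.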
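Installing Tez
Tez can be installed from source or from the binary package; this post uses the binary package.
Hadoop version: 2.9.1
Hive version: 2.1.1
Tez version: 0.9.0
Prerequisite: a working Hadoop environment, including YARN (Tez runs on YARN) and Hive.
Download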
wget http://mirror.bit.edu.cn/apache/tez/0.9.0/apache-tez-0.9.0-bin.tar.gz
Install
# tar zxvf apache-tez-0.9.0-bin.tar.gz
# mv apache-tez-0.9.0-bin/ /tez-0.9.0
# hdfs dfs -mkdir -p /tez-0.9.0
# cd /tez-0.9.0/
# hdfs dfs -put share/tez.tar.gz /tez-0.9.0
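Before moving on, it can help to confirm that the archive actually landed in HDFS. The following is a minimal sketch and not part of the original steps: `verify_tez_upload` is a hypothetical helper name, and it assumes the `hdfs` command is on the PATH and that the archive was uploaded to the HDFS directory used above.

```shell
# Hypothetical helper: succeed only if tez.tar.gz exists in the given HDFS directory.
verify_tez_upload() {
  hdfs_dir="$1"   # HDFS directory, e.g. /tez-0.9.0 as used in this post
  if ! command -v hdfs >/dev/null 2>&1; then
    echo "hdfs command not found on PATH" >&2
    return 2
  fi
  # 'hdfs dfs -test -e' exits 0 when the path exists
  hdfs dfs -test -e "$hdfs_dir/tez.tar.gz"
}
```

A call such as `verify_tez_upload /tez-0.9.0 && echo "archive present"` would print the message only when the file is found.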
Configure Tez
# cd /data1/hadoop/hadoop/etc/hadoop/
# cat tez-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>${fs.defaultFS}/tez-0.9.0/tez.tar.gz</value>
</property>
<property>
<name>tez.container.max.java.heap.fraction</name>
<value>0.2</value>
</property>
</configuration>
Reference: /tez-0.9.0/conf/tez-default-template.xml
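The `tez.container.max.java.heap.fraction` setting is the fraction of a container's memory that Tez applies when it sizes the JVM heap (-Xmx) automatically. As a rough illustration of the arithmetic only, assuming a hypothetical 1024 MB container (the container size is not given in this post, and `heap_mb` is an illustrative helper, not a Tez tool):

```shell
# Illustrative helper: approximate heap size in MB for a container.
heap_mb() {
  # $1: container memory in MB; $2: heap fraction (between 0 and 1)
  awk -v mem="$1" -v frac="$2" 'BEGIN { printf "%d\n", mem * frac }'
}

heap_mb 1024 0.2   # prints 204
```

With the 0.2 used here, roughly a fifth of each container's memory would go to the heap; the exact rounding Tez applies may differ from this sketch.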
Environment variables (~/.bashrc)
Add the following settings:
export TEZ_CONF_DIR=$HADOOP_CONF_DIR
export TEZ_JARS=/tez-0.9.0/*:/tez-0.9.0/lib/*
export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH
Then run "source ~/.bashrc" for the variables to take effect.
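One way to check that the Tez entries were picked up is to look for them in the classpath Hadoop reports. This is a small sketch under assumptions: `classpath_has_entry` is a hypothetical helper, and it assumes `hadoop classpath` prints the colon-separated entries without expanding the `*` wildcards.

```shell
# Hypothetical helper: succeed if a colon-separated classpath contains an exact entry.
classpath_has_entry() {
  # $1: colon-separated classpath; $2: entry to look for (matched literally, whole line)
  printf '%s\n' "$1" | tr ':' '\n' | grep -Fxq -- "$2"
}
```

For example, `classpath_has_entry "$(hadoop classpath)" '/tez-0.9.0/lib/*' && echo "Tez lib on classpath"` would print the message only if that entry is present.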
Hadoop version compatibility
The Tez 0.9.0 binary package ships Hadoop 2.7.0 MapReduce client jars in its lib directory. Replace them with the 2.9.1 jars from the local Hadoop installation:
# cd /tez-0.9.0/lib
# rm -rf hadoop-mapreduce-client-core-2.7.0.jar hadoop-mapreduce-client-common-2.7.0.jar
# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.1.jar /tez-0.9.0/lib/
# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.9.1.jar /tez-0.9.0/lib/
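To see which bundled Hadoop jars do not match the cluster version before swapping them, something like the following could be used. It is only a sketch: `list_mismatched_hadoop_jars` is a hypothetical helper, and the directory and version arguments are assumptions that mirror the paths and versions in this post.

```shell
# Hypothetical helper: print hadoop-*.jar files in a directory whose
# version suffix differs from the expected Hadoop version.
list_mismatched_hadoop_jars() {
  lib_dir="$1"          # e.g. /tez-0.9.0/lib
  hadoop_version="$2"   # e.g. 2.9.1
  for jar in "$lib_dir"/hadoop-*.jar; do
    [ -e "$jar" ] || continue          # the glob matched nothing
    name="$(basename "$jar")"
    case "$name" in
      *-"$hadoop_version".jar) ;;      # version already matches, skip
      *) printf '%s\n' "$name" ;;      # report the mismatch
    esac
  done
}
```

Running `list_mismatched_hadoop_jars /tez-0.9.0/lib 2.9.1` would list any Hadoop jars that still carry a different version suffix.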
Start Hive
# hive
hive> SET hive.execution.engine=tez;
This sets the execution engine to Tez; the default is MapReduce.
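The same setting can also be passed on the command line for a one-off statement instead of typing it in the interactive shell. A minimal sketch, assuming the `hive` CLI is on the PATH; `run_on_tez` is a hypothetical wrapper name, not something the original post uses:

```shell
# Hypothetical wrapper: run a single HiveQL statement with Tez as the engine.
run_on_tez() {
  # $1: HiveQL statement to execute
  hive --hiveconf hive.execution.engine=tez -e "$1"
}
```

For instance, `run_on_tez 'select count(1) from user_info;'` would run the query without changing the default engine for other sessions.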
Test data
Create a table
hive> create table user_info(user_id bigint, firstname string, lastname string, count string);
Insert data
hive> insert into user_info values(1,'dennis','hu','CN'),(2,'Json','Lv','Jpn'),(3,'Mike','Lu','USA');
Query ID = root_20190618043047_bfc41253-60f9-469d-b6a9-c26c93a92e82
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.
Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.55 s
----------------------------------------------------------------------------------------------
Loading data to table default.user_info
OK
Time taken: 9.488 seconds
Query
hive> select count(1) from user_info;
Query ID = root_20190618043342_5f83efb4-39bf-4d67-bac4-d67205086ae7
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 4.46 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 4.979 seconds, Fetched: 1 row(s)
hive> select count(1) from user_info;
Query ID = root_20190618043349_ecee5657-7c95-43ab-80e9-101dd36d6fc7
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 0.72 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 1.156 seconds, Fetched: 1 row(s)
Checking in the YARN web UI
The YARN web UI shows that the engine type is now TEZ.