hive on tez

Posted yjt1993


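Hive execution modes

  1. Hive on MapReduce: offline batch computation (the default)
  2. Hive on Tez: a computation framework on top of YARN that supports DAG jobs
  3. Hive on Spark: in-memory computation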

hive on tez

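Tez is a data processing framework built on YARN that supports complex DAG (directed acyclic graph) jobs. It was open-sourced by Hortonworks. Tez splits the MapReduce process into several sub-steps and can combine multiple MapReduce jobs into one larger DAG job. This reduces the intermediate files written between MapReduce stages, and arranging the sub-steps sensibly can substantially improve performance over plain MapReduce jobs.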

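Installing Tez

Tez can be installed from source or from the binary package; this post uses the binary package.

Hadoop version: 2.9.1

Hive version: 2.1.1

Tez version: 0.9.0

Prerequisite: a working Hadoop environment, including YARN (Tez runs on YARN), and Hive.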

Download

wget http://mirror.bit.edu.cn/apache/tez/0.9.0/apache-tez-0.9.0-bin.tar.gz

Install

# tar zxvf apache-tez-0.9.0-bin.tar.gz
# mv apache-tez-0.9.0-bin/ /tez-0.9.0
# hdfs dfs -mkdir -p /tez-0.9.0
# cd /tez-0.9.0/
# hdfs dfs -put share/tez.tar.gz /tez-0.9.0

Configuring Tez

# cd /data1/hadoop/hadoop/etc/hadoop/
# cat tez-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez-0.9.0/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.2</value>
  </property>
</configuration>

Reference: /tez-0.9.0/conf/tez-default-template.xml
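
If the file is generated from a script rather than edited by hand, the `${fs.defaultFS}` placeholder needs protecting from the shell. The sketch below is one way to do that, not the author's method. It assumes the Tez archive was uploaded to /tez-0.9.0 on HDFS as in the install step, and `CONF_DIR` is a hypothetical stand-in for the Hadoop configuration directory (it defaults to a scratch directory so the snippet can be tried safely).

```shell
# Assumption: CONF_DIR stands in for the Hadoop conf directory;
# by default a temporary directory is used instead.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"

# The quoted delimiter ('EOF') stops the shell from expanding ${fs.defaultFS},
# so the placeholder reaches the file literally and Hadoop resolves it.
cat > "$CONF_DIR/tez-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>tez.lib.uris</name>
    <value>${fs.defaultFS}/tez-0.9.0/tez.tar.gz</value>
  </property>
  <property>
    <name>tez.container.max.java.heap.fraction</name>
    <value>0.2</value>
  </property>
</configuration>
EOF

# Confirm the placeholder survived unexpanded.
grep -F '${fs.defaultFS}' "$CONF_DIR/tez-site.xml"
```

With an unquoted delimiter (`<<EOF`) the shell would try to expand `${fs.defaultFS}` itself and fail, because a dot is not valid in a shell variable name.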

Environment variables (~/.bashrc)

Add the following:
export TEZ_CONF_DIR=$HADOOP_CONF_DIR

export TEZ_JARS=/tez-0.9.0/*:/tez-0.9.0/lib/*

export HADOOP_CLASSPATH=$TEZ_CONF_DIR:$TEZ_JARS:$HADOOP_CLASSPATH

Run "source ~/.bashrc" to apply the environment variables.
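
A quick way to confirm the variables took effect is to check that the Tez entries really appear in HADOOP_CLASSPATH. The helper below is a minimal sketch; `check_tez_classpath` is a hypothetical name, not part of Hadoop or Tez, and it only inspects the variables set above.

```shell
# Hypothetical helper: report whether the value of TEZ_JARS appears
# as a colon-delimited run of entries inside HADOOP_CLASSPATH.
check_tez_classpath() {
  # Wrapping both sides in colons makes entries at the start or end match too.
  # The quoted part of the pattern is matched literally, so the '*' in
  # TEZ_JARS is not treated as a wildcard.
  case ":${HADOOP_CLASSPATH:-}:" in
    *":${TEZ_JARS:-/nonexistent}:"*) echo "tez jars on classpath" ;;
    *) echo "tez jars missing from classpath" ;;
  esac
}

check_tez_classpath
```

If it reports the jars as missing in a new terminal, the exports were probably added to a file that the shell does not read at startup.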

Hadoop version compatibility

The Tez 0.9.0 binary package bundles the Hadoop 2.7.0 MapReduce client jars. Replace them with the 2.9.1 jars from the cluster's Hadoop installation:

# cd /tez-0.9.0/lib
# rm -rf hadoop-mapreduce-client-core-2.7.0.jar hadoop-mapreduce-client-common-2.7.0.jar
# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.9.1.jar /tez-0.9.0/lib/
# cp /data1/hadoop/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.9.1.jar /tez-0.9.0/lib/
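
To check whether any mismatched jars remain after the swap, the directory can be scanned for MapReduce client jars whose version suffix differs from the cluster's. This is a rough sketch under the assumption that the jars follow the `hadoop-mapreduce-client-<module>-<version>.jar` naming seen above; `list_mismatched_hadoop_jars` is a hypothetical helper name.

```shell
# Hypothetical helper: print MapReduce client jars in a directory whose
# version suffix is not the expected Hadoop version.
#   $1 = directory to scan (e.g. the Tez lib directory)
#   $2 = expected Hadoop version (e.g. 2.9.1)
list_mismatched_hadoop_jars() {
  lib_dir=$1
  want=$2
  for jar in "$lib_dir"/hadoop-mapreduce-client-*.jar; do
    # Skip the unexpanded glob when nothing matches.
    [ -e "$jar" ] || continue
    case "$jar" in
      *-"$want".jar) ;;          # version matches, nothing to report
      *) basename "$jar" ;;      # version differs, print the file name
    esac
  done
}

# Prints nothing when every MapReduce client jar matches 2.9.1.
list_mismatched_hadoop_jars /tez-0.9.0/lib 2.9.1
```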

Starting Hive

# hive
hive> SET hive.execution.engine=tez;

This sets the execution engine to Tez; the default is MapReduce.

Test data

Create a table:
hive> create table user_info(user_id bigint, firstname string, lastname string, count string);
Insert data:
hive> insert into user_info values(1,'dennis','hu','CN'),(2,'Json','Lv','Jpn'),(3,'Mike','Lu','USA');

Query ID = root_20190618043047_bfc41253-60f9-469d-b6a9-c26c93a92e82
Total jobs = 1
Launching Job 1 out of 1
Tez session was closed. Reopening...
Session re-established.


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 01/01 [==========================>>] 100% ELAPSED TIME: 4.55 s
----------------------------------------------------------------------------------------------
Loading data to table default.user_info
OK
Time taken: 9.488 seconds

Query

hive> select count(1) from user_info;
Query ID = root_20190618043342_5f83efb4-39bf-4d67-bac4-d67205086ae7
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 4.46 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 4.979 seconds, Fetched: 1 row(s)
hive> select count(1) from user_info;
Query ID = root_20190618043349_ecee5657-7c95-43ab-80e9-101dd36d6fc7
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1560826244680_0015)

----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 0.72 s
----------------------------------------------------------------------------------------------
OK
9
Time taken: 1.156 seconds, Fetched: 1 row(s)

Viewing in the YARN web UI

[Screenshot: YARN web UI]

As the screenshot shows, the engine type has changed to TEZ.
