Configuring Hive on Tez
Posted by ChavinKing
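1. Introduction to Tez
Tez is a computation framework, open-sourced by Hortonworks, that supports DAG jobs. It can merge multiple dependent jobs into a single job, which substantially improves the performance of MapReduce-style workloads. Tez is not aimed directly at end users; rather, it lets developers build faster and more scalable applications for end users.
2. Building Tez
This post records how to build Tez 0.8.5. Earlier Tez releases shipped only as source packages. Recent releases do provide prebuilt tarballs, but these are mostly built against a specific Hadoop version. If that version differs from yours, unexpected problems may surface at some point, so for stability it is better to build Tez against the Hadoop version you actually run.
(1) After unpacking the source, edit pom.xml in the root directory and set the Hadoop version to match yours.
(2) Comment out the tez-ui2 submodule in the root pom.xml, because building tez-ui2 has many pitfalls and may fail. The listing below comments out tez-ui as well; see step (4) for the reason.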
<modules>
<module>hadoop-shim</module>
<module>tez-api</module>
<module>tez-common</module>
<module>tez-runtime-library</module>
<module>tez-runtime-internals</module>
<module>tez-mapreduce</module>
<module>tez-examples</module>
<module>tez-tests</module>
<module>tez-dag</module>
<module>tez-ext-service-tests</module>
<!--
<module>tez-ui</module>
<module>tez-ui2</module>
-->
<module>tez-plugins</module>
<module>tez-tools</module>
<module>hadoop-shim-impls</module>
<module>tez-dist</module>
<module>docs</module>
</modules>
(3) If you build Tez as root, remember to edit tez-ui/pom.xml and add the --allow-root argument so that nodejs is allowed to run the bower install as root:
<execution>
<id>Bower install</id>
<phase>generate-sources</phase>
<goals>
<goal>exec</goal>
</goals>
<configuration>
<workingDirectory>${webappDir}</workingDirectory>
<executable>${node.executable}</executable>
<arguments>
<argument>node_modules/bower/bin/bower</argument>
<argument>install</argument>
<argument>--allow-root</argument>
<argument>--remove-unnecessary-resolutions=false</argument>
</arguments>
</configuration>
</execution>
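(4) Ideally the Linux build machine can reach sites outside the Great Firewall, for example through a proxy. If it cannot, also comment out tez-ui in the root pom.xml. Both tez-ui and tez-ui2 download nodejs-related artifacts that are hosted outside the firewall by default, and without that access the build fails roughly 80% of the time. So if the build fails on anything nodejs-related, comment out all tez-ui submodules so they are excluded from the build. The UI is not essential: it only displays job plans, and Tez can still optimize DAG dependencies without it.
(5) Can you install nodejs separately on the Linux machine and have the Tez build use that copy, avoiding the download? In my tests this did not work. The Tez build appears to manage its own nodejs dependency and downloads it on every build, so the most reliable approach is to comment out everything related to tez-ui.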
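Step (1) can also be scripted. The following is a minimal sketch, assuming the root pom.xml declares the Hadoop version on a single line inside a `<hadoop.version>` property; the function name `set_hadoop_version` is hypothetical, and the version string is only an example taken from the Hadoop install path used later in this post. A vendor version such as a CDH build would presumably also need the vendor's Maven repository to be resolvable, which is not covered here.

```shell
# Hypothetical helper: rewrite the <hadoop.version> property in a pom.xml.
# Assumption: the property is on one line, e.g.
#   <hadoop.version>2.6.0</hadoop.version>
set_hadoop_version() {
  pom_file=$1
  new_version=$2
  # Write to a temporary file first so a failed sed cannot truncate the pom.
  sed "s|<hadoop.version>[^<]*</hadoop.version>|<hadoop.version>${new_version}</hadoop.version>|" \
    "$pom_file" > "${pom_file}.tmp" && mv "${pom_file}.tmp" "$pom_file"
}

# Example (path and version are placeholders; adjust to your environment):
# set_hadoop_version /mnt/apache-tez-0.8.5-src/pom.xml 2.6.0-cdh5.10.0
# grep '<hadoop.version>' /mnt/apache-tez-0.8.5-src/pom.xml
```

Checking the result with grep before building is a cheap way to confirm that the substitution matched; if the property is missing or split across lines, the file is left unchanged.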
Once all of the above is done, run the build command:
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true
After a successful build, output like the following is printed:
[INFO] Building jar: /mnt/apache-tez-0.8.5-src/docs/target/tez-docs-0.8.5-tests.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] tez ................................................ SUCCESS [01:57 min]
[INFO] hadoop-shim ........................................ SUCCESS [01:03 min]
[INFO] tez-api ............................................ SUCCESS [01:33 min]
[INFO] tez-common ......................................... SUCCESS [ 4.987 s]
[INFO] tez-runtime-internals .............................. SUCCESS [ 7.396 s]
[INFO] tez-runtime-library ................................ SUCCESS [ 27.988 s]
[INFO] tez-mapreduce ...................................... SUCCESS [ 7.937 s]
[INFO] tez-examples ....................................... SUCCESS [ 1.829 s]
[INFO] tez-dag ............................................ SUCCESS [ 34.257 s]
[INFO] tez-tests .......................................... SUCCESS [ 20.367 s]
[INFO] tez-ext-service-tests .............................. SUCCESS [ 4.663 s]
[INFO] tez-plugins ........................................ SUCCESS [ 0.126 s]
[INFO] tez-yarn-timeline-history .......................... SUCCESS [ 2.838 s]
[INFO] tez-yarn-timeline-history-with-acls ................ SUCCESS [ 1.692 s]
[INFO] tez-history-parser ................................. SUCCESS [01:31 min]
[INFO] tez-tools .......................................... SUCCESS [ 0.169 s]
[INFO] tez-perf-analyzer .................................. SUCCESS [ 0.090 s]
[INFO] tez-job-analyzer ................................... SUCCESS [01:19 min]
[INFO] tez-javadoc-tools .................................. SUCCESS [ 0.632 s]
[INFO] hadoop-shim-impls .................................. SUCCESS [ 0.203 s]
[INFO] hadoop-shim-2.6 .................................... SUCCESS [ 0.688 s]
[INFO] tez-dist ........................................... SUCCESS [01:58 min]
[INFO] Tez ................................................ SUCCESS [ 0.141 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:27 min
[INFO] Finished at: 2017-10-29T21:01:55+08:00
[INFO] Final Memory: 73M/262M
[INFO] ------------------------------------------------------------------------
The build artifacts are in the tez-dist/target directory:
$ cd /mnt/apache-tez-0.8.5-src/tez-dist/target
$ ls
archive-tmp
maven-archiver
tez-0.8.5
tez-0.8.5-minimal
tez-0.8.5-minimal.tar.gz
tez-0.8.5.tar.gz
tez-dist-0.8.5-tests.jar
3. Configuring Hive on Tez
Copy all jars under tez-0.8.5 into Hive's lib/ directory, then upload tez-0.8.5.tar.gz to a directory in HDFS:
$ /opt/cdh5/hadoop-2.6.0-cdh5.10.0/bin/hdfs dfs -mkdir -p /user/hadoop
$ /opt/cdh5/hadoop-2.6.0-cdh5.10.0/bin/hdfs dfs -put /home/hadoop/tez-0.8.5.tar.gz /user/hadoop
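The jar copy step is described above but not shown as a command. A minimal sketch follows, assuming the unpacked tez-0.8.5 directory keeps the Tez jars at its top level and their dependencies in a lib/ subdirectory. The function name `copy_tez_jars` is hypothetical, and skipping jars that Hive already ships is a cautious choice made for this sketch, not something the original procedure specifies.

```shell
# Hypothetical helper: copy Tez jars into Hive's lib directory without
# overwriting jars of the same file name that are already there.
copy_tez_jars() {
  tez_dir=$1    # unpacked Tez directory, e.g. tez-dist/target/tez-0.8.5
  hive_lib=$2   # Hive lib directory, e.g. $HIVE_HOME/lib
  for jar in "$tez_dir"/*.jar "$tez_dir"/lib/*.jar; do
    [ -f "$jar" ] || continue                  # skip globs that matched nothing
    name=$(basename "$jar")
    if [ -e "$hive_lib/$name" ]; then
      echo "skip (already present): $name"
    else
      cp "$jar" "$hive_lib"/ && echo "copied: $name"
    fi
  done
}

# Example (paths are placeholders; adjust to your environment):
# copy_tez_jars /mnt/apache-tez-0.8.5-src/tez-dist/target/tez-0.8.5 "$HIVE_HOME/lib"
```

Note that the check only compares file names, so a different version of the same library is still copied alongside the existing one. The printed list makes it easier to review what was added if class conflicts show up later.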
Edit the Tez configuration file etc/hadoop/tez-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>tez.lib.uris</name>
<value>/user/hadoop/tez-0.8.5.tar.gz</value>
</property>
</configuration>
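Before restarting anything, it may be worth confirming that the path in tez.lib.uris matches what was actually uploaded. The sketch below reads the value back from tez-site.xml; the function name `tez_lib_uri` is hypothetical, and it assumes the usual Hadoop property layout where `<value>` follows `<name>` within the same property and the value is a plain path as in the example above.

```shell
# Hypothetical helper: print the value of tez.lib.uris from a tez-site.xml.
# Assumption: <value> appears on the same line as, or after, the matching <name>.
tez_lib_uri() {
  awk '
    /<name>tez\.lib\.uris<\/name>/ { found = 1 }
    found && /<value>/ {
      sub(/.*<value>/, "")      # drop everything up to the opening tag
      sub(/<\/value>.*/, "")    # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# Example (needs a running HDFS; the config path is a placeholder):
# uri=$(tez_lib_uri "$HADOOP_HOME/etc/hadoop/tez-site.xml")
# hdfs dfs -test -e "$uri" && echo "found: $uri" || echo "missing: $uri"
```

The commented usage relies on `hdfs dfs -test -e`, which returns a zero exit status when the path exists.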
Restart the Hadoop cluster, then run a test query from the Hive CLI:
set hive.execution.engine=tez;
select count(*) from t1;
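Notes:
For this test I first installed the Hive that ships with CDH 5.10.0 and deployed the Tez package built above. Queries failed; the errors are listed in the section on common problems below.
Deploying the same Tez build on Hive 2.1.0 worked, with the following result:
-- Test output: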
hive (default)> set hive.execution.engine=tez;
hive (default)> select count(*) from t1;
17/11/05 21:14:54 INFO Configuration.deprecation: mapred.input.dir.recursive is deprecated. Instead, use mapreduce.input.fileinputformat.input.dir.recursive
17/11/05 21:14:54 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
Query ID = hadoop_20171105211451_0c1df9ef-c3d2-4ec9-b52b-cd5770d7b5b7
Total jobs = 1
Launching Job 1 out of 1
17/11/05 21:14:54 INFO Configuration.deprecation: mapred.committer.job.setup.cleanup.needed is deprecated. Instead, use mapreduce.job.committer.setup.cleanup.needed
17/11/05 21:14:56 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.0.181:8032
17/11/05 21:14:56 INFO impl.YarnClientImpl: Submitted application application_1509317142960_0011
17/11/05 21:15:02 INFO client.RMProxy: Connecting to ResourceManager at db01/192.168.0.181:8032
Status: Running (Executing on YARN cluster with App id application_1509317142960_0011)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 7.02 s
----------------------------------------------------------------------------------------------
OK
c0
17/11/05 21:15:10 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
17/11/05 21:15:10 INFO mapred.FileInputFormat: Total input paths to process : 1
3
Time taken: 18.919 seconds, Fetched: 1 row(s)
hive (default)>
4. Common problems
1) The problem:
hive (default)> set hive.execution.engine=tez;
hive (default)> select * from t1 order by aa desc;
Query ID = hadoop_20171030053838_a83cb5bd-102f-4362-90b0-3fe3bcda9aa1
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509312456681_0004)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED -1 0 0 -1 0 0
Reducer 2 KILLED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.24 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1509312456681_0004_1_00, diagnostics=[Vertex vertex_1509312456681_0004_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1509312456681_0004_1_00 [Map 1], java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/MRVersion
at org.apache.hadoop.hive.shims.Hadoop23Shims.isMR2(Hadoop23Shims.java:892)
at org.apache.hadoop.hive.shims.Hadoop23Shims.getHadoopConfNames(Hadoop23Shims.java:963)
at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:362)
at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:377)
at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:302)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:107)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.MRVersion
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 17 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1509312456681_0004_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509312456681_0004_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Solution: copy the MR1 hadoop-core jar into Hive's lib directory:
cp /opt/cdh5/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce1/hadoop-core-2.6.0-mr1-cdh5.10.0.jar /opt/cdh5/hive-1.1.0-cdh5.10.0/lib/
2) The problem:
hive (default)> set hive.execution.engine=tez;
hive (default)> select * from t1 order by aa desc;
Query ID = hadoop_20171030054343_2707c5bd-650e-4b71-89ae-cc094beafb39
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1509312456681_0005)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 FAILED -1 0 0 -1 0 0
Reducer 2 KILLED 1 0 0 1 0 0
--------------------------------------------------------------------------------
VERTICES: 00/02 [>>--------------------------] 0% ELAPSED TIME: 0.24 s
--------------------------------------------------------------------------------
Status: Failed
Vertex failed, vertexName=Map 1, vertexId=vertex_1509312456681_0005_1_00, diagnostics=[Vertex vertex_1509312456681_0005_1_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: t1 initializer failed, vertex=vertex_1509312456681_0005_1_00 [Map 1], java.lang.NoClassDefFoundError: com/esotericsoftware/kryo/Serializer
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:107)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.esotericsoftware.kryo.Serializer
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 12 more
]
Vertex killed, vertexName=Reducer 2, vertexId=vertex_1509312456681_0005_1_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1509312456681_0005_1_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask
Solution: none yet; the root cause has not been identified.
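Both failures above are NoClassDefFoundError cases, so one possible first diagnostic step is to find which jars, if any, contain the missing class. This is only a sketch for narrowing things down, not a fix: the function name `find_class_jar` is hypothetical, it requires the `unzip` command, and it matches by substring, so a relocated copy of the class under a different package prefix is also reported.

```shell
# Hypothetical helper: list the jars in a directory that contain a given class.
# The class is given in dotted form, e.g. com.esotericsoftware.kryo.Serializer.
find_class_jar() {
  lib_dir=$1
  # Convert the dotted class name to the path used inside a jar.
  class_path="$(printf '%s' "$2" | tr '.' '/').class"
  for jar in "$lib_dir"/*.jar; do
    [ -f "$jar" ] || continue                  # skip a glob that matched nothing
    # -F treats the class path as a fixed string rather than a regex.
    if unzip -l "$jar" 2>/dev/null | grep -qF "$class_path"; then
      echo "$jar"
    fi
  done
}

# Example (directories taken from the paths used earlier in this post):
# find_class_jar /opt/cdh5/hive-1.1.0-cdh5.10.0/lib com.esotericsoftware.kryo.Serializer
# find_class_jar /opt/cdh5/hadoop-2.6.0-cdh5.10.0/share/hadoop/mapreduce1 org.apache.hadoop.mapred.MRVersion
```

If no jar is printed for a directory, the class is not available from that directory, which at least indicates where to look next.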