USDP Usage Notes: Setting Hive on Tez to Fix "return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask"
Posted 虎鲸不是鱼
Preface
When testing HQL syntax or computation logic from the Hive CLI or beeline, loading data into Hive with `LOAD` is too heavyweight, so in light-usage scenarios it is hard to avoid `insert` statements.
However, Hive runs an insert as a MapReduce job, and `FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask` is an all-too-familiar result.
The fix is to switch Hive's execution engine to Tez. The Hive CLI itself spells this out plainly:
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
In other words: future Hive releases may drop MapReduce support, so stay on Hive 1.x if you must run MapReduce, and otherwise switch to Tez or Spark. Since the USDP suite already ships spark-sql [which makes it a bit more generous than Cloudera, who loves pushing Impala and deliberately leaves spark-sql out], there is no need to run Spark SQL inside Hive, so I chose Tez. Its performance is good enough, and a heterogeneous setup has the advantage that if one component goes down someday, you are not left with nothing at all.
Setting Hive on Tez
USDP has already placed the installation packages on HDFS:
[root@zhiyong2 ~]# hadoop fs -ls hdfs://zhiyong-1/
Found 7 items
-rw-r--r-- 3 root supergroup 14444 2022-03-11 22:37 hdfs://zhiyong-1/a1
drwxrwxrwx - hadoop supergroup 0 2022-03-03 00:35 hdfs://zhiyong-1/hbase
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:08 hdfs://zhiyong-1/tez
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:08 hdfs://zhiyong-1/tez-0.10.0
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:09 hdfs://zhiyong-1/tmp
drwxrwxrwx - hadoop supergroup 0 2022-03-02 23:51 hdfs://zhiyong-1/user
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:12 hdfs://zhiyong-1/zhiyong-1
[root@zhiyong2 ~]# hadoop fs -ls hdfs://zhiyong-2/
Found 7 items
drwxr-xr-x - root supergroup 0 2022-03-11 23:04 hdfs://zhiyong-2/a1
drwxrwxrwx - hadoop supergroup 0 2022-03-02 22:39 hdfs://zhiyong-2/hbase
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:34 hdfs://zhiyong-2/tez
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:35 hdfs://zhiyong-2/tez-0.10.0
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:35 hdfs://zhiyong-2/tmp
drwxrwxrwx - hadoop supergroup 0 2022-03-11 22:28 hdfs://zhiyong-2/user
drwxrwxrwx - hadoop supergroup 0 2022-03-01 23:38 hdfs://zhiyong-2/zhiyong-2
They are ready to use as-is.
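For reference, when wiring Tez up by hand (USDP normally does this for you), the HDFS directory shown above would typically be referenced from `tez-site.xml` via the `tez.lib.uris` property. A minimal sketch, with the path assumed from the listing above:

```xml
<!-- tez-site.xml: point Tez at its libraries on HDFS
     (sketch only; the path is assumed from the directory listing above) -->
<property>
  <name>tez.lib.uris</name>
  <value>hdfs://zhiyong-1/tez</value>
</property>
```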
[root@zhiyong2 /]# hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/usdp-srv/srv/udp/2.0.0.0/hive/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/usdp-srv/srv/udp/2.0.0.0/yarn/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = f45baaf5-6c88-4f0a-b92d-5cf5a8aaf38d
Logging initialized using configuration in file:/opt/usdp-srv/srv/udp/2.0.0.0/hive/conf/hive-log4j2.properties Async: true
Hive Session ID = 02e81116-c0ee-49b5-8e30-0180570ae9ae
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> set hive.execution.engine;
hive.execution.engine=mr
hive (default)> set hive.execution.engine=tez;
hive (default)> set hive.execution.engine;
hive.execution.engine=tez
hive (default)>
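Note that a `set` issued in the CLI only lasts for the current session. To make Tez the default engine, the standard approach is to set `hive.execution.engine` in `hive-site.xml` (on USDP you would change this through the management console rather than editing the file directly); a sketch:

```xml
<!-- hive-site.xml: make tez the default execution engine (sketch) -->
<property>
  <name>hive.execution.engine</name>
  <value>tez</value>
</property>
```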
Only after this setting does insert work:
hive (default)> use db_lzy;
2022-03-02 23:50:47 INFO impl.TimelineClientImpl: Timeline service address: zhiyong3:8188
OK
Time taken: 1.626 seconds
hive (db_lzy)> show tables;
OK
tab_name
2022-03-02 23:50:53 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library from the embedded binaries
2022-03-02 23:50:53 INFO lzo.LzoCodec: Successfully loaded & initialized native-lzo library [hadoop-lzo rev 52decc77982b58949890770d22720a91adce0c3f]
demo1
Time taken: 0.409 seconds, Fetched: 1 row(s)
hive (db_lzy)> select * from demo1;
OK
demo1.num demo1.name
1 zhiyong1
2 zhiyong2
Time taken: 2.909 seconds, Fetched: 2 row(s)
hive (db_lzy)> insert into demo1 values(3,'zhiyong3');
Query ID = root_20220302235130_1f69cf83-f863-4c44-bb72-c70ff6737328
Total jobs = 1
Launching Job 1 out of 1
2022-03-02 23:51:34 INFO client.AHSProxy: Connecting to Application History server at zhiyong3/192.168.88.102:10201
2022-03-02 23:51:34 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Status: Running (Executing on YARN cluster with App id application_1646235761072_0002)
----------------------------------------------------------------------------------------------
VERTICES MODE STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container SUCCEEDED 1 1 0 0 0 0
Reducer 2 ...... container SUCCEEDED 1 1 0 0 0 0
----------------------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 9.71 s
----------------------------------------------------------------------------------------------
Status: DAG finished successfully in 9.71 seconds
Query Execution Summary
----------------------------------------------------------------------------------------------
OPERATION DURATION
----------------------------------------------------------------------------------------------
Compile Query 2.12s
Prepare Plan 12.64s
Get Query Coordinator (AM) 0.04s
Submit Plan 0.51s
Start DAG 0.11s
Run DAG 9.71s
----------------------------------------------------------------------------------------------
Task Execution Summary
----------------------------------------------------------------------------------------------
VERTICES DURATION(ms) CPU_TIME(ms) GC_TIME(ms) INPUT_RECORDS OUTPUT_RECORDS
----------------------------------------------------------------------------------------------
Map 1 3114.00 5,320 224 3 1
Reducer 2 437.00 1,490 0 1 0
----------------------------------------------------------------------------------------------
Loading data to table db_lzy.demo1
OK
col1 col2
Time taken: 28.868 seconds
hive (db_lzy)> select * from demo1;
OK
demo1.num demo1.name
3 zhiyong3
1 zhiyong1
2 zhiyong2
Time taken: 0.275 seconds, Fetched: 3 row(s)
hive (db_lzy)>
Running insert on the default mr engine fails:
hive (db_lzy)> set hive.execution.engine=mz;
Query returned non-zero code: 1, cause: 'SET hive.execution.engine=mz' FAILED in validation : Invalid value.. expects one of [mr, tez, spark].
hive (db_lzy)> set hive.execution.engine=mr;
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (db_lzy)> set hive.execution.engine;
hive.execution.engine=mr
hive (db_lzy)> insert into demo1 values(4,'zhiyong4');
Query ID = root_20220302235327_c5a77037-124f-4988-8413-151ffb5e2703
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
2022-03-02 23:53:28 INFO client.AHSProxy: Connecting to Application History server at zhiyong3/192.168.88.102:10201
2022-03-02 23:53:28 INFO client.AHSProxy: Connecting to Application History server at zhiyong3/192.168.88.102:10201
2022-03-02 23:53:28 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
Starting Job = job_1646235761072_0003, Tracking URL = http://zhiyong3:8088/proxy/application_1646235761072_0003/
Kill Command = /opt/usdp-srv/srv/udp/2.0.0.0/yarn/bin/mapred job -kill job_1646235761072_0003
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2022-03-02 23:53:41,791 Stage-1 map = 0%, reduce = 0%
Ended Job = job_1646235761072_0003 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Stage-Stage-1: HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec
hive (db_lzy)> select * from demo1;
OK
demo1.num demo1.name
3 zhiyong3
1 zhiyong1
2 zhiyong2
Time taken: 0.412 seconds, Fetched: 3 row(s)
hive (db_lzy)> set hive.execution.engine=tez;
hive (db_lzy)> exit;
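For beeline sessions, the same engine override can be passed on the command line so every session starts on Tez without a manual `set` (a sketch; the JDBC URL below is a hypothetical placeholder for your HiveServer2 address):

```shell
# Connect with the execution engine preset to tez (host/port are placeholders)
beeline -u "jdbc:hive2://zhiyong2:10000/db_lzy" \
        --hiveconf hive.execution.engine=tez
```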
In fact, it is not only USDP's Hive that throws this maddening return code 2 when running MapReduce; Ali Cloud's much-hyped DataPhin hits the same error all the time...
In short: cherish life, stay away from Hive on MapReduce... That said, MapReduce itself is still well worth careful study. Quite a few open-source components borrow heavily from its internals, for example Spark, which still loves writing a `_SUCCESS` marker to HDFS and names its output files with the blunt part-00000, part-00001, part-00002 scheme...