Hive Script Execution: Return Code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask



0. Preliminaries

  • Hadoop: Hadoop 2.9.2
  • Hive: Hive 2.3.7

1. Scenario

Retained members in an offline data warehouse

1. Retained members and retention rate

Members who were newly added during some period and are still using the application after a given interval are considered retained members.

The proportion of such members among that period's new members is the retention rate.

2. Requirement: 1-day, 2-day, and 3-day member retention counts and retention rates

| 30 | 31 | 1 | 2 |  |
|----|----|----|----|----|
|  |  | 100k new members | 30k | 1-day retention count |
|  | 200k new members |  | 50k | 2-day retention count |
| 300k new members |  |  | 40k | 3-day retention count |

100k new members: the detail data in dws_member_add_day (dt=08-01)

30k: these are members who were new on the 1st and started the app on the 2nd (i.e., they appear in the 2nd's start log): dws_member_start_day

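To make the retention arithmetic concrete, here is a minimal Python sketch using the illustrative counts from the table above (hypothetical numbers, not real data):

```python
# Illustrative cohort sizes (new members per day) and how many of each
# cohort started the app again on 8-02 -- numbers from the table above.
new_members = {"8-01": 100_000, "7-31": 200_000, "7-30": 300_000}
retained_on_0802 = {"8-01": 30_000, "7-31": 50_000, "7-30": 40_000}

def retention_rate(cohort_date: str) -> float:
    """N-day retention rate = retained members / new members of the cohort."""
    return retained_on_0802[cohort_date] / new_members[cohort_date]

print(f"1-day retention rate: {retention_rate('8-01'):.1%}")  # 30k / 100k = 30.0%
print(f"2-day retention rate: {retention_rate('7-31'):.1%}")  # 50k / 200k = 25.0%
print(f"3-day retention rate: {retention_rate('7-30'):.1%}")  # 40k / 300k = 13.3%
```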
3. Script

  • Original script
#!/bin/bash
source /etc/profile

# Use the date passed as the first argument, otherwise default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi

# Note: date values are single-quoted so Hive reads them as string
# literals rather than arithmetic expressions.
sql="
insert overwrite table dws.dws_member_retention_day
partition(dt='$do_date')
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       1
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -1)
   and t1.dt = '$do_date'
union all
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       2
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -2)
   and t1.dt = '$do_date'
union all
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       3
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -3)
   and t1.dt = '$do_date';
"
hive -e "$sql"
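The date-defaulting logic at the top of the script (`date -d "-1 day" +%F`; `%F` is shorthand for `%Y-%m-%d`) can be sketched in Python like this:

```python
import sys
from datetime import date, timedelta

def resolve_do_date(argv):
    """Mirror the script's logic: use the first argument if given,
    otherwise default to yesterday in ISO (%Y-%m-%d) format."""
    if len(argv) > 1 and argv[1]:
        return argv[1]
    return (date.today() - timedelta(days=1)).isoformat()

if __name__ == "__main__":
    print(resolve_do_date(sys.argv))
```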

2. Error message

(screenshot: console output of the failed script run)

A "Return Code XXX" error of this kind is generally an internal error; the console message alone does not reveal the root cause.

3. Solution

Check the log files

Background

Hive has two kinds of logs: the system log and the job log.

  • The system log records Hive's runtime state and errors.
  • The job log records the execution history of Hive jobs.
  • System log location: configured in $HIVE_HOME/conf/hive-log4j.properties
  • Job log location: the value of the hive.querylog.location parameter
  • The hive.log file (its contents are fairly brief)

By default, hive.log is stored at /tmp/<current user name>/hive.log
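A quick sketch of how that default path is assembled (`${user.name}` resolves to the OS user running Hive):

```python
import getpass
import os.path

# Default settings: hive.log.dir=/tmp/${user.name}, hive.log.file=hive.log
user = getpass.getuser()
hive_log_path = os.path.join("/tmp", user, "hive.log")
print(hive_log_path)  # e.g. /tmp/hadoop/hive.log when Hive runs as user "hadoop"
```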

  • To find this default log directory, check conf/hive-log4j.properties under the Hive installation directory.

hive-log4j.properties contents:

# Define some default values that can be overridden by system properties
hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log

# Define the root logger to the system property "hadoop.root.logger".
log4j.rootLogger=${hive.root.logger}, EventCounter

# Logging Threshold
log4j.threshhold=WARN

#
# Daily Rolling File Appender
#

log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hive.log.dir}/${hive.log.file}

# Rollver at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd

# 30-day backup
#log4j.appender.DRFA.MaxBackupIndex=30
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout

# Pattern format: Date LogLevel LoggerName LogMessage
#log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# Debugging Pattern format
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n


#
# console
# Add "console" to rootlogger above if you want to use this
#

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n

#custom logging levels
#log4j.logger.xxx=DEBUG

#
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
#
log4j.appender.EventCounter=org.apache.hadoop.metrics.jvm.EventCounter


log4j.category.DataNucleus=ERROR,DRFA
log4j.category.Datastore=ERROR,DRFA
log4j.category.Datastore.Schema=ERROR,DRFA
log4j.category.JPOX.Datastore=ERROR,DRFA
log4j.category.JPOX.Plugin=ERROR,DRFA
log4j.category.JPOX.MetaData=ERROR,DRFA
log4j.category.JPOX.Query=ERROR,DRFA
log4j.category.JPOX.General=ERROR,DRFA
log4j.category.JPOX.Enhancer=ERROR,DRFA

At the very top you can see the defaults:

# defaults
hive.root.logger=WARN,DRFA
hive.log.dir=/tmp/${user.name}
hive.log.file=hive.log

If the defaults are not in use, check the location specified by the hive.querylog.location parameter in hive-site.xml under the conf directory of the Hive installation.

  • Official documentation of the hive.querylog.location parameter

(screenshot: the hive.querylog.location entry in the Hive configuration documentation)

  • Check the MR logs (more detailed)

This requires the JobHistory server to be running, log aggregation to be enabled, and the SQL to actually run in cluster mode.

Looking at the MR logs, the timestamps did not match the failed run, so the next thing to check is whether local mode is enabled.

  • Check $HIVE_HOME/conf/hive-site.xml

<!-- Use local mode for small data sets to improve efficiency -->
<property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
</property>

This confirms that local mode is enabled.

Simply add set hive.exec.mode.local.auto = false; to the script.
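For context, Hive auto-enables local mode roughly as follows: the job runs locally only when the total input size is below `hive.exec.mode.local.auto.inputbytes.max` (default 128 MB), the number of input files is at most `hive.exec.mode.local.auto.input.files.max` (default 4), and there is at most one reducer. A simplified Python sketch of that decision (an assumption-laden model, not the actual Hive source):

```python
# Simplified model of Hive's auto local-mode check. The defaults below are
# assumed to mirror hive.exec.mode.local.auto.inputbytes.max (128 MB) and
# hive.exec.mode.local.auto.input.files.max (4).
def should_run_local(input_bytes, input_files, num_reducers,
                     auto_local=True,
                     max_bytes=128 * 1024 * 1024,
                     max_files=4):
    if not auto_local:  # set hive.exec.mode.local.auto = false;
        return False
    return (input_bytes <= max_bytes
            and input_files <= max_files
            and num_reducers <= 1)

# Disabling auto local mode forces the job onto the cluster, where the
# history server / log aggregation can capture its logs:
print(should_run_local(10_000, 1, 1, auto_local=False))  # False
```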

  • Re-run the script and check the MR logs again

Now the logs are available. For this project's scenario, the fix was:

  • Modify the SQL

Use an intermediate table as a staging step.

Instead of inserting the final query result directly into the target table, first insert it into an intermediate table, then select everything from the intermediate table into the target table.

  • Final script
#!/bin/bash
source /etc/profile

# Use the date passed as the first argument, otherwise default to yesterday
if [ -n "$1" ]; then
    do_date=$1
else
    do_date=$(date -d "-1 day" +%F)
fi

sql="
drop table if exists tmp.tmp_member_retention;

-- Stage the result in an intermediate table first
create table tmp.tmp_member_retention as
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       1
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -1)
   and t1.dt = '$do_date'
union all
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       2
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -2)
   and t1.dt = '$do_date'
union all
select t2.device_id,
       t2.uid,
       t2.app_v,
       t2.os_type,
       t2.language,
       t2.channel,
       t2.area,
       t2.brand,
       t2.dt add_date,
       3
  from dws.dws_member_start_day t1
  join dws.dws_member_add_day t2 on t1.device_id = t2.device_id
 where t2.dt = date_add('$do_date', -3)
   and t1.dt = '$do_date';

-- Then load the target table from the intermediate table
insert overwrite table dws.dws_member_retention_day
partition(dt='$do_date')
select * from tmp.tmp_member_retention;
"
hive -e "$sql"

Problem solved.

4. An interesting find

While researching, I came across this answer on StackOverflow:

I recently faced the same issue/error in my cluster. The JOB would always get to some 80%+ reduction and fail with the same error, with nothing to go on in the execution logs either. Upon multiple iterations and research I found that among the plethora of files getting loaded some were non-compliant with the structure provided for the base table(table being used to insert data into partitioned table). Point to be noted here is whenever I executed a select query for a particular value in the partitioning column or created a static partition it worked fine as in that case error records were being skipped.


One sentence in particular stands out:

"among the plethora of files getting loaded some were non-compliant with the structure provided for the base table (table being used to insert data into partitioned table)"

This is worth paying attention to.

5. References

https://stackoverflow.com/questions/11185528/what-is-hive-return-code-2-from-org-apache-hadoop-hive-ql-exec-mapredtask
https://www.bilibili.com/video/BV1TB4y117vj?p=40&spm_id_from=pageDriver&vd_source=2b8af863001fac2c89aab4db5ba5b9db

Hive issue: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask

An error was reported during Hive execution; the key part (highlighted in the original) is:

2019-02-01 09:56:54,623 ERROR [pool-7-thread-4] dao.IHiveDaoImpl - java.sql.SQLException: org.apache.hive.service.cli.HiveSQLException: 
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:380) at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:257) at org.apache.hive.service.cli.operation.SQLOperation.access$800(SQLOperation.java:91) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:348) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758) at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:362) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

This is roughly an error from the MapReduce stage.

Checking confirmed that the MapReduce job did in fact run.

Pulling the MR error log:

2019-02-01 10:28:35,832 INFO [IPC Server handler 4 on 38091] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1537175606568_162793_m_000000_3: Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"log":"5\u001aNEWEHIREWEB17.2019012911\u001a1\u001a3\u001a1548730807629\u001a43\u001a14\u001a2223123\u001a2577551\u001a8e56221be35a44f8845064b8cc8f21f9\u001a61.170.197.152\u001a","webname":"ehireLog","mon":"201901","dt":"20190129"}
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:169)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1758)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"log":"5\u001aNEWEHIREWEB17.2019012911\u001a1\u001a3\u001a1548730807629\u001a43\u001a14\u001a2223123\u001a2577551\u001a8e56221be35a44f8845064b8cc8f21f9\u001a61.170.197.152\u001a","webname":"ehireLog","mon":"201901","dt":"20190129"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:160)
    ... 8 more
Caused by: com.tracker.common.db.simplehbase.exception.SimpleHBaseException: convert result exception. cells=[003x111/data:id/1538028988105/Put/vlen=4/seqid=0, 003x111/data:isInSelector/1538028988105/Put/vlen=4/seqid=0, 003x111/data:isStats/1538028988105/Put/vlen=4/seqid=0, 003x111/data:pageDesc/1538028988105/Put/vlen=6/seqid=0, 003x111/data:pageType/1548918298621/Put/vlen=1/seqid=0, 003x111/data:webId/1538028988105/Put/vlen=4/seqid=0] type=class com.tracker.common.data.model.dict.website.Page
    at com.tracker.common.db.simplehbase.HbaseClient.convertToHbaseObjectResult(HbaseClient.java:337)
    at com.tracker.common.db.simplehbase.HbaseClientImpl$6.handleData(HbaseClientImpl.java:177)
    at com.tracker.common.db.simplehbase.HbaseClientImpl.handData_internal(HbaseClientImpl.java:733)
    at com.tracker.common.db.simplehbase.HbaseClientImpl.handDataByRowPrefixList(HbaseClientImpl.java:651)
    at com.tracker.common.db.simplehbase.HbaseClientImpl.findObjectByRowPrefixList(HbaseClientImpl.java:174)
    at com.tracker.common.db.simplehbase.HbaseClientImpl.findObjectByRowPrefix(HbaseClientImpl.java:167)
    at com.tracker.common.data.dao.dict.WebDictDataDao$6.apply(WebDictDataDao.java:154)
    at com.tracker.common.data.dao.dict.WebDictDataDao$6.apply(WebDictDataDao.java:151)
    at com.tracker.common.cache.LocalMapCache.getOrElse(LocalMapCache.java:66)
    at com.tracker.common.data.dao.dict.WebDictDataDao.getPageList(WebDictDataDao.java:151)
    at com.tracker.common.data.dao.dict.WebDictDataDao.loadDictToCache(WebDictDataDao.java:36)
    at com.tracker.common.data.query.DictDataQuery.loadLogPaserDict(DictDataQuery.java:84)
    at com.tracker.hive.func.udf.parse.ParseLog.initialize(ParseLog.java:64)
    at org.apache.hadoop.hive.ql.udf.generic.GenericUDF.initializeAndFoldConstants(GenericUDF.java:141)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:146)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:140)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:140)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:140)
    at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.initialize(ExprNodeGenericFuncEvaluator.java:140)
    at org.apache.hadoop.hive.ql.exec.ExprNodeEvaluatorHead.initialize(ExprNodeEvaluatorHead.java:39)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.process(FilterOperator.java:80)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:897)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:148)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:547)
    ... 9 more
Caused by: com.tracker.common.db.simplehbase.exception.SimpleHBaseException: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 1
    at com.tracker.common.db.simplehbase.HbaseClient.convertBytesToPOJOField(HbaseClient.java:374)
    at com.tracker.common.db.simplehbase.HbaseClient.convertToHbaseObjectResult(HbaseClient.java:332)
    ... 33 more
Caused by: java.lang.IllegalArgumentException: offset (0) + length (4) exceed the capacity of the array: 1
    at org.apache.hadoop.hbase.util.Bytes.explainWrongLengthOrOffset(Bytes.java:632)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:802)
    at org.apache.hadoop.hbase.util.Bytes.toInt(Bytes.java:778)
    at com.tracker.coprocessor.utils.TypeHandlerHolder$IntegerHandler.toObject(TypeHandlerHolder.java:311)
    at com.tracker.common.db.simplehbase.HbaseClient.convertBytesToPOJOField(HbaseClient.java:371)
    ... 34 more

Looking at the highlighted part, the problem is the mapping to the corresponding HBase entity class.

Cause: a field type in the HBase data-dictionary table was changed, but the corresponding jar used by Hive was not updated.
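The root cause line, `offset (0) + length (4) exceed the capacity of the array: 1`, is exactly what a 4-byte integer decoder reports when handed a value that was stored with a different (1-byte) type. A minimal Python sketch of the same mismatch (hypothetical values, not the actual HBase cells):

```python
import struct

def to_int(cell: bytes) -> int:
    """Decode a 4-byte big-endian int, like HBase's Bytes.toInt."""
    if len(cell) < 4:
        # Same failure mode as Bytes.toInt on a too-short value
        raise ValueError(
            f"offset (0) + length (4) exceed the capacity of the array: {len(cell)}")
    return struct.unpack(">i", cell[:4])[0]

old_value = struct.pack(">i", 42)  # written as a 4-byte int: decodes fine
new_value = b"\x01"                # rewritten with a 1-byte type after the schema change

print(to_int(old_value))           # 42
try:
    to_int(new_value)
except ValueError as e:
    print(e)
```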

 

