Multiple Hive joins failing with Execution Error, return code 2
Posted: 2015-12-01 15:48:37
Question:
I am trying to run a query in which one table is left outer joined to two other tables. The query is as follows:
SELECT T.Rdate, c.Specialty_Cruises, b.Specialty_Cruises from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num)
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);
The query fails with the following messages:
: exec.Task (SessionState.java:printError(922)) - /tmp/arunf/hive.log
: mr.MapredLocalTask (MapredLocalTask.java:executeInChildVM(308)) - Execution failed with exit status: 2
: ql.Driver (SessionState.java:printError(922)) - FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
The error log contains the following:
2015-12-01 10:25:16,077 INFO [main]: mr.ExecDriver (SessionState.java:printInfo(913)) - Execution log at: /tmp/arunf/arunf_20151201102525_914a2eab-652b-440c-9fdc-a473b4caa026.log
2015-12-01 10:25:16,278 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(118)) - <PERFLOG method=deserializePlan from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,278 INFO [main]: exec.Utilities (Utilities.java:deserializePlan(953)) - Deserializing MapredLocalWork via kryo
2015-12-01 10:25:16,421 INFO [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(158)) - </PERFLOG method=deserializePlan start=1448983516278 end=1448983516421 duration=143 from=org.apache.hadoop.hive.ql.exec.Utilities>
2015-12-01 10:25:16,429 INFO [main]: mr.MapredLocalTask (SessionState.java:printInfo(913)) - 2015-12-01 10:25:16 Starting to launch local task to process map join; maximum memory = 1029701632
2015-12-01 10:25:16,498 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for c created
2015-12-01 10:25:16,500 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(441)) - fetchoperator for b created
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[2]
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 2 TS initialized
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 2 TS
2015-12-01 10:25:16,500 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(394)) - Initialization Done 2 TS
2015-12-01 10:25:16,500 INFO [main]: mr.MapredLocalTask (MapredLocalTask.java:initializeOperators(461)) - fetchoperator for b initialized
2015-12-01 10:25:16,500 INFO [main]: exec.TableScanOperator (Operator.java:initialize(346)) - Initializing Self TS[0]
2015-12-01 10:25:16,501 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(419)) - Operator 0 TS initialized
2015-12-01 10:25:16,501 INFO [main]: exec.TableScanOperator (Operator.java:initializeChildren(423)) - Initializing children of 0 TS
2015-12-01 10:25:16,502 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(458)) - Initializing child 1 HASHTABLESINK
2015-12-01 10:25:16,503 INFO [main]: exec.HashTableSinkOperator (Operator.java:initialize(346)) - Initializing Self HASHTABLESINK[1]
2015-12-01 10:25:16,503 INFO [main]: mapjoin.MapJoinMemoryExhaustionHandler (MapJoinMemoryExhaustionHandler.java:<init>(61)) - JVM Max Heap Size: 1029701632
2015-12-01 10:25:16,533 ERROR [main]: mr.MapredLocalTask (MapredLocalTask.java:executeInProcess(357)) - Hive Runtime Error: Map local work failed
java.lang.RuntimeException: cannot find field courtesy_hold_booking_num from [0:rdate, 1:online_booking_number, 2:pages, 3:mobile_device_type, 4:specialty_cruises]
at org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorUtils.getStandardStructFieldRef(ObjectInspectorUtils.java:410)
at org.apache.hadoop.hive.serde2.BaseStructObjectInspector.getStructFieldRef(BaseStructObjectInspector.java:133)
at org.apache.hadoop.hive.ql.exec.ExprNodeColumnEvaluator.initialize(ExprNodeColumnEvaluator.java:55)
at org.apache.hadoop.hive.ql.exec.JoinUtil.getObjectInspectorsFromEvaluators(JoinUtil.java:68)
at org.apache.hadoop.hive.ql.exec.HashTableSinkOperator.initializeOp(HashTableSinkOperator.java:138)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:469)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:425)
at org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:193)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.initializeOperators(MapredLocalTask.java:460)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.startForward(MapredLocalTask.java:366)
at org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.executeInProcess(MapredLocalTask.java:346)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.main(ExecDriver.java:743)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
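Note that the joins succeed when the main table is joined to each of the two tables separately. For example, both of the following queries succeed: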
SELECT T.Rdate from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C on (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(c.rdate,'yyyy-MM-dd') AND T.book_num = c.Courtesy_Hold_Booking_Num);
SELECT T.Rdate from arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP b ON (unix_timestamp(T.RDATE,'yyyy-MM-dd')=unix_timestamp(b.rdate,'yyyy-MM-dd') AND T.book_num = B.Online_Booking_Number);
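I am also able to left outer join this main table with two other tables in the same combined manner. The problem occurs only when I join the main table with these two particular secondary tables in a single query.
Please share any insights you have on this issue.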
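Comments:
The columns shown in the sample query are completely different from the columns listed in the error message. Either you are trying to "hide" the actual query, or you pasted the wrong log, or your Hive Metastore is completely corrupted... By the way, there is a "code sample" format. Don't be shy, use it. It makes code samples much easier to read.
@SamsonScharfrichter, thanks for pointing that out. I was using dummy tables to work on the problem and had posted those, but their log has since been overwritten. So I have now posted the actual code that matches the given log. Let me know if you need more information. I have also formatted the queries as code, which I hope helps.
Answer 1:
Hive bugs come and go. This one may depend on the Hive version (?) and on the table format (Text? AVRO? Sequence? ORC? Parquet?).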
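To check the table format and the column lists raised above, one option is to ask Hive for the table metadata. This is only a diagnostic sketch that reuses the table names from the question; the exact output layout differs between Hive versions.

```sql
-- Lists each table's columns plus storage details
-- (InputFormat, OutputFormat, SerDe library).
-- If the database-qualified name is rejected on an older Hive,
-- run "USE arunf;" first and drop the prefix.
DESCRIBE FORMATTED arunf.PASSENGER_HISTORY_FACT;
DESCRIBE FORMATTED arunf.RPT_WEB_COURTESY_HOLD_TEMP;
DESCRIBE FORMATTED arunf.RPT_WEB_BOOKING_NUM_TEMP;
```

Comparing the column lists with the `cannot find field` message shows which table's schema Hive was searching when it failed.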
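Now, since each query seems to work on its own, why not try a workaround based on a divide-and-conquer approach (in other words: if Hive is not smart enough to design an execution plan, let's design it ourselves)? For example: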
SELECT TC.RDate, TC.Specialty_Cruises, B.Specialty_Cruises
FROM
(SELECT T.Rdate, T.book_num, C.Specialty_Cruises
FROM arunf.PASSENGER_HISTORY_FACT T
LEFT JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
ON unix_timestamp(T.RDate,'yyyy-MM-dd')=unix_timestamp(C.RDate,'yyyy-MM-dd')
AND T.book_num = C.Courtesy_Hold_Booking_Num
) TC
LEFT JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP B
ON unix_timestamp(TC.RDate,'yyyy-MM-dd')=unix_timestamp(B.RDate,'yyyy-MM-dd')
AND TC.book_num = B.Online_Booking_Number
;
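Separately from the rewrite above, the log in the question shows the failure inside the local task that prepares a map join (`MapredLocalTask`, `HashTableSinkOperator`). As an untested assumption, it may be worth inspecting the plan and checking whether the original query runs once automatic map-join conversion is switched off for the session. Nothing in the thread confirms that this resolves the error.

```sql
-- Inspect the plan Hive builds for the failing three-table join.
EXPLAIN
SELECT T.Rdate, C.Specialty_Cruises, B.Specialty_Cruises
FROM arunf.PASSENGER_HISTORY_FACT T
LEFT OUTER JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
  ON (unix_timestamp(T.RDATE,'yyyy-MM-dd') = unix_timestamp(C.rdate,'yyyy-MM-dd')
      AND T.book_num = C.Courtesy_Hold_Booking_Num)
LEFT OUTER JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP B
  ON (unix_timestamp(T.RDATE,'yyyy-MM-dd') = unix_timestamp(B.rdate,'yyyy-MM-dd')
      AND T.book_num = B.Online_Booking_Number);

-- Assumption, not a confirmed fix: disable automatic conversion of
-- common joins to map joins for this session, then rerun the query.
SET hive.auto.convert.join=false;
```

If the query succeeds with the setting off, that would suggest the problem lies in the map-join path rather than in the tables themselves.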
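Discussion:
I tried that earlier and it works fine. I have to use this query block inside another large Hive query that left joins 7 other tables, so nesting joins in that context gets a bit tricky. An alternative is to load the output of this nested join into a temporary table and then use that temporary table in the main query. Either way, I can move forward now. Thanks for your input.
One last thing: have you considered Pig instead of Hive? It may be a better fit for this use case (a multi-step process with intermediate results). I don't have enough hands-on experience with Pig to give sound advice, though.
Same here. I don't have enough hands-on exposure to Pig, so I decided not to venture into unknown waters while time is short. I'm fairly sure this would not be a problem in Impala either. When I have time, I will do some R&D in Impala.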
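The temporary-table alternative mentioned in the discussion could look roughly like the sketch below. The staging table name `arunf.tmp_history_with_holds` and the alias `hold_specialty_cruises` are hypothetical, introduced only for illustration. A plain table is used because the Hive version is unknown and may not support `CREATE TEMPORARY TABLE`.

```sql
-- Hypothetical staging table; drop any stale copy first.
DROP TABLE IF EXISTS arunf.tmp_history_with_holds;

-- Step 1: materialize the first join. book_num is carried along
-- because the second join needs it, and Specialty_Cruises is aliased
-- so the staged column name stays unambiguous.
CREATE TABLE arunf.tmp_history_with_holds AS
SELECT T.Rdate, T.book_num, C.Specialty_Cruises AS hold_specialty_cruises
FROM arunf.PASSENGER_HISTORY_FACT T
LEFT JOIN arunf.RPT_WEB_COURTESY_HOLD_TEMP C
  ON unix_timestamp(T.Rdate,'yyyy-MM-dd') = unix_timestamp(C.rdate,'yyyy-MM-dd')
 AND T.book_num = C.Courtesy_Hold_Booking_Num;

-- Step 2: join the staged result to the second table.
SELECT TC.Rdate, TC.hold_specialty_cruises, B.Specialty_Cruises
FROM arunf.tmp_history_with_holds TC
LEFT JOIN arunf.RPT_WEB_BOOKING_NUM_TEMP B
  ON unix_timestamp(TC.Rdate,'yyyy-MM-dd') = unix_timestamp(B.rdate,'yyyy-MM-dd')
 AND TC.book_num = B.Online_Booking_Number;
```

Staging the first join keeps the larger seven-table query flat, at the cost of an extra write and a table that has to be dropped afterwards.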