java.lang.OutOfMemoryError: Java heap space error while executing Hive query
Posted: 2021-05-28 04:33:22

Question:
I am running a Hive query from the Hive shell with the Tez execution engine. The logs show java.lang.OutOfMemoryError: Java heap space, but the query still completes in the end.
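I'd like to understand why this error shows up in the logs, because this query used to run fine.
Does anyone have a clue, or documentation that would help me understand the problem? I searched on Google, but it didn't help much.
Thanks in advance for your help!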
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 3, vertexId=vertex_1622153507491_0145_1_02, diagnostics=[Task failed, taskId=task_1622153507491_0145_1_02_000006, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : java.lang.RuntimeException: Map operator initialization failed
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:361)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:266)
at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250)
at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Async Initialization failed. abortRequested=false
at org.apache.hadoop.hive.ql.exec.Operator.completeInitialization(Operator.java:465)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:399)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:572)
at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:524)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:385)
at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(MapRecordProcessor.java:342)
... 17 more
Caused by: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.hive.serde2.WriteBuffers.nextBufferToWrite(WriteBuffers.java:261)
at org.apache.hadoop.hive.serde2.WriteBuffers.write(WriteBuffers.java:237)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMapStore.addMore(VectorMapJoinFastBytesHashMapStore.java:539)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastBytesHashMap.add(VectorMapJoinFastBytesHashMap.java:101)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringCommon.adaptPutRow(VectorMapJoinFastStringCommon.java:59)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastStringHashMap.putRow(VectorMapJoinFastStringHashMap.java:37)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastTableContainer.putRow(VectorMapJoinFastTableContainer.java:183)
at org.apache.hadoop.hive.ql.exec.vector.mapjoin.fast.VectorMapJoinFastHashTableLoader.load(VectorMapJoinFastHashTableLoader.java:130)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTableInternal(MapJoinOperator.java:344)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.loadHashTable(MapJoinOperator.java:413)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator.lambda$initializeOp$0(MapJoinOperator.java:215)
at org.apache.hadoop.hive.ql.exec.MapJoinOperator$$Lambda$27/55723736.call(Unknown Source)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache.retrieve(ObjectCache.java:96)
at org.apache.hadoop.hive.ql.exec.tez.ObjectCache$1.call(ObjectCache.java:113)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
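Answer 1:
The OOM exception is thrown by the MapJoin operator while it loads the hash table (MapJoinOperator.loadHashTable in the trace). An alternative execution path without the map join may then have succeeded, which would explain why the query still completed.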
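The sketch below uses a toy hash join to show why the failure occurs during operator initialization and not while rows are being streamed. It is a minimal illustration and not Hive's implementation. The class ToyMapJoin and its methods are hypothetical, and Hive itself uses vectorized, byte-oriented hash tables (the VectorMapJoinFast* classes in the trace). The relevant similarity to a map join is that the entire small side is loaded into an in-memory hash table before the big side is read. That build step only works if the small table fits in the task's heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy (hypothetical) map join: the small "build" side is fully materialized in
 * a HashMap before the big "probe" side is streamed. Not Hive's code; it only
 * shows why the build side must fit in the task's heap.
 */
public class ToyMapJoin {

    /** Build phase: every small-table row stays in memory (the step that hit OOM in the trace). */
    public static Map<String, List<String>> buildHashTable(List<String[]> smallTable) {
        Map<String, List<String>> table = new HashMap<>();
        for (String[] row : smallTable) {
            // row[0] = join key, row[1] = payload; duplicate keys keep all payloads
            table.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
        return table;
    }

    /** Probe phase: stream the big table row by row and look up matches (inner join). */
    public static List<String> probe(List<String[]> bigTable, Map<String, List<String>> table) {
        List<String> out = new ArrayList<>();
        for (String[] row : bigTable) {
            List<String> matches = table.get(row[0]);
            if (matches == null) {
                continue; // no match: an inner join drops the row
            }
            for (String m : matches) {
                out.add(row[0] + "," + row[1] + "," + m);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> small = Arrays.asList(new String[]{"a", "x1"}, new String[]{"b", "x2"});
        List<String[]> big = Arrays.asList(
                new String[]{"a", "1"}, new String[]{"c", "2"}, new String[]{"b", "3"});
        System.out.println(probe(big, buildHashTable(small))); // prints [a,1,x1, b,3,x2]
    }
}
```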
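What you can do: first try increasing mapper parallelism. If you already have more mappers and it still doesn't help, increase mapper memory. Check your current settings and adjust them accordingly.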
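Here is a rough model of how the parallelism setting works. Tez groups input splits so that each group's size falls roughly between tez.grouping.min-size and tez.grouping.max-size, so a smaller max-size produces more, smaller groups and therefore more mappers. The sketch is a simplified model that loosely follows the shape of Tez's split grouping. The real grouper also takes data locality into account and derives the desired group count from cluster capacity and split waves. For that reason, desiredGroups is a hypothetical input here, and the class and method names are only illustrative.

```java
/**
 * Simplified, hypothetical model of Tez split grouping: groups are pushed
 * toward the [minSize, maxSize] byte range. desiredGroups stands in for the
 * count Tez would derive from cluster capacity; locality is ignored.
 */
public class GroupingEstimate {

    public static long estimateMappers(long totalBytes, long desiredGroups,
                                       long minSize, long maxSize) {
        if (totalBytes <= 0 || desiredGroups <= 0 || minSize <= 0 || maxSize < minSize) {
            throw new IllegalArgumentException("invalid sizes");
        }
        long bytesPerGroup = totalBytes / desiredGroups;
        if (bytesPerGroup > maxSize) {
            // groups would be too large: cap each at maxSize, giving more mappers
            return totalBytes / maxSize + 1;
        }
        if (bytesPerGroup < minSize) {
            // groups would be too small: merge up to minSize, giving fewer mappers
            return totalBytes / minSize + 1;
        }
        return desiredGroups;
    }

    public static void main(String[] args) {
        long tenGb = 10_000_000_000L;
        // min-size/max-size values taken from the answer's settings
        System.out.println(estimateMappers(tenGb, 20, 32_000L, 32_000_000L));    // 313
        // same input with a 1 GiB max-size: the desired count is kept
        System.out.println(estimateMappers(tenGb, 20, 32_000L, 1_073_741_824L)); // 20
    }
}
```

In this model, the answer's max-size of 32000000 bytes increases the group count for 10 GB of input from 20 to about 313. Keep in mind that each mapper in a map join loads the entire small table, so adding mappers does not reduce the per-mapper memory needed for the hash table. This is consistent with the caveat that more parallelism may not help.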
- Increase mapper parallelism. This may not help if the actual cause is that the table the map join loads into memory is too large:
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set tez.grouping.max-size=32000000; --in bytes (~32 MB); decreasing max-size increases parallelism
set tez.grouping.min-size=32000; --in bytes (~32 KB); if you have files smaller than min-size, a mapper will also process other files
- Increase the mapper container size (check your current settings and raise them accordingly). The values below are only an example:
set hive.tez.container.size=2048; --container size in megabytes
set hive.tez.java.opts=-Xmx1700m; --set this to about 80% of hive.tez.container.size
- Map-side aggregation can also cause OOM. Try disabling it:
set hive.map.aggr=false;
- Check your map-join settings. The small-table size threshold may be set too high, so compare it with the container size you set above: Hive Map-Join configuration mystery
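Simple arithmetic is enough to sanity-check the memory settings above. This sketch assumes the answer's rule of thumb that the heap should be about 80% of hive.tez.container.size. It also takes, as an input, the maximum fraction of the heap that the map-join small-table threshold should be allowed to use. That fraction and the class and method names are hypothetical, not Hive settings. The sketch also makes the unit mismatch explicit: hive.tez.container.size is in megabytes, while a threshold such as hive.auto.convert.join.noconditionaltask.size is in bytes.

```java
/**
 * Hypothetical helper for sanity-checking Tez/Hive memory settings.
 * Assumptions: heap is about 80% of hive.tez.container.size (the answer's rule of thumb);
 * maxFraction is a caller-chosen limit, not a Hive setting.
 */
public class MapJoinMemoryCheck {

    /** Suggested -Xmx in MB for a container of containerMb megabytes (80% rule, rounded down). */
    public static long heapMb(long containerMb) {
        return containerMb * 80 / 100;
    }

    /**
     * True if a small-table threshold given in BYTES (e.g. the value of
     * hive.auto.convert.join.noconditionaltask.size) is at most maxFraction of the heap.
     */
    public static boolean thresholdFits(long containerMb, long thresholdBytes, double maxFraction) {
        long heapBytes = heapMb(containerMb) * 1024L * 1024L; // MB -> bytes
        return thresholdBytes <= (long) (heapBytes * maxFraction);
    }

    public static void main(String[] args) {
        long containerMb = 2048; // hive.tez.container.size from the example above
        System.out.println("hive.tez.java.opts=-Xmx" + heapMb(containerMb) + "m"); // -Xmx1638m
        System.out.println(thresholdFits(containerMb, 10_000_000L, 0.33));    // true
        System.out.println(thresholdFits(containerMb, 1_000_000_000L, 0.33)); // false
    }
}
```

With a 2048 MB container, the 80% rule gives about 1638 MB of heap. A threshold near 1 GB would therefore consume most of that heap on the hash table alone. Leave some headroom, because the in-memory hash table may be larger than the table-size estimate that the threshold is compared against.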