spark性能调优

Posted cpyj

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了spark性能调优相关的知识,希望对你有一定的参考价值。


1、因为在提交spark任务的时候没有指定节点的内存的大小,采用了默认的配置导致发生了以下的异常:
18
/01/18 17:19:56 ERROR TransportClient: Failed to send RPC 8390558468059355291 to YP-TYHJ-APOLLO4200-7234/10.11.122.234:36047: java .nio.channels.ClosedChannelException java.nio.channels.ClosedChannelException 18/01/18 17:19:56 WARN NettyRpcEndpointRef: Error sending message [message = Heartbeat(8,[Lscala.Tuple2;@7d11b814,BlockManagerId(8, YP-TYHJ-APOLLO4200-7231, 39189))] in 1 attempts java.io.IOException: Failed to send RPC 8390558468059355291 to YP-TYHJ-APOLLO4200-7234/10.11.122.234:36047: java.nio.channels.Closed ChannelException at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239) at org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226) at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680) at io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567) at io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424) at io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801) at io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699) at io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122) at io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633) at io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908) at io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960) at io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111) at java.lang.Thread.run(Thread.java:745) Caused by: java.nio.channels.ClosedChannelException 18/01/18 17:19:56 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM 18/01/18 17:19:57 INFO DiskBlockManager: Shutdown hook called 18/01/18 17:19:57 INFO ShutdownHookManager: Shutdown hook called

解决方法:

技术分享图片

将driver_executor和executor_memory调大,调节原则:使用的资源有多大,就尽量去调节到最大的大小。

 

2、spark汇聚数据到hbase失败

ERROR Executor: Managed memory leak detected; size = 67371008 bytes, TID = 61301
18/01/16 15:49:00 ERROR Executor: Exception in task 41.8 in stage 120.0 (TID 61301)
org.apache.hadoop.hive.ql.metadata.HiveException: parquet.hadoop.MemoryManager$1: New Memory allocation 1023098 bytes is smaller tha
n the minimum allocation size of 1048576 bytes.
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:249)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.org$apache$spark$sql$hive$SparkHiveDynamicPartitionWri
terContainer$$newWriter$1(hiveWriterContainers.scala:240)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContaine
rs.scala:249)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer$$anonfun$getLocalFileWriter$1.apply(hiveWriterContaine
rs.scala:249)
        at scala.collection.mutable.MapLike$class.getOrElseUpdate(MapLike.scala:189)
        at scala.collection.mutable.AbstractMap.getOrElseUpdate(Map.scala:91)
        at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.getLocalFileWriter(hiveWriterContainers.scala:249)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$
$writeToFile$1$1.apply(InsertIntoHiveTable.scala:112)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$org$apache$spark$sql$hive$execution$InsertIntoHiveTable$
$writeToFile$1$1.apply(InsertIntoHiveTable.scala:104)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable.org$apache$spark$sql$hive$execution$InsertIntoHiveTable$$writeToF
ile$1(InsertIntoHiveTable.scala:104)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
        at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:84)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: parquet.hadoop.MemoryManager$1: New Memory allocation 1023098 bytes is smaller than the minimum allocation size of 104857
6 bytes.
        at parquet.hadoop.MemoryManager.updateAllocation(MemoryManager.java:125)
        at parquet.hadoop.MemoryManager.addWriter(MemoryManager.java:82)
        at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:104)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:303)
        at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
        at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:65)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.ja
va:125)
        at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:114)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:261)
        at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:246)
        ... 19 more
18/01/16 15:49:00 WARN MemoryManager: Total allocation exceeds 50.00% (2,147,483,647 bytes) of heap memory
Scaling row group sizes to 0.76% for 2100 writers
18/01/16 15:49:00 WARN TaskMemoryManager: leak 64.3 MB memory from org.apache.spark.unsafe[email protected]

解决办法:

技术分享图片

hive.Context.setConf("parquet.memory.min.chunk.size","100000")

hive.Context.setConf("hive.exec.max.dynamic.partitions","100000")

 



以上是关于spark性能调优的主要内容,如果未能解决你的问题,请参考以下文章

spark性能优化:shuffle调优

spark性能调优

Spark任务性能调优总结

Spark全方位性能调优-- 参数调优与应用调优

转载Apache Spark Jobs 性能调优

Apache Spark Jobs 性能调优