Spark configuration: shuffle-related options

Posted by 流浪在伯纳乌


On the master, set the following in conf/spark-defaults.conf:

spark.shuffle.service.enabled true

spark.shuffle.service.port 7337

On the worker nodes, however, comment out these two options in spark-defaults.conf; otherwise the workers do not show up in the web UI.
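The worker-side step (commenting out the two shuffle service options) could be scripted roughly as follows. This is only a sketch: the function name `comment_out_shuffle_service` and the path in the usage comment are assumptions, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: comment out spark.shuffle.service.enabled and
# spark.shuffle.service.port in the given spark-defaults.conf.
# A backup of the original file is kept next to it with a .bak suffix.
comment_out_shuffle_service() {
  local conf="$1"
  sed -E -i.bak \
    's/^[[:space:]]*(spark\.shuffle\.service\.(enabled|port)[[:space:]=].*)$/# \1/' \
    "$conf"
}

# Usage on a worker node (the path is an assumption):
#   comment_out_shuffle_service "$SPARK_HOME/conf/spark-defaults.conf"
```

Lines that are already commented out are left unchanged, so the helper can be run more than once.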

 

spark-defaults.conf:

# Directories for the temporary files produced during shuffle
spark.local.dir /mnt/diskb/sparklocal,/mnt/diskc/sparklocal,/mnt/diskd/sparklocal,/mnt/diske/sparklocal,/mnt/diskf/sparklocal,/mnt/diskg/sparklocal
# Record Spark event logs
spark.eventLog.enabled true
# Store the event logs on HDFS
spark.eventLog.dir hdfs://nameservice1/spark-log
spark.network.timeout 450

spark.dynamicAllocation.enabled true

spark.dynamicAllocation.minExecutors 8

spark.dynamicAllocation.maxExecutors 30
spark.dynamicAllocation.schedulerBacklogTimeout 1s
spark.dynamicAllocation.sustainedSchedulerBacklogTimeout 5s

spark.io.compression.codec snappy

 

spark-env.sh:

export JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
export SPARK_MASTER_IP=10.130.2.20
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=12
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_MEMORY=48g
export SPARK_WORKER_DIR=/mnt/diskb/sparkwork,/mnt/diskc/sparkwork,/mnt/diskd/sparkwork,/mnt/diske/sparkwork,/mnt/diskf/sparkwork,/mnt/diskg/sparkwork
export SPARK_LOCAL_DIRS=/mnt/diske/sparklocal,/mnt/diskb/sparklocal,/mnt/diskc/sparklocal,/mnt/diskd/sparklocal,/mnt/diskf/sparklocal,/mnt/diskg/sparklocal
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf/
export SPARK_DAEMON_MEMORY=12g
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=bdc40.hexun.com:2181,bdc41.hexun.com:2181,bdc46.hexun.com:2181,bdc53.hexun.com:2181,bdc54.hexun.com:2181 -Dspark.deploy.zookeeper.dir=/spark"
#export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=FILESYSTEM -Dspark.deploy.recoveryDirectory=/opt/modules/spark/recovery"
export JAVA_LIBRARY_PATH=$JAVA_LIBRARY_PATH:$HADOOP_HOME/lib/native
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_LIBRARY_PATH=$SPARK_LIBRARY_PATH:$HADOOP_HOME/lib/native
export SPARK_CLASSPATH=$SPARK_CLASSPATH:$HADOOP_HOME/lib/snappy-java-1.0.4.1.jar
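The directories listed in SPARK_LOCAL_DIRS have to exist and be writable on every worker before Spark can use them. Note also that, in standalone mode, SPARK_LOCAL_DIRS takes precedence over spark.local.dir. A minimal sketch for creating the directories from such a comma-separated list is shown below; the helper name `make_dirs` is hypothetical, and ownership and permissions are left to the reader.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create every directory named in a comma-separated list,
# for example the value of SPARK_LOCAL_DIRS.
make_dirs() {
  local list="$1" dir
  local IFS=','            # split the list on commas only
  for dir in $list; do     # intentionally unquoted so the IFS split applies
    if [ -n "$dir" ]; then
      mkdir -p "$dir"
    fi
  done
}

# Usage on each worker, after sourcing spark-env.sh:
#   make_dirs "$SPARK_LOCAL_DIRS"
```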
