Spark on YARN in Detail

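1. References:
spark-1.3.0: http://spark.apache.org/docs/1.3.0/running-on-yarn.html
spark-1.6.0: http://spark.apache.org/docs/1.6.0/running-on-yarn.html

Note: starting with spark-1.6.0, the documented spark on yarn commands differ slightly (--master yarn --deploy-mode ... instead of --master yarn-cluster / yarn-client); see the official documentation for details. This article mainly uses a spark 1.3.0 cluster.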

2. Preparation
Compiling spark, see: http://www.cnblogs.com/wcwen1990/p/7688027.html
Installing and deploying spark (including local mode and standalone mode): http://www.cnblogs.com/wcwen1990/p/6889521.html

3. Spark on YARN configuration:

1) Start the hadoop cluster:

sbin/hadoop-daemon.sh start namenode
sbin/hadoop-daemon.sh start datanode

sbin/yarn-daemon.sh start resourcemanager
sbin/yarn-daemon.sh start nodemanager

sbin/mr-jobhistory-daemon.sh start historyserver

2) Start the spark history server:

sbin/start-history-server.sh

3) Check the running processes:

$ jps
3182 DataNode
3734 JobHistoryServer
3949 Jps
3555 NodeManager
3295 ResourceManager
3857 HistoryServer
3094 NameNode
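
The check above is done by eye. As a minimal sketch, the same check can be scripted. The function name `missing_daemons` and the list of expected daemons are assumptions taken from the listing above, not part of hadoop or spark:

```shell
# Sketch: report which expected daemons are missing from `jps` output.
# The function name and the daemon list are assumptions based on the
# process listing above; adjust them to your own cluster.

missing_daemons() {
  # $1: text in `jps` format ("<pid> <name>" per line)
  local jps_output="$1"
  local expected="NameNode DataNode ResourceManager NodeManager JobHistoryServer HistoryServer"
  local missing=""
  local name
  for name in $expected; do
    # Compare against the whole second field, so that "HistoryServer"
    # is not satisfied by a "JobHistoryServer" line.
    if ! printf '%s\n' "$jps_output" |
         awk -v n="$name" '$2 == n { found = 1 } END { exit !found }'; then
      missing="$missing $name"
    fi
  done
  # Print the missing names separated by spaces (empty line if none).
  printf '%s\n' "${missing# }"
}
```

It could be called as `missing_daemons "$(jps)"`; an empty result means every expected process was found.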

4. Submitting an application to YARN with spark-submit (applications can be submitted in either client mode or cluster mode):

1) spark-1.3.0:

$ ./bin/spark-submit --class path.to.your.Class --master yarn-cluster [options] <app jar> [app options]

For example:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
     --master yarn-cluster \
     --num-executors 3 \
     --driver-memory 4g \
     --executor-memory 2g \
     --executor-cores 1 \
     --queue thequeue \
     lib/spark-examples*.jar \
     10

2) spark-1.6.0:

$ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

For example:

$ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
     --master yarn \
     --deploy-mode cluster \
     --driver-memory 4g \
     --executor-memory 2g \
     --executor-cores 1 \
     --queue thequeue \
     lib/spark-examples*.jar \
     10
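
The two command forms above differ only in how the master and deploy mode are written. A minimal sketch of a wrapper that picks the form used in the documentation for a given version follows. The function name `yarn_master_args` and the cut-off at 1.6 are illustrative assumptions based on the two documentation links in this article, not something spark itself provides:

```shell
# Sketch: print the --master arguments in the form shown in the
# documentation of the given spark version. The function name and the
# 1.6 cut-off are assumptions for illustration only.

yarn_master_args() {
  # $1: spark version such as 1.3.0 or 1.6.0
  # $2: deploy mode, either "client" or "cluster"
  local version="$1"
  local mode="$2"
  local major minor
  case "$mode" in
    client|cluster) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac
  major=${version%%.*}     # text before the first dot
  minor=${version#*.}      # text after the first dot
  minor=${minor%%.*}       # keep only the second component
  if [ "$major" -gt 1 ] || { [ "$major" -eq 1 ] && [ "$minor" -ge 6 ]; }; then
    echo "--master yarn --deploy-mode $mode"
  else
    echo "--master yarn-$mode"
  fi
}
```

For example, `./bin/spark-submit --class org.apache.spark.examples.SparkPi $(yarn_master_args 1.3.0 cluster) lib/spark-examples*.jar 10` would expand to the 1.3.0 form. The command substitution is left unquoted on purpose so that the result splits into separate arguments.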

5. Running spark-shell on YARN (spark-shell can only run in client mode):

1) spark-1.3.0:

$ ./bin/spark-shell --master yarn-client

2) spark-1.6.0:

$ ./bin/spark-shell --master yarn --deploy-mode client

6. Test, using spark-1.3.0 as the example:

$ ./bin/spark-shell --master yarn-client

Running a wordcount program in spark on yarn mode:

scala> sc.textFile("/user/hadoop/mapreduce/wordcount/input/wc.input").flatMap(_.split(" ")).map((_,1)).reduceByKey(_ + _).map(x => (x._2,x._1)).sortByKey(false).map(x => (x._2,x._1)).collect
... ...
res0: Array[(String, Int)] = Array((scala,1), (hive,1), (oozie,1), (mapreduce,1), (zookeeper,1), (hue,1), (yarn,1), (sqoop,1), (kafka,1), (spark,1), (hadoop,1), (flume,1), (hdfs,1), (storm,1), (hbase,1))

scala> sc.stop
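
To cross-check the result outside spark, the same count can be approximated on a local copy of the input with standard tools. This is only a sketch: the function name `wordcount` is an assumption, it reads a local file rather than HDFS, and unlike `split(" ")` it collapses repeated spaces instead of producing empty words:

```shell
# Sketch: count words in a local file and sort by descending count,
# roughly mirroring the flatMap / reduceByKey / sortByKey steps above.
# The function name is an assumption for illustration.

wordcount() {
  # $1: path to a local text file
  tr -s ' ' '\n' < "$1" |   # split on spaces, one word per line
    grep -v '^$' |          # drop empty lines
    sort |                  # bring identical words together
    uniq -c |               # count each group
    sort -k1,1nr -k2,2 |    # highest count first, then alphabetical
    awk '{ print $2, $1 }'  # print "word count"
}
```

A local copy of the input can be fetched with `hdfs dfs -get /user/hadoop/mapreduce/wordcount/input/wc.input .` and then passed in as `wordcount wc.input`.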

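The execution of the program above can be inspected through the following web UIs:

yarn: http://chavin.king:8088
Spark application monitoring (while the application is running): http://chavin.king:4040
History server: http://chavin.king:18080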