20.Spark

Posted by dawn2020


Prerequisite: environment variables

vim /etc/profile

# Spark: the scripts that start Standalone mode live in sbin; job-submission commands such as spark-submit live in bin
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin

source /etc/profile
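A quick sanity check that the PATH change took effect (standard Spark commands, nothing specific to this setup):

spark-submit --version        # prints the Spark/Scala version banner
which spark-shell             # should resolve to /usr/local/spark/bin/spark-shell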

I. Local mode

1. Configuration

No configuration is needed; local mode works out of the box.

2. Running

spark-shell
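As a minimal smoke test of local mode, the snippet below sums 1..100 inside the shell; local[2] is just an explicit thread count, plain spark-shell defaults to local[*]:

spark-shell --master local[2]
scala> sc.parallelize(1 to 100).reduce(_ + _)    // res0: Int = 5050
scala> :quit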

II. Standalone mode

1. Configuration

1.1 spark-env.sh

cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
vim /usr/local/spark/conf/spark-env.sh

export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
#export HADOOP_HOME=/usr/local/hadoop
#export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop

# Standalone: HA via ZooKeeper, and the history server
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node1:9000/spark/logs"

scp.sh /usr/local/spark/conf/spark-env.sh
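Optional check: once a Master has registered (next section), its recovery data appears under the ZooKeeper path set in spark.deploy.zookeeper.dir. Assuming ZooKeeper's client script is on the PATH:

zkCli.sh -server node1:2181
ls /spark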

1.2 slaves

# List the Worker nodes, one hostname per line
cp /usr/local/spark/conf/slaves.template /usr/local/spark/conf/slaves
vim /usr/local/spark/conf/slaves
node1
node2
node3
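The launch scripts read slaves on the node they are run from, but it is simplest to keep conf/ identical everywhere; assuming the same scp.sh helper used above, it can be distributed the same way:

scp.sh /usr/local/spark/conf/slaves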

2. Running

# 1. Start a Master on this node and a Worker on every node listed in slaves
/usr/local/spark/sbin/start-all.sh
/usr/local/spark/sbin/stop-all.sh


# 2. Verify HA failover
# Start a standby Master on node2 and node3:
/usr/local/spark/sbin/start-master.sh

# Open each Spark Master web UI and check the Status field (one ALIVE, the others STANDBY)
http://node1:8080/
http://node2:8080/
http://node3:8080/

# On node1 (the ALIVE Master), find the Master process PID
jps
# Kill the Master process (3658 is the PID from this run; substitute your own)
kill -9 3658

# Revisit the web UIs; one of the standby Masters should now show ALIVE
http://node1:8080/
http://node2:8080/
http://node3:8080/
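The same status is available on the command line; assuming the default Spark Master web UI, it serves a JSON summary at /json:

curl -s http://node2:8080/json/ | grep '"status"'    # expect ALIVE on the newly active Master, STANDBY elsewhere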


# 3. Verify the history server
start-history-server.sh
jps
http://node1:4000/   # browse the Spark event logs (port set in SPARK_HISTORY_OPTS above)
hdfs dfs -ls /spark/logs
stop-history-server.sh
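Note: the HDFS directory named in spark.history.fs.logDirectory must already exist, or the history server will not start; create it once. Applications only write event logs into it if spark.eventLog.enabled is set (see the spark-defaults.conf snippet in the YARN section below).

hdfs dfs -mkdir -p /spark/logs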



# 4. If the start scripts complain that JAVA_HOME is not set, define it in spark-config.sh
vim /usr/local/spark/sbin/spark-config.sh
export JAVA_HOME=/usr/local/sth/jdk1.8.0_11
spark-submit --master spark://node1:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/jars/spark-examples_2.11-2.3.0.jar 10

spark-shell --master spark://node1:7077
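A quick check from inside the shell that it really attached to the standalone cluster rather than falling back to local mode:

scala> sc.master                                 // res0: String = spark://node1:7077
scala> sc.parallelize(1 to 1000, 3).count()      // res1: Long = 1000
scala> :quit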

III. YARN mode

1. Configuration

1.1 spark-env.sh

cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
vim /usr/local/spark/conf/spark-env.sh

export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop

scp.sh /usr/local/spark/conf/spark-env.sh

1.2 yarn-site.xml (YARN mode only)

vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
<!-- Prevent failures caused by the VMs having too little memory -->
<property>
    <!-- Whether to run a thread that checks each task's physical memory usage and kills it if it exceeds its allocation; default is true -->
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>
<property>
    <!-- Whether to run a thread that checks each task's virtual memory usage and kills it if it exceeds its allocation; default is true -->
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

scp.sh /usr/local/hadoop/etc/hadoop/yarn-site.xml
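yarn-site.xml is only read when the daemons start, so restart YARN after distributing it (standard Hadoop scripts in $HADOOP_HOME/sbin):

stop-yarn.sh
start-yarn.sh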

1.3 spark-defaults.conf (YARN mode; default properties applied to every job submission)

# Configure the history server address (the ResourceManager's finished-application links redirect here)
cd /usr/local/spark/conf
ll
cp spark-defaults.conf.template spark-defaults.conf
vim /usr/local/spark/conf/spark-defaults.conf
spark.yarn.historyServer.address=node2:18080
spark.history.ui.port=18080
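These two lines only tell YARN where the history server lives; for it to have anything to show, applications must also write event logs. A typical addition, assuming the same HDFS path used for the standalone history server above:

spark.eventLog.enabled=true
spark.eventLog.dir=hdfs://node1:9000/spark/logs
spark.history.fs.logDirectory=hdfs://node1:9000/spark/logs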

2. Running

zkstart.sh      # helper script: start ZooKeeper on every node
start-all.sh    # start HDFS + YARN (Spark's sbin ships a start-all.sh too; Hadoop's is the one wanted here)
jps.sh          # helper script: run jps on every node

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 10

spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 10
# The result is visible via the YARN ResourceManager UI at http://node1:8088/
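In cluster mode the driver runs inside a YARN container, so the "Pi is roughly ..." line ends up in the application logs rather than in the submitting terminal. With YARN log aggregation enabled it can be pulled back like this (the application id is a placeholder; copy the real one from the 8088 UI or yarn application -list):

yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX | grep "Pi is roughly"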


Start spark-shell on node3:
spark-shell --master yarn
http://node3:4040/
scala> :quit
yarn application -list
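If a shell or job is left running it keeps holding its YARN containers; it can be stopped from the command line (again, substitute the real application id from the list above):

yarn application -kill application_XXXXXXXXXXXXX_XXXX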

You may see the message "Could not find proxy-user cookie, so user will not be set" when browsing the application UI through the YARN proxy; it does not affect the shell.



Start spark-shell on node3 explicitly in client deploy mode (equivalent to the previous run):
spark-shell --master yarn --deploy-mode client
http://node3:4040/
scala> :quit

Start spark-shell on node3 in cluster deploy mode:
spark-shell --master yarn --deploy-mode cluster
This fails: interactive shells only support client deploy mode (Spark reports "Cluster deploy mode is not applicable to Spark shells"), so there is no http://node3:4040/ UI to open and no prompt to :quit from.
