20.Spark
Posted dawn2020
Prerequisite: environment variables
vim /etc/profile
#Spark: the scripts that start Standalone mode live in sbin; job-submission commands live in bin
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SPARK_HOME/sbin:$SPARK_HOME/bin
source /etc/profile
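Sourcing /etc/profile more than once appends the same Spark directories to PATH each time. A minimal sketch of a guard against that follows; the helper name add_to_path is hypothetical and not part of Spark, and the /usr/local/spark default is taken from the notes above.

```shell
# Sketch: append a directory to PATH only if it is not already there.
# add_to_path is a hypothetical helper, not something Spark ships.
SPARK_HOME="${SPARK_HOME:-/usr/local/spark}"

add_to_path() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already on PATH, nothing to do
    *) PATH="$PATH:$1" ;;     # otherwise append it
  esac
}

add_to_path "$SPARK_HOME/sbin"   # start-all.sh, start-master.sh, ...
add_to_path "$SPARK_HOME/bin"    # spark-submit, spark-shell, ...
export SPARK_HOME PATH
```

Putting these lines in /etc/profile in place of the plain export keeps PATH free of duplicates across repeated logins or re-sourcing.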
I. Local mode
1. Configuration
Works out of the box; no configuration needed
2. Run
spark-shell
II. Standalone mode
1. Configuration
1.1 spark-env.sh
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
vim /usr/local/spark/conf/spark-env.sh
export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
#export HADOOP_HOME=/usr/local/hadoop
#export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
#standalone (HA, JobHistoryServer)
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node1:2181,node2:2181,node3:2181 -Dspark.deploy.zookeeper.dir=/spark"
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=4000 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://node1:9000/spark/logs"
scp.sh /usr/local/spark/conf/spark-env.sh
1.2 slaves
//List the Worker nodes, one hostname per line
cp /usr/local/spark/conf/slaves.template /usr/local/spark/conf/slaves
vim /usr/local/spark/conf/slaves
node1
node2
node3
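2. Run
//1. Start the Master on the current node and a Worker on every node listed in slaves
/usr/local/spark/sbin/start-all.sh
//To shut the cluster down again later:
/usr/local/spark/sbin/stop-all.sh
//2. Verify HA
//On node2 and node3, start a Master on each node
/usr/local/spark/sbin/start-master.sh
//Open the Spark web UI on each node and check the Status field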
http://node1:8080/
http://node2:8080/
http://node3:8080/
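//On node1, look up the pid of the Master process
jps
//Kill the Master process (3658 is the pid from this run; substitute the one jps reports)
kill -9 3658
//Reload the Spark web UI pages and check Status again; after a short delay one of the remaining Masters should show ALIVE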
http://node1:8080/
http://node2:8080/
http://node3:8080/
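Checking three web pages by hand before and after the kill can be scripted. The following is a rough sketch, assuming the standalone Master web UI also serves its state as JSON under /json/ with a "status" field (ALIVE, STANDBY, and so on); the function names master_status and check_masters are made up for this example.

```shell
# Sketch: print the status of each Master, assuming the web UI on port 8080
# exposes a JSON view at /json/ that contains a "status" field.

# Read JSON on stdin and print the value of the first "status" field.
master_status() {
  grep -o '"status"[[:space:]]*:[[:space:]]*"[A-Z_]*"' \
    | head -n 1 \
    | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# Query every host given as an argument; an empty answer is shown as "unreachable".
check_masters() {
  for host in "$@"; do
    status=$(curl -s --max-time 3 "http://$host:8080/json/" | master_status)
    echo "$host: ${status:-unreachable}"
  done
}

# Usage (run before and after killing the Master on node1):
#   check_masters node1 node2 node3
```

Running it once before kill -9 and again a minute or two later shows which node took over, without opening the three pages in a browser.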
//3. Verify the JobHistoryServer
start-history-server.sh
jps
http://node1:4000/ #view the Spark event logs
hdfs dfs -ls /spark/logs
stop-history-server.sh
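The history server only reads from spark.history.fs.logDirectory; applications write event logs only when spark.eventLog.enabled is true, and they write to spark.eventLog.dir. The notes above do not show those two properties, so if the history page stays empty they are the first thing to check. A small sketch for adding them to spark-defaults.conf without duplicating lines follows; ensure_prop is a hypothetical helper, and the HDFS path is the one used in SPARK_HISTORY_OPTS above.

```shell
# Sketch: append "key value" to a properties file unless the key is already set.
# ensure_prop is a hypothetical helper name.
#   $1 = file, $2 = key, $3 = value
ensure_prop() {
  touch "$1"
  # The key is used as a regex here, so its dots match any character;
  # that is good enough for a sketch.
  if ! grep -q "^$2[[:space:]=]" "$1"; then
    printf '%s %s\n' "$2" "$3" >> "$1"
  fi
}

# Usage (the log directory must already exist in HDFS):
#   hdfs dfs -mkdir -p /spark/logs
#   conf=/usr/local/spark/conf/spark-defaults.conf
#   ensure_prop "$conf" spark.eventLog.enabled true
#   ensure_prop "$conf" spark.eventLog.dir hdfs://node1:9000/spark/logs
```

Applications submitted after this change should then appear in the history server UI once they finish.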
//4. If startup fails with "JAVA_HOME not set", export JAVA_HOME in spark-config.sh
vim /usr/local/spark/sbin/spark-config.sh
export JAVA_HOME=/usr/local/sth/jdk1.8.0_11
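//Submit the SparkPi example to the standalone Master (the jar file name must match the installed Spark version)
spark-submit --master spark://node1:7077 --class org.apache.spark.examples.SparkPi /usr/local/spark/examples/jars/spark-examples_2.11-2.3.0.jar 10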
spark-shell --master spark://node1:7077
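The examples jar carries the Scala and Spark versions in its file name, so a hard-coded path breaks as soon as the installed version differs. A minimal sketch that looks the jar up by pattern instead; find_examples_jar is a hypothetical helper, and the default directory is the one used in these notes.

```shell
# Sketch: print the path of the first spark-examples jar found in a directory.
# find_examples_jar is a hypothetical helper name.
#   $1 = directory to search (defaults to the path used in these notes)
find_examples_jar() {
  dir="${1:-/usr/local/spark/examples/jars}"
  for jar in "$dir"/spark-examples_*.jar; do
    # If nothing matches, the pattern stays literal and the -e test fails.
    if [ -e "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  return 1
}

# Usage:
#   spark-submit --master spark://node1:7077 \
#     --class org.apache.spark.examples.SparkPi "$(find_examples_jar)" 10
```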
III. Yarn mode
1. Configuration
1.1 spark-env.sh
cp /usr/local/spark/conf/spark-env.sh.template /usr/local/spark/conf/spark-env.sh
vim /usr/local/spark/conf/spark-env.sh
export JAVA_HOME=/usr/local/jdk
export SCALA_HOME=/usr/local/scala
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export YARN_CONF_DIR=/usr/local/hadoop/etc/hadoop
scp.sh /usr/local/spark/conf/spark-env.sh
1.2 yarn-site.xml(yarn)
vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
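<!-- Prevent startup failures on virtual machines with too little memory -->
<property>
<!-- Whether to run a thread that checks the physical memory each task is using and kills the task if it exceeds its allocation; default is true -->
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<!-- Whether to run a thread that checks the virtual memory each task is using and kills the task if it exceeds its allocation; default is true -->
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>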
scp.sh /usr/local/hadoop/etc/hadoop/yarn-site.xml
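1.3 spark-defaults.conf (yarn; this file holds the default properties applied to every submitted application)
//Point YARN at the Spark history server for application logs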
cd /usr/local/spark/conf
ll
cp spark-defaults.conf.template spark-defaults.conf
vim /usr/local/spark/conf/spark-defaults.conf
spark.yarn.historyServer.address=node2:18080
spark.history.ui.port=18080
2. Run
zkstart.sh
start-all.sh
jps.sh
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 10
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 10
//See the result in the YARN web UI at node1:8088
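In client mode the driver runs inside spark-submit, so the "Pi is roughly ..." line is printed on the console. In cluster mode the driver runs in a YARN container, and that line ends up in the container logs. A rough sketch for fetching it from the command line, assuming YARN log aggregation is enabled (otherwise yarn logs has nothing to return); extract_app_id is a hypothetical helper.

```shell
# Sketch: pull the first YARN application id out of text read on stdin.
# extract_app_id is a hypothetical helper name.
extract_app_id() {
  grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Usage (spark-submit logs to stderr, hence the 2>&1):
#   app_id=$(spark-submit --class org.apache.spark.examples.SparkPi \
#     --master yarn --deploy-mode cluster \
#     /usr/local/spark/examples/jars/spark-examples_2.11-2.4.0.jar 10 2>&1 \
#     | extract_app_id)
#   yarn logs -applicationId "$app_id" | grep "Pi is roughly"
```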
Start spark-shell on node3
spark-shell --master yarn
http://node3:4040/
scala> :quit
yarn application -list
Could not find proxy-user cookie, so user will not be set
Start spark-shell on node3 (explicit client deploy mode, which is also the default)
spark-shell --master yarn --deploy-mode client
http://node3:4040/
scala> :quit
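Start spark-shell on node3 (cluster deploy mode)
//This does not work: spark-shell is interactive and its driver must run locally, so the command below
//is rejected with "Cluster deploy mode is not applicable to Spark shells." Use client mode instead.
spark-shell --master yarn --deploy-mode cluster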