Spark 2.4 Cluster Deployment (YARN Mode)


Basic Information


OS kernel version: 3.10.0-1062.9.1.el7.x86_64

JDK version: 1.8.0_202


Hadoop: https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
Scala: download the tar.gz package from https://github.com/scala/scala/releases/tag/v2.12.9
Spark: https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz

Step 1: Set the hostnames (run each command on the corresponding node)

hostnamectl set-hostname node01
hostnamectl set-hostname node02
hostnamectl set-hostname node03

Step 2: Edit the hosts file

vim /etc/hosts

192.168.24.2 node01
192.168.24.4 node02
192.168.24.6 node03

Step 3: Stop the firewall and disable it at boot

systemctl stop firewalld
systemctl disable firewalld

Step 4: Disable SELinux

vim /etc/selinux/config
Change the SELINUX setting to SELINUX=disabled

Step 5: Set up passwordless SSH login (all nodes)

ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# Copy node01's public key to the other nodes; run this on node01 only

ssh-copy-id node02
ssh-copy-id node03
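
As an optional sanity check, logging in to the other nodes from node01 should now work without a password prompt:

ssh node02 hostname
ssh node03 hostname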

Step 6: Create directories

mkdir -pv /home/hadoop
mkdir -pv /home/hadoop/work/tmp/dfs/name
mkdir -pv /home/hadoop/work/tmp/dfs/data
mkdir -pv /home/spark

Step 7: Download and extract the Hadoop tarball

cd /home/hadoop
wget https://mirrors.aliyun.com/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
tar zxvf hadoop-2.7.7.tar.gz

Step 8: Go into the hadoop-2.7.7/etc/hadoop directory and edit the three configuration files hadoop-env.sh, mapred-env.sh, and yarn-env.sh in turn, making sure JAVA_HOME in each points to the correct path, as follows:

export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64

Step 9: Edit core-site.xml and set the configuration node as follows:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node01:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/work/tmp</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file://${hadoop.tmp.dir}/dfs/data</value>
  </property>
</configuration>

Step 10: Edit hdfs-site.xml and set the configuration node, making node02 the Secondary NameNode, as follows:

<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node02:50090</value>
  </property>
</configuration>

Step 11: Edit the slaves file as follows:

node01
node02
node03

Step 12: Edit yarn-site.xml and set the configuration node as follows:

<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node01</value>
  </property>
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>106800</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
</configuration>

Step 13: Back up mapred-site.xml.template, then rename mapred-site.xml.template to mapred-site.xml

cp mapred-site.xml.template mapred-site.xml.template.bak
mv mapred-site.xml.template mapred-site.xml

Step 14: Edit mapred-site.xml and set the configuration node as follows:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node01:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>node01:19888</value>
  </property>
</configuration>

Step 15: Sync the hadoop-2.7.7 directory to the other nodes

scp -r hadoop-2.7.7 node02:/home/hadoop
scp -r hadoop-2.7.7 node03:/home/hadoop

Step 16: Format HDFS

cd /home/hadoop/hadoop-2.7.7
bin/hdfs namenode -format

Step 17: Start Hadoop

# Start HDFS
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh
# Start YARN
sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh
# Start the job history (log) server
/home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver
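
As an optional check, running jps on each node should list the expected daemons; with the configuration above, roughly:

jps
# node01: NameNode, DataNode, ResourceManager, NodeManager, JobHistoryServer
# node02: SecondaryNameNode, DataNode, NodeManager
# node03: DataNode, NodeManager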

Step 18: Download and extract the Scala and Spark packages

cd /home/spark
# Download Scala from https://github.com/scala/scala/releases/tag/v2.12.9 and upload the package to /home/spark on the server
wget https://mirrors.aliyun.com/apache/spark/spark-2.4.7/spark-2.4.7-bin-hadoop2.7.tgz
tar -zxvf scala-2.12.9.tar.gz
tar -zxvf spark-2.4.7-bin-hadoop2.7.tgz

Step 19: Edit the Spark configuration files

cd spark-2.4.7-bin-hadoop2.7/conf
cp spark-env.sh.template spark-env.sh.template.bak
mv spark-env.sh.template spark-env.sh

vim spark-env.sh, then append:

export SPARK_MASTER_IP=node01
export SPARK_MASTER_PORT=7077
export SPARK_EXECUTOR_INSTANCES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_CORES=1
export SPARK_WORKER_MEMORY=256M
export SPARK_MASTER_WEBUI_PORT=8080
export SPARK_CONF_DIR=/home/spark/spark-2.4.7-bin-hadoop2.7/conf
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop

mv slaves.template slaves

vim slaves 

node01
node02
node03

Step 20: Sync the scala-2.12.9 and spark-2.4.7-bin-hadoop2.7 directories to the other nodes

scp -r scala-2.12.9 node02:/home/spark
scp -r scala-2.12.9 node03:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node02:/home/spark
scp -r spark-2.4.7-bin-hadoop2.7 node03:/home/spark

Step 21: Start Spark

sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh

Step 22: Configure environment variables (append to /etc/profile) and apply them immediately

# JAVA
export JAVA_HOME=/usr/java/jdk1.8.0_202-amd64
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
# SCALA
export SCALA_HOME=/home/spark/scala-2.12.9
export PATH=${SCALA_HOME}/bin:$PATH
# SPARK
export SPARK_HOME=/home/spark/spark-2.4.7-bin-hadoop2.7
export PATH=${SPARK_HOME}/bin:$PATH
# HADOOP
export HADOOP_HOME=/home/hadoop/hadoop-2.7.7
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_CONF_DIR=/home/hadoop/hadoop-2.7.7/etc/hadoop
source /etc/profile
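
As a quick check that the variables are in effect, the version commands of each component should now resolve from any shell:

java -version
scala -version
hadoop version
spark-submit --version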

Step 23: Set up simple one-click start and stop scripts for convenience

start-spark.sh:

#!/bin/bash
sh /home/hadoop/hadoop-2.7.7/sbin/start-dfs.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/start-yarn.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh start historyserver \
&& sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/start-all.sh

stop-spark.sh:

#!/bin/bash
sh /home/spark/spark-2.4.7-bin-hadoop2.7/sbin/stop-all.sh \
&& /home/hadoop/hadoop-2.7.7/sbin/mr-jobhistory-daemon.sh stop historyserver \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-yarn.sh \
&& sh /home/hadoop/hadoop-2.7.7/sbin/stop-dfs.sh
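
Assuming the two scripts are saved on node01, make them executable before use:

chmod +x start-spark.sh stop-spark.sh
./start-spark.sh    # bring up HDFS, YARN, the history server, and Spark
./stop-spark.sh     # shut everything down in reverse order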

Step 24: Visit http://192.168.24.2:8080 (the Spark Master Web UI)
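
With the cluster up, a minimal way to confirm that jobs really run on YARN is to submit the bundled SparkPi example (a sketch; the examples jar ships with the Spark distribution, though its exact filename can vary by build):

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  /home/spark/spark-2.4.7-bin-hadoop2.7/examples/jars/spark-examples_2.11-2.4.7.jar 100

The application should then appear in the YARN ResourceManager UI at http://node01:8088, and since log aggregation is enabled, the driver output (including the computed value of Pi) can be retrieved with yarn logs -applicationId <application id>.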

