Spark01: Hadoop & Spark Environment Setup

Posted by 山高月更阔 Andy



Hadoop installation

Based on macOS.

Create a hadoop account. I start Hadoop with the account I use to log in to the machine, so I skip this step.

Configure SSH

cd ~/.ssh/                              # if this directory does not exist, run "ssh localhost" once first
ssh-keygen -t rsa                       # press Enter at every prompt
cat ./id_rsa.pub >> ./authorized_keys   # authorize the key

If SSH is not installed, install it first.
Success check:

ssh localhost   # should log in without a password prompt

Download Hadoop

cd /usr/local/
sudo tar -zxvf ~/Downloads/hadoop-2.7.7.tar.gz   # sudo is needed for write permission under /usr/local

Pseudo-distributed configuration

Configure core-site.xml

cd /usr/local/hadoop-2.7.7
vim etc/hadoop/core-site.xml

Add the following configuration (the xxxx in hadoop.tmp.dir is a placeholder; point it at a directory of your own):

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:xxxx/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.native.lib</name>
    <value>false</value>
    <description>Should native hadoop libraries, if present, be used.</description>
  </property>
</configuration>
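
The original post configures only core-site.xml. For a single-node pseudo-distributed setup it is also common to set the HDFS replication factor to 1 in hdfs-site.xml; the snippet below is an optional sketch of that (not part of the original post), so adjust the path to your own install:

# Optional: minimal hdfs-site.xml for single-node (pseudo-distributed) mode;
# dfs.replication=1 avoids under-replication warnings with a single DataNode
sudo tee /usr/local/hadoop-2.7.7/etc/hadoop/hdfs-site.xml > /dev/null <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF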

Configure environment variables

export HADOOP_HOME=/usr/local/hadoop-2.7.7
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=/usr/local/hadoop-2.7.7/lib"
export PATH=$PATH:$HADOOP_HOME/bin

You also need to configure the Java environment variables:

export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home
export PATH=$PATH:$JAVA_HOME/bin

JAVA_HOME must be configured, otherwise Hadoop cannot find the Java installation.
Note that the directories in these environment variables must match your own install paths.
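
A quick way to check that the variables took effect (a sketch; which profile file you reload, ~/.bash_profile or ~/.zshrc, depends on your shell):

# reload the profile that contains the export lines above
source ~/.bash_profile

# should print "Hadoop 2.7.7" plus build details
hadoop version

# should print the JDK version, e.g. 1.8.0_211
java -version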

The cluster reports that JAVA_HOME cannot be found

Edit /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh:

export JAVA_HOME=${JAVA_HOME}
# change the line above to an absolute path:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_211.jdk/Contents/Home

Alternatively, add the following line to /usr/local/hadoop-2.7.7/etc/hadoop/hadoop-env.sh:

. /etc/profile   # load the local environment variables

Start Hadoop

/usr/local/hadoop-2.7.7/sbin/start-all.sh

start-all.sh starts both DFS and YARN.
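
Before the very first start, HDFS also has to be formatted, which the post does not spell out. A minimal sketch, assuming the paths and environment variables configured above:

# first start only: format the NameNode (data goes under hadoop.tmp.dir from core-site.xml)
hdfs namenode -format

# after start-all.sh, jps should list NameNode, DataNode, SecondaryNameNode,
# ResourceManager and NodeManager
jps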

Startup errors

No permission to write log files

Create a logs directory under /usr/local/hadoop-2.7.7/ and make sure the user who starts Hadoop has write permission to it.
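
A sketch of that fix, assuming Hadoop is started by the current login user:

cd /usr/local/hadoop-2.7.7
sudo mkdir -p logs
sudo chown -R "$(whoami)" logs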

Spark installation

Download

Download Spark from http://spark.apache.org/downloads.html.
Because I install Spark and Hadoop separately, I chose the package built with user-provided Hadoop (the spark-2.4.5-bin-without-hadoop build).


Extract

cd /usr/local/
sudo tar -zxvf ~/Downloads/spark-2.4.5-bin-without-hadoop.tgz

Configuration

cd /usr/local/spark-2.4.5-bin-without-hadoop/conf
cp spark-env.sh.template spark-env.sh

Add the following to spark-env.sh:

# set the hadoop classpath
export SPARK_DIST_CLASSPATH=$(hadoop classpath)

# configure Hadoop YARN (point at the directory that contains yarn-site.xml)
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PYSPARK_PYTHON=/Users/pangxianhai/opt/anaconda3/bin/python3.7
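
Since SPARK_DIST_CLASSPATH is computed with $(hadoop classpath), the hadoop command must already be on the PATH when spark-env.sh is sourced. A quick sanity check:

# should print a long colon-separated list of Hadoop jar and config paths;
# if it errors, fix the Hadoop PATH configuration first
hadoop classpath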

If you develop in a Python environment, make sure the right Python is configured; otherwise Spark falls back to the system default Python, which may be missing third-party libraries or be the wrong version.
Note that my Spark version does not work with Python 3.8.
Running it with Python 3.8 fails with the following error:

Traceback (most recent call last):
File "/usr/local/spark-2.4.5-bin-without-hadoop/examples/src/main/python/pi.py", line 24, in <module>
from pyspark.sql import SparkSession
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
File "/usr/local/spark-2.4.5-bin-without-hadoop/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)

Configure environment variables

export SPARK_HOME=/usr/local/spark-2.4.5-bin-without-hadoop
export PATH=$SPARK_HOME/bin:$PATH

Setting these variables makes the Spark commands available from anywhere.
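
After reloading your shell profile, a quick sanity check (a sketch):

# should print the Spark 2.4.5 banner if PATH is set correctly
spark-submit --version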

Run the built-in example

cd /usr/local/spark-2.4.5-bin-without-hadoop/
bin/run-example SparkPi

Run output

(The screenshot from the original post is omitted; SparkPi prints an approximate value of Pi.)

Run a program with spark-submit

 spark-submit --master local --class org.apache.spark.examples.SparkPi  ./examples/jars/spark-examples_2.11-2.4.5.jar

The Master URL can take any of the following forms:

  • local: run Spark locally with a single worker thread (no parallelism at all)

  • local[*]: run Spark locally with as many worker threads as there are logical CPUs

  • local[K]: run Spark locally with K worker threads (ideally K should match the number of CPU cores on the machine)

  • spark://HOST:PORT: connect to the given Spark standalone master. The default port is 7077.

  • yarn-client: connect to a YARN cluster in client mode. The cluster location is taken from the HADOOP_CONF_DIR environment variable. (In Spark 2.x this is written as --master yarn --deploy-mode client; see the sketch after this list.)

  • yarn-cluster: connect to a YARN cluster in cluster mode. The cluster location is taken from the HADOOP_CONF_DIR environment variable. (--master yarn --deploy-mode cluster in Spark 2.x.)

  • mesos://HOST:PORT: connect to the given Mesos cluster. The default port is 5050.
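
For example, the SparkPi job above can also be submitted to YARN instead of being run locally. A sketch using the Spark 2.x syntax, assuming HADOOP_CONF_DIR or YARN_CONF_DIR points at the Hadoop configuration directory and that HDFS and YARN are already running:

cd /usr/local/spark-2.4.5-bin-without-hadoop/
spark-submit \
  --master yarn \
  --deploy-mode client \
  --class org.apache.spark.examples.SparkPi \
  ./examples/jars/spark-examples_2.11-2.4.5.jar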

Run a Python job

spark-submit --master local examples/src/main/python/pi.py
