Kerberos series: Spark authentication configuration
Posted by bainianminguo
Other articles in the big data security series:
https://www.cnblogs.com/bainianminguo/p/12548076.html ----------- Installing Kerberos
https://www.cnblogs.com/bainianminguo/p/12548334.html ----------- Kerberos authentication for Hadoop
https://www.cnblogs.com/bainianminguo/p/12548175.html ----------- Kerberos authentication for ZooKeeper
https://www.cnblogs.com/bainianminguo/p/12584732.html ----------- Kerberos authentication for Hive
https://www.cnblogs.com/bainianminguo/p/12584880.html ----------- Search Guard authentication for Elasticsearch
https://www.cnblogs.com/bainianminguo/p/12639821.html ----------- Kerberos authentication for Flink
https://www.cnblogs.com/bainianminguo/p/12639887.html ----------- Kerberos authentication for Spark
Today's post covers the Kerberos configuration for Spark, part of the big data security series.
I. Spark installation
1. Extract and rename the installation directory
tar -zxvf spark-2.4.0-bin-hadoop2.7.tgz -C /usr/local/
cd /usr/local/
ll
mv spark-2.4.0-bin-hadoop2.7/ spark
2. Set the Spark environment variables
export SPARK_HOME=/usr/local/spark
export PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
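The original does not say which file these exports were added to; assuming they were appended to /etc/profile, reload the environment so the new variables take effect:

source /etc/profile
echo $SPARK_HOME    # should print /usr/local/spark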
3. Edit spark-env.sh
[root@cluster2-host1 conf]# vim spark-env.sh
export JAVA_HOME=/usr/local/java                  # Java home
export SCALA_HOME=/usr/local/scala                # Scala home
export SPARK_WORKING_MEMORY=1g                    # maximum memory available on each worker node
export SPARK_MASTER_IP=cluster1-host1             # master node IP
export HADOOP_HOME=/usr/local/hadoop              # Hadoop path
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop    # Hadoop configuration directory
4. Edit spark-defaults.conf
[root@cluster2-host1 conf]# cp spark-defaults.conf.template spark-defaults.conf
[root@cluster2-host1 conf]# pwd
/usr/local/spark/conf

Add the following property to spark-defaults.conf:
spark.yarn.jars=hdfs://cluster1-host1:9000/spark_jars/*
5. Edit the slaves file
[root@cluster2-host1 conf]# cp slaves.template slaves

slaves should list the worker nodes:

cluster2-host2
cluster2-host3
6. Create the directory for the Spark jars on HDFS
[root@cluster2-host1 conf]# hadoop fs -mkdir /spark_jars
[root@cluster2-host1 conf]# hadoop dfs -ls /
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Found 1 items
drwxr-xr-x   - root supergroup          0 2020-03-02 04:30 /spark_jars
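spark.yarn.jars points at this HDFS directory, so the Spark jars themselves still have to be uploaded to it. That upload is not shown in the original; it would typically look like this:

[root@cluster2-host1 conf]# hadoop fs -put /usr/local/spark/jars/* /spark_jars/
[root@cluster2-host1 conf]# hadoop fs -ls /spark_jars | head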
7. Distribute the installation to the other nodes
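The original does not show the distribution commands; a minimal sketch, assuming passwordless SSH from cluster2-host1 to the worker nodes:

[root@cluster2-host1 ~]# scp -r /usr/local/spark root@cluster2-host2:/usr/local/
[root@cluster2-host1 ~]# scp -r /usr/local/spark root@cluster2-host3:/usr/local/

Repeat the environment-variable step from section 2 on each node as well.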
8. Start Spark
cd /usr/local/spark/sbin
[root@cluster2-host1 sbin]# ./start-all.sh
Check the processes on each node:
[root@cluster2-host1 sbin]# jps
25922 ResourceManager
31875 Master
6101 Jps
26152 NodeManager
22924 NameNode
23182 DataNode

[root@cluster2-host2 conf]# jps
22595 SecondaryNameNode
29043 Jps
22268 DataNode
24462 NodeManager
27662 Worker

[root@cluster2-host3 ~]# jps
25025 NodeManager
28404 Worker
12537 Jps
22910 DataNode
9. Open the web UI in a browser
http://10.87.18.34:8080/
II. Configuring Kerberos for Spark
Spark itself needs no extra Kerberos configuration; it is enough that the Kerberos setup for HDFS is correct.
Make sure the user that accesses HDFS has already authenticated and has a ticket in the local cache; specifying a keytab file works as well.
[root@cluster2-host1 bin]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs/cluster2-host1@HADOOP.COM

Valid starting       Expires              Service principal
03/03/2020 08:06:49  03/04/2020 08:06:49  krbtgt/HADOOP.COM@HADOOP.COM
        renew until 03/10/2020 09:06:49
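If there is no cached ticket (for example for an unattended or scheduled job), the same principal can authenticate from a keytab instead; a sketch, assuming the keytab lives at /etc/security/keytabs/hdfs.keytab (a path not given in the original):

[root@cluster2-host1 bin]# kinit -kt /etc/security/keytabs/hdfs.keytab hdfs/cluster2-host1@HADOOP.COM
[root@cluster2-host1 bin]# klist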
Run the following check; as long as you can read data from HDFS, the setup is working.
./spark-shell
scala> var file = "/input/test.txt"
file: String = /input/test.txt

scala> spark.read.textFile(file).flatMap(_.split(" ")).collect
res1: Array[String] = Array(adfaljal, fjalfjalf, falfja, lfajsa, 23fdjalfja, abc, dda, haoop, cluster, cluster)
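For jobs submitted to YARN rather than run interactively, spark-submit can also log in from a keytab itself via --principal and --keytab, which is useful for long-running applications whose tickets would otherwise expire. A sketch using the same hypothetical keytab path as above (the exact examples jar name under $SPARK_HOME/examples/jars may differ):

[root@cluster2-host1 bin]# ./spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --principal hdfs/cluster2-host1@HADOOP.COM \
    --keytab /etc/security/keytabs/hdfs.keytab \
    --class org.apache.spark.examples.SparkPi \
    $SPARK_HOME/examples/jars/spark-examples_2.11-2.4.0.jar 100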