Running Spark Streaming Jobs on a Kerberos-Enabled Cluster
Posted felixzh
Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.
- Select or create a user account to be used as principal. This should not be the `kafka` or `spark` service account.
- Generate a keytab for the user.
- Create a Java Authentication and Authorization Service (JAAS) login configuration file: for example, `key.conf`.
- Add configuration settings that specify the user keytab.
  The keytab and configuration files are distributed using YARN local resources. Because they reside in the current directory of the Spark YARN container, you should specify the location as `./v.keytab`.

  The following example specifies keytab location `./v.keytab` for principal `[email protected]`:

  ```
  KafkaClient {
      com.sun.security.auth.module.Krb5LoginModule required
      useKeyTab=true
      keyTab="./v.keytab"
      storeKey=true
      useTicketCache=false
      serviceName="kafka"
      principal="[email protected]";
  };
  ```
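To show how the pieces of the JAAS section fit together, the block above can be rendered from just the keytab path and principal. This is a hypothetical Python helper for illustration only (it is not part of Spark or Kafka), and the `user1@EXAMPLE.COM` principal is a placeholder, not the value from the example above:

```python
def render_jaas_config(keytab_path, principal, service_name="kafka"):
    """Render a KafkaClient JAAS section like the example above.

    Hypothetical helper: the function name and signature are
    illustrative, not part of any Spark or Kafka API.
    """
    return (
        "KafkaClient {\n"
        "    com.sun.security.auth.module.Krb5LoginModule required\n"
        "    useKeyTab=true\n"
        f'    keyTab="{keytab_path}"\n'
        "    storeKey=true\n"
        "    useTicketCache=false\n"
        f'    serviceName="{service_name}"\n'
        f'    principal="{principal}";\n'
        "};\n"
    )

# Placeholder principal; substitute your own user and realm.
conf = render_jaas_config("./v.keytab", "user1@EXAMPLE.COM")
```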
- In your `spark-submit` command, pass the JAAS configuration file and keytab as local resource files using the `--files` option, and specify the JAAS configuration file in the JVM options for both the driver and the executor:

  ```
  spark-submit --files key.conf#key.conf,v.keytab#v.keytab \
      --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
      --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
      ...
  ```
- Pass any relevant Kafka security options to your streaming application. For example, the KafkaWordCount example accepts `PLAINTEXTSASL` as the last option on the command line:

  ```
  KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
  ```
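Inside the application, that trailing argument typically ends up as a Kafka consumer property alongside the ZooKeeper quorum and consumer group. A minimal sketch of that mapping, assuming the 0.8-era consumer property names used by examples of this vintage (the property names here are an assumption, not quoted from the original doc):

```python
def kafka_params_from_args(zk_quorum, group, security_protocol=None):
    """Map KafkaWordCount-style CLI arguments onto consumer properties.

    Assumption: property names follow the old (Kafka 0.8-era)
    ZooKeeper-based consumer; they are illustrative only.
    """
    params = {
        "zookeeper.connect": zk_quorum,
        "group.id": group,
    }
    # Only secured clusters need an explicit security protocol.
    if security_protocol is not None:
        params["security.protocol"] = security_protocol
    return params

# Values taken from the KafkaWordCount command line above.
params = kafka_params_from_args("c6402:2181", "abc", "PLAINTEXTSASL")
```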