Running Spark Streaming Jobs on a Kerberos-Enabled Cluster

Posted by felixzh

Use the following steps to run a Spark Streaming job on a Kerberos-enabled cluster.

  1. Select or create a user account to use as the Kerberos principal.

    This should not be the kafka or spark service account.

  2. Generate a keytab for the user.
  3. Create a Java Authentication and Authorization Service (JAAS) login configuration file, for example key.conf.
  4. Add configuration settings that specify the user keytab.

    The keytab and configuration files are distributed as YARN local resources. Because they are placed in the working directory of each Spark YARN container, specify the keytab location as a relative path, such as ./v.keytab.

    The following example specifies keytab location ./v.keytab for principal [email protected]:

    KafkaClient {
       com.sun.security.auth.module.Krb5LoginModule required
       useKeyTab=true
       keyTab="./v.keytab"
       storeKey=true
       useTicketCache=false
       serviceName="kafka"
       principal="[email protected]";
    };
  5. In your spark-submit command, pass the JAAS configuration file and keytab as local resource files using the --files option, and add the JAAS configuration file setting to the JVM options for both the driver and the executors:
    spark-submit \
      --files key.conf#key.conf,v.keytab#v.keytab \
      --driver-java-options "-Djava.security.auth.login.config=./key.conf" \
      --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./key.conf" \
      ...
  6. Pass any relevant Kafka security options to your streaming application.

    For example, the KafkaWordCount example accepts PLAINTEXTSASL as the last argument on the command line:

    KafkaWordCount /vagrant/spark-examples.jar c6402:2181 abc ts 1 PLAINTEXTSASL
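
Steps 3 through 5 can be scripted so that the JAAS file and the spark-submit options always agree on file names. The following is a minimal sketch, not part of Spark or Kafka: the function names write_jaas and submit_args are hypothetical helpers, and the principal user@EXAMPLE.COM used in the usage note is a placeholder you would replace with your own.

```shell
#!/usr/bin/env bash
# Sketch only: write_jaas and submit_args are hypothetical helper names.

# Write a KafkaClient JAAS section. The keytab is referenced by a relative
# path because YARN places it in the container's working directory.
write_jaas() {
  local keytab="$1" principal="$2" out="$3"
  cat > "$out" <<EOF
KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="./$(basename "$keytab")"
   storeKey=true
   useTicketCache=false
   serviceName="kafka"
   principal="$principal";
};
EOF
}

# Print, one per line, the spark-submit options that ship both files and
# point the driver and executor JVMs at the JAAS file.
submit_args() {
  local jaas="$1" keytab="$2"
  local j k
  j="$(basename "$jaas")"
  k="$(basename "$keytab")"
  printf '%s\n' \
    --files "$jaas#$j,$keytab#$k" \
    --driver-java-options "-Djava.security.auth.login.config=./$j" \
    --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=./$j"
}
```

A possible use: call write_jaas v.keytab user@EXAMPLE.COM key.conf, then read the lines printed by submit_args key.conf v.keytab into an array and pass that array to spark-submit ahead of the application jar and its arguments. Printing one option per line keeps the quoting of the -D settings intact when the values are read back.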
