SparkContext: Error initializing SparkContext while running a Spark job

Posted: 2021-02-05 12:37:42

I am working on a Spark job that loads data from Elasticsearch into HDFS, but it fails while initializing the SparkContext. The error occurs while creating the Spark session.

Hadoop: 3.2.1

Spark: 2.4.4

Elasticsearch Spark (for Spark 2.X): 7.5.1

EMR: 6.0.0

Code:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date

spark = SparkSession \
   .builder \
   .getOrCreate()
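
For context, a fuller sketch of what an ES-to-HDFS job of this shape typically looks like. This is only an illustration, not the original poster's code: the host, port, index name, and output path below are placeholders, and the `es.*` settings are the standard elasticsearch-hadoop connector options.

```python
from pyspark.sql import SparkSession

# Hypothetical configuration -- es.nodes / es.port are standard
# elasticsearch-hadoop options; the values here are placeholders.
spark = (SparkSession.builder
         .appName("es-to-hdfs")
         .config("spark.es.nodes", "es-host")
         .config("spark.es.port", "9200")
         .getOrCreate())

# Read one index from Elasticsearch via the connector's data source.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.resource", "my-index")   # placeholder index name
      .load())

# Persist to HDFS (placeholder path).
df.write.mode("overwrite").parquet("hdfs:///path/to/output")
```

Note that the failure in this question happens before any of the read/write logic runs: `getOrCreate()` itself fails while YARN loads credential providers.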

Error log:

20/10/22 10:09:12 ERROR SparkContext: Error initializing SparkContext.
java.util.ServiceConfigurationError: org.apache.spark.deploy.yarn.security.ServiceCredentialProvider: Provider org.elasticsearch.spark.deploy.yarn.security.EsServiceCredentialProvider could not be instantiated
    at java.util.ServiceLoader.fail(ServiceLoader.java:232)
    at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
    at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:184)
    at scala.collection.mutable.ListBuffer.$plus$plus$eq(ListBuffer.scala:47)
    at scala.collection.TraversableLike.to(TraversableLike.scala:678)
    at scala.collection.TraversableLike.to$(TraversableLike.scala:675)
    at scala.collection.AbstractTraversable.to(Traversable.scala:108)
    at scala.collection.TraversableOnce.toList(TraversableOnce.scala:299)
    at scala.collection.TraversableOnce.toList$(TraversableOnce.scala:299)
    at scala.collection.AbstractTraversable.toList(Traversable.scala:108)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.loadCredentialProviders(YARNHadoopDelegationTokenManager.scala:82)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.getCredentialProviders(YARNHadoopDelegationTokenManager.scala:73)
    at org.apache.spark.deploy.yarn.security.YARNHadoopDelegationTokenManager.<init>(YARNHadoopDelegationTokenManager.scala:46)
    at org.apache.spark.deploy.yarn.Client.setupSecurityToken(Client.scala:308)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:1013)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:178)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:57)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:183)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:501)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:238)
    at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
    at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/deploy/yarn/security/ServiceCredentialProvider$class
    at org.elasticsearch.spark.deploy.yarn.security.EsServiceCredentialProvider.<init>(EsServiceCredentialProvider.scala:63)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
    ... 40 more
Caused by: java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.security.ServiceCredentialProvider$class
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
    ... 47 more
20/10/22 10:09:12 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-1-155.us-east-2.test:4040
20/10/22 10:09:12 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
20/10/22 10:09:12 INFO YarnClientSchedulerBackend: Stopped
20/10/22 10:09:12 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/10/22 10:09:12 INFO MemoryStore: MemoryStore cleared
20/10/22 10:09:12 INFO BlockManager: BlockManager stopped
20/10/22 10:09:12 INFO BlockManagerMaster: BlockManagerMaster stopped
20/10/22 10:09:12 WARN MetricsSystem: Stopping a MetricsSystem that is not running
20/10/22 10:09:12 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/10/22 10:09:12 INFO SparkContext: Successfully stopped SparkContext
20/10/22 10:09:13 INFO ShutdownHookManager: Shutdown hook called
20/10/22 10:09:13 INFO ShutdownHookManager: Deleting directory /vol1/tmp/spark-b39bb8cc-5bc7-4721-89bd-8bd62b9e527e
20/10/22 10:09:13 INFO ShutdownHookManager: Deleting directory /vol1/tmp/spark-d94995f0-05b6-476f-935e-8ba501acbed3

    at com.company.utils.ResourceScriptUtils.executeScript(ResourceScriptUtils.java:114)
    at com.company.utils.ResourceScriptUtils.executeScript(ResourceScriptUtils.java:135)
    at com.company.loader.impl.realTimeProcessing.RealTimeEsLoader.processJob(RealTimeEsLoader.java:232)
    at com.company.loader.App.main(App.java:37)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
org.apache.commons.exec.ExecuteException: Process exited with an error: 1 (Exit value: 1)
    at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
    at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
    at com.company.multijob.MultiJob$HadoopJob.call(MultiJob.java:50)
    at com.company.multijob.MultiJob$HadoopJob.call(MultiJob.java:38)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Can anyone help? Thanks.

Comments:

Answer 1:

This happens because the Spark-YARN JAR is missing from your Spark application. If you are using Maven, add the following to your pom.xml:

Under dependencies:

   <dependency>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-yarn_2.11</artifactId>
     <version>2.4.7</version>
   </dependency>

Under artifactItems:

   <artifactItem>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-yarn_2.11</artifactId>
     <version>2.4.7</version>
     <type>jar</type>
     <overWrite>false</overWrite>
     <outputDirectory>${project.build.directory}/classes/</outputDirectory>
     <destFileName>optional-new-name.jar</destFileName>
   </artifactItem>

Note: adjust the Scala and Spark versions to match your environment; here I am using Spark 2.4.7 with Scala 2.11.
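
If the job is launched through spark-submit rather than packaged as an uber-jar, a possible alternative (a sketch, not verified against the poster's cluster; the jar path and version are placeholders) is to ship the missing jar at submit time:

```shell
# Add the matching spark-yarn jar on the driver/executor classpath
# at submit time. Match the Scala suffix (_2.11 vs _2.12) and Spark
# version to what your cluster actually runs.
spark-submit \
  --master yarn \
  --jars /path/to/spark-yarn_2.11-2.4.7.jar \
  your_job.py
```

The `NoClassDefFoundError` for `ServiceCredentialProvider$class` in the trace is also what a Scala version mismatch looks like, so checking that every Spark artifact uses the same `_2.11`/`_2.12` suffix is worthwhile either way.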

Discussion:
