How to solve ERROR Executor - Exception in task 0.0 in stage 20.0 (TID 20)?

Posted: 2020-04-18 09:05:00

I know similar questions have already been answered briefly, but for lack of the minimum reputation I couldn't add my own follow-up doubts there... so I'm asking here.

I want to process Twitter data with Apache Spark + Kafka, and I created a schema for this. But when I run it I get the error below. I've searched in many places for this error, but either I couldn't find the solution I wanted or it didn't work. Last time I ran Spark with a smaller memory footprint, thinking it was running out of memory, but I still got the same error. Here is the code that produces the error:

from kafka import KafkaConsumer
from pyspark.streaming import StreamingContext
import json
import pandas as pd
from pyspark import SparkConf,SparkContext
from pyspark.streaming.kafka import KafkaUtils

#cd /opt/hadoop-3.2.0-7/hadoop/spark      $sudo ./bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.3.0  /opt/twitterConsumer.py

conf = SparkConf()
conf.setAppName("BDA-Twitter-Spark-Kafka")
sc = SparkContext(conf=conf)
sc.setLogLevel("ERROR")
ssc = StreamingContext(sc, 1)  # 1-second batch interval

# Receiver-based stream via ZooKeeper; the topics argument maps topic name -> partition count
KafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", 'tks', {"xmas": 1})   # directKafkaStream = KafkaUtils.createDirectStream(ssc, [topic], {"metadata.broker.list": brokers})
KafkaStream.pprint()

print("HERE1")


ssc.start()
ssc.awaitTermination()

The error I get is:

    ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.AbstractMethodError
    at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99)
    at org.apache.spark.streaming.kafka.KafkaReceiver.initializeLogIfNecessary(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
    at org.apache.spark.streaming.kafka.KafkaReceiver.log(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.streaming.kafka.KafkaReceiver.logInfo(KafkaInputDStream.scala:68)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:90)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:601)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:591)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
19/12/29 09:57:49 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
19/12/29 09:57:49 ERROR ReceiverTracker: Receiver has been stopped. Try to restart it.
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.AbstractMethodError
    at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99)
    at org.apache.spark.streaming.kafka.KafkaReceiver.initializeLogIfNecessary(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
    at org.apache.spark.streaming.kafka.KafkaReceiver.log(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.streaming.kafka.KafkaReceiver.logInfo(KafkaInputDStream.scala:68)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:90)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:601)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:591)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
Caused by: java.lang.AbstractMethodError
    at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99)
    at org.apache.spark.streaming.kafka.KafkaReceiver.initializeLogIfNecessary(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
    at org.apache.spark.streaming.kafka.KafkaReceiver.log(KafkaInputDStream.scala:68)
    at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
    at org.apache.spark.streaming.kafka.KafkaReceiver.logInfo(KafkaInputDStream.scala:68)
    at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:90)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
    at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:601)
    at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:591)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

How do I make the versions of all the required tools match here?

Comments:

1) What versions of Java and Scala do you have installed? 2) Have you tried Structured Streaming? 3) Do you need Spark at all? Kafka Connect works with Twitter, and you can drop from kafka import KafkaConsumer, since that is just the plain Kafka Python library.

1. The versions are: Hadoop 3.2.0, kafka_2.12-2.3.0.jar, Spark 2.3.1, Java openjdk version "11.0.5". 2. Actually I want to do real-time analysis on the data stream. 3. I'm doing this project to learn... learning Kafka is my semester assignment... and I need to learn Spark too... so I wanted to combine them into one... @cricket_007

Answer 1:

The error you're seeing most likely comes from a version mismatch.

Hadoop and Spark require Java 8.

You're using Kafka built with Scala 2.12 (Maven: kafka_2.12), so your packages must also use Scala 2.12 (Maven: spark-xyz_2.12) and must also match your Spark version (2.3.1). Your command shows you pulled the Kafka streaming package for Scala 2.11 and Spark 2.3.0. Also note that the Spark Streaming (DStream) Kafka package is deprecated; you should use spark-sql-kafka with Structured Streaming instead, as sketched below.
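A minimal Structured Streaming sketch of that route (assuming a Scala 2.11 build of Spark 2.3.1, so the package coordinate would be org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.1; adjust the Scala suffix and version to your actual build, and the topic name "xmas" is carried over from the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("BDA-Twitter-Spark-Kafka").getOrCreate()

# Structured Streaming talks to the Kafka broker directly (default port 9092),
# not to ZooKeeper on 2181 as the receiver-based createStream does.
df = (spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "xmas")
      .load())

# Kafka values arrive as raw bytes; cast to string before any analysis.
query = (df.selectExpr("CAST(value AS STRING)")
         .writeStream
         .format("console")
         .start())
query.awaitTermination()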

You can still do real-time analytics without Spark and Hadoop, for example with a plain Kafka consumer like the sketch below.
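A minimal sketch using only the kafka-python client (the topic name "xmas" is taken from the question, the broker address and JSON-encoded tweet payload are assumptions; everything else is illustrative):

import json
from kafka import KafkaConsumer

# Consume the tweet stream directly from the broker -- no Spark cluster involved.
consumer = KafkaConsumer(
    "xmas",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    tweet = message.value
    print(tweet)  # replace with your real-time analysis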

Comments:

Very interested in that last line. For the past week I've been wondering whether we really need Spark, or whether Kafka alone would suit us? We're planning to build a real-time video stream processing pipeline that can scale to hundreds of cameras. Any comments?

@ParasJain For large time-series aggregations you would still want a database, e.g. Couchbase, Druid, or ClickHouse.
