Spark Structured Streaming error: Error reading field 'topic_metadata'


I am trying to run a very simple example: a Kafka readStream that reads from a Kafka topic. I am running Spark 2.4.0 and Kafka 0.10.2.

val streamingInputDF =
  spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "twitter-topic")
    .load()

and a console writeStream:

val activityQuery = streamingInputDF.writeStream
  .format("console")
  .outputMode("append")
  .start()

activityQuery.awaitTermination()

However, when I start the console writeStream, I get the following exception:

org.apache.spark.sql.streaming.StreamingQueryException: Query [id = d21cd9b4-7f51-4f5f-acbf-943dfaaeb7e5, runId = c2b2c58d-7afe-4ca5-bc36-6a3f496c19b3] terminated with exception: Error reading field 'topic_metadata': Error reading array of size 881783, only 41 bytes available
  at org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runStream(StreamExecution.scala:295)
  at org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:189)
Caused by: org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 881783, only 41 bytes available
  at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73)
  at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380)
  at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449)
  at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
  at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
Answer

I added kafka-clients-0.10.2.2.jar to the spark-submit command line, and the error went away. The 'topic_metadata' deserialization failure comes from a kafka-clients library on the classpath that does not match the broker's protocol version; pinning the client jar to match the 0.10.2 broker resolves it.
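A minimal sketch of the spark-submit invocation, assuming Spark 2.4.0 built for Scala 2.11; the jar path, application class, and application jar name below are placeholders, not values from the original post:

```shell
# Pin kafka-clients so the consumer speaks the same protocol version as the
# 0.10.2 broker, and pull in the Structured Streaming Kafka source.
# /path/to/... and com.example.TwitterStream are hypothetical placeholders.
spark-submit \
  --jars /path/to/kafka-clients-0.10.2.2.jar \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
  --class com.example.TwitterStream \
  target/twitter-stream-app.jar
```

If only `--packages` is used without pinning `--jars`, Spark may resolve a newer kafka-clients transitively, which can reintroduce the protocol mismatch against an older broker.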
