Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/StructFilters in spark scala application in intellij

【Posted】: 2021-06-24 13:30:51

【Question】:

My pom.xml:

 <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.12</artifactId>
      <version>3.0.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.12</artifactId>
      <version>3.0.1</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.databricks/spark-xml -->
    <dependency>
      <groupId>com.databricks</groupId>
      <artifactId>spark-xml_2.12</artifactId>
      <version>0.10.0</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>5.1.29</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-s3</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-core</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-dynamodb -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-dynamodb</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-cloudwatch -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-cloudwatch</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-kinesis -->
    <dependency>
      <groupId>com.amazonaws</groupId>
      <artifactId>aws-java-sdk-kinesis</artifactId>
      <version>1.11.985</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-avro -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-avro_2.12</artifactId>
      <version>3.1.1</version>
    </dependency>
 </dependencies>

I want to read an Avro file:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import scala.io.Source

val conf = new SparkConf().setAppName("Nightmare").setMaster("local")
val sc = new SparkContext(conf)
sc.setLogLevel("ERROR")
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._

// Steps 1-2: read the Avro file
println("Step 1-2")
val df1 = spark.read
  .format("com.databricks.spark.avro")
  .option("multiline", "true")
  .load("file:///D:/bigdata_tasks/nightmare.avro")

// Steps 3-4: hit https://randomuser.me/api/0.8/?results=1000 and convert the response to a DataFrame (df2)
println("Step 3-5")
val html = Source.fromURL("https://randomuser.me/api/0.8/?results=1000")
val rdddata = html.mkString
// Wrap the JSON string in an RDD so Spark can parse it
val paralleldata = sc.parallelize(List(rdddata))
val df2 = spark.read.json(paralleldata)
df2.printSchema()
df2.show()

Running it throws this exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/StructFilters

I also tried the following code:

val df1 = spark.read
  .format("avro")
  .option("multiline", "true")
  .load("file:///D:/bigdata_tasks/nightmare.avro")

But I still get the same exception. My Spark version is 2.12. Should I update the Spark version?

【Answer 1】:

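This is most likely caused by mixing Spark versions: your Avro library comes from Spark 3.1.1, while Spark core and Spark SQL come from 3.0.1. It is usually best to declare the version as a property so that all components share a single version. Also, remove unnecessary dependencies such as spark-xml, the AWS SDK, etc.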
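
As a rough sketch of the "version as a property" advice, the Spark entries of the pom.xml could be rewritten as below. The property names `spark.version` and `scala.binary.version` are illustrative choices, not from the original post, and the values 3.1.1 and 2.12.10 are assumptions: use whichever single Spark release you actually target, and a Scala 2.12.x version that matches it.

```xml
<properties>
  <!-- Assumed values: one Spark release for every Spark artifact -->
  <spark.version>3.1.1</spark.version>
  <scala.version>2.12.10</scala.version>
  <scala.binary.version>2.12</scala.binary.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>${scala.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

With this layout, a change to `spark.version` moves spark-core, spark-sql and spark-avro together, so the 3.0.1 / 3.1.1 mix from the question cannot recur by accident. Running `mvn dependency:tree` afterwards is one way to confirm that only one Spark version ends up on the classpath.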

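Also check the Scala version: the `_2.12` suffix on the Spark artifacts must match the Scala version used for `scala-library`.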
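
The Scala check can also be done from inside the application. The following is a minimal sketch, not part of the original answer: `VersionCheck` and its methods are hypothetical helpers, and the suffix parsing is a simple heuristic that assumes Scala 2.x style version strings and artifact ids of the form `name_<scalaBinaryVersion>`.

```scala
// Hypothetical helper (not from the original post): compares the Scala suffix
// of Spark artifact ids with the Scala version the application is running on.
object VersionCheck {
  // "2.12.10" -> "2.12"; assumes a Scala 2.x style version string
  def binaryVersion(fullVersion: String): String =
    fullVersion.split('.').take(2).mkString(".")

  // "spark-avro_2.12" -> Some("2.12"); None when there is no "_<suffix>" part
  def artifactScalaSuffix(artifactId: String): Option[String] = {
    val idx = artifactId.lastIndexOf('_')
    if (idx >= 0 && idx < artifactId.length - 1) Some(artifactId.substring(idx + 1))
    else None
  }

  // True only when the artifact has a suffix equal to the Scala binary version
  def matches(artifactId: String, scalaVersion: String): Boolean =
    artifactScalaSuffix(artifactId).contains(binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    // Scala version of the runtime library actually on the classpath
    val running = scala.util.Properties.versionNumberString
    val artifacts = Seq("spark-core_2.12", "spark-sql_2.12", "spark-avro_2.12")
    artifacts.foreach { a =>
      val status = if (matches(a, running)) "ok" else "MISMATCH"
      println(s"$a vs Scala $running: $status")
    }
  }
}
```

A mismatch reported here would mean the `scala-library` version resolved by Maven does not belong to the 2.12 line that the Spark artifacts were built for. Once a SparkSession exists, `spark.version` can be printed alongside it to see which Spark release is actually loaded.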
