Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/StructFilters in spark scala application in intellij
Posted: 2021-06-24 13:30:51
Question: My pom.xml:
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.specs</groupId>
<artifactId>specs</artifactId>
<version>1.2.5</version>
<scope>test</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.12</artifactId>
<version>3.0.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.12</artifactId>
<version>3.0.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.databricks/spark-xml -->
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.12</artifactId>
<version>0.10.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.29</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-s3 -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-s3</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-core -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-core</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-dynamodb -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-dynamodb</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-cloudwatch -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-cloudwatch</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-kinesis -->
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-java-sdk-kinesis</artifactId>
<version>1.11.985</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-avro -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-avro_2.12</artifactId>
<version>3.1.1</version>
</dependency>
</dependencies>
I want to read an avro file:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SparkSession
import scala.io.Source

val conf = new SparkConf().setAppName("Nightmare").setMaster("local")
val sc = new SparkContext(conf)
sc.setLogLevel("Error")
val spark = SparkSession.builder().getOrCreate()
import spark.implicits._
// Step 1-2: read the avro file
println("Step 1-2")
val df1 = spark.read
  .format("com.databricks.spark.avro")
  .option("multiline", "true")
  .load("file:///D:/bigdata_tasks/nightmare.avro")
// Step 3-4: hit the url https://randomuser.me/api/0.8/?results=1000 and convert the response to a dataframe - df2
println("Step 3-5")
val html = Source.fromURL("https://randomuser.me/api/0.8/?results=1000")
val rdddata = html.mkString
// convert the string to an rdd
val paralleldata = sc.parallelize(List(rdddata))
val df2 = spark.read.json(paralleldata)
df2.printSchema()
df2.show()
After running it, I get the exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/catalyst/StructFilters
I also tried the following code:
val df1 = spark.read
  .format("avro")
  .option("multiline", "true")
  .load("file:///D:/bigdata_tasks/nightmare.avro")
But I still get the same exception. My Spark version is 2.12. Should I update the Spark version?
[Comments]:
[Answer 1]: This is most likely caused by mixing Spark versions: your Avro library comes from Spark 3.1.1 while Spark core is 3.0.1 (it is usually better to declare the version as a property so that a single version is used for all components). Also, remove unnecessary dependencies such as spark-xml, the AWS SDKs, and so on.
Also check the Scala version.
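For reference, here is a minimal sketch of how the Spark artifacts could be pinned to a single version through a property (the property names and the choice of 3.1.1 are illustrative, not taken from the original pom):

<properties>
  <scala.version>2.12.10</scala.version>
  <spark.version>3.1.1</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.12</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>

With all three artifacts on the same version, spark.read.format("avro") should be able to resolve the catalyst classes (such as StructFilters) that the newer spark-avro expects.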
[Discussion]: