Spark SQL toDF method fails with java.lang.NoSuchMethodError
Posted: 2017-01-21 20:53:53

Goal

Understand the cause of this problem and its solution. The problem occurs when running with spark-submit. Any help is appreciated.
spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar
Running the same code line by line in spark-shell raises no error.
...
scala> val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
|-- aucid: string (nullable = true)
|-- bid: float (nullable = false)
|-- bidtime: float (nullable = false)
|-- bidder: string (nullable = true)
|-- bidrate: integer (nullable = false)
|-- openbid: float (nullable = false)
|-- price: float (nullable = false)
|-- itemtype: string (nullable = true)
|-- dtl: integer (nullable = false)
Problem

Calling the toDF method to convert the RDD to a DataFrame raises the error below.
Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
at AuctionDataFrame.main(AuctionDataFrame.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Code
import org.apache.spark.{SparkConf, SparkContext}

case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  // Column indices in the input CSV.
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))
    val auctionsDF = auctionsRDD.toDF() // <--- line 52 causing the error.
  }
}
build.sbt
name := "Auction Project"
version := "1.0"
scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"
/*
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2",
"org.apache.spark" %% "spark-sql" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
Environment

Spark on Ubuntu 14.04:
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.2
/_/
Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)
sbt on Windows:
D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12
Research

I investigated similar issues that pointed to an incompatibility between the Scala version Spark was compiled with and the one used to build the application:
Spark 1.4 RDD to DF fails with toDF(); Spark MLlib example, NoSuchMethodError. Following those, I changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Using "provided" or not made no difference to the error.
scalaVersion := "2.10.6"
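The suffix logic behind this can be sketched directly. sbt's %% operator appends the project's Scala binary version to each dependency's artifact name, so every Spark jar on the classpath must carry the same suffix as the Scala build of the Spark runtime; mixing _2.10 and _2.11 artifacts on one classpath produces exactly this kind of NoSuchMethodError. A minimal sketch of the naming rule (the shell variables are illustrative, not sbt internals):

```shell
# sbt resolves  "org.apache.spark" %% "spark-sql" % "1.6.2"  by appending
# the Scala binary version to the artifact name:
SCALA_BINARY=2.11            # derived from scalaVersion := "2.11.8"
ARTIFACT=spark-sql
echo "${ARTIFACT}_${SCALA_BINARY}"   # -> spark-sql_2.11
```

With scalaVersion := "2.10.6" the same line would resolve to spark-sql_2.10, which is why flipping only build.sbt cannot help if the cluster's Spark is still built against the other Scala line.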
Comments:

Possible duplicate of: Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror. It still looks like a version problem; double-check which versions are in use everywhere... @TzachZohar, thanks for the comment, but I changed the Scala version to "2.10.6" and ran "sbt clean" and "sbt package" again, which did not fix the problem. Could you be more specific about how the linked post resolves it?

Answer 1:

Cause
Spark 1.6.2 had been compiled from source with Scala 2.11. However, spark-1.6.2-bin-without-hadoop.tgz had also been downloaded and placed in the lib/ directory.
I believe that because spark-1.6.2-bin-without-hadoop.tgz was compiled with Scala 2.10, it caused the compatibility problem.
Fix

Remove spark-1.6.2-bin-without-hadoop.tgz from the lib directory and run "sbt package" with the following library dependencies:
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
"org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
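One way to sanity-check the rebuild is to confirm that sbt now emits a jar whose Scala suffix matches the cluster's Spark, i.e. the same jar name passed to spark-submit above. The path below follows sbt's default layout (sbt lowercases and hyphenates the project name for the artifact); it is a sketch of the expected location, not output captured from the actual build:

```shell
# With scalaVersion := "2.11.8", sbt package writes the jar under a
# Scala-suffixed target directory:
SCALA_BINARY=2.11
NAME=auction-project         # sbt's normalized form of name := "Auction Project"
VERSION=1.0
echo "target/scala-${SCALA_BINARY}/${NAME}_${SCALA_BINARY}-${VERSION}.jar"
# -> target/scala-2.11/auction-project_2.11-1.0.jar
```

If that jar name disagrees with the Scala version printed in the Spark shell banner (here "Using Scala version 2.11.7"), the runtimeMirror NoSuchMethodError is the expected symptom.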