Spark SQL toDF method fails with java.lang.NoSuchMethodError


Posted: 2017-01-21 20:53:53

Goal

Understand the cause of the problem and how to fix it. The problem occurs when running with spark-submit. Any help is appreciated.

spark-submit --class "AuctionDataFrame" --master spark://<hostname>:7077 auction-project_2.11-1.0.jar

Running the same code line by line in spark-shell produces no error.

...
scala>     val auctionsDF = auctionsRDD.toDF()
auctionsDF: org.apache.spark.sql.DataFrame = [aucid: string, bid: float, bidtime: float, bidder: string, bidrate: int, openbid: float, price: float, itemtype: string, dtl: int]
scala> auctionsDF.printSchema()
root
 |-- aucid: string (nullable = true)
 |-- bid: float (nullable = false)
 |-- bidtime: float (nullable = false)
 |-- bidder: string (nullable = true)
 |-- bidrate: integer (nullable = false)
 |-- openbid: float (nullable = false)
 |-- price: float (nullable = false)
 |-- itemtype: string (nullable = true)
 |-- dtl: integer (nullable = false)

Problem

Calling the toDF method to convert the RDD to a DataFrame throws the following error.

Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror(Ljava/lang/ClassLoader;)Lscala/reflect/api/JavaMirrors$JavaMirror;
    at AuctionDataFrame$.main(AuctionDataFrame.scala:52)
    at AuctionDataFrame.main(AuctionDataFrame.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Code

case class Auctions(
  aucid: String,
  bid: Float,
  bidtime: Float,
  bidder: String,
  bidrate: Int,
  openbid: Float,
  price: Float,
  itemtype: String,
  dtl: Int)

object AuctionDataFrame {
  val AUCID = 0
  val BID = 1
  val BIDTIME = 2
  val BIDDER = 3
  val BIDRATE = 4
  val OPENBID = 5
  val PRICE = 6
  val ITEMTYPE = 7
  val DTL = 8

  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("AuctionDataFrame")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    import sqlContext.implicits._

    val inputRDD = sc.textFile("/user/wynadmin/auctiondata.csv").map(_.split(","))
    val auctionsRDD = inputRDD.map(a =>
      Auctions(
        a(AUCID),
        a(BID).toFloat,
        a(BIDTIME).toFloat,
        a(BIDDER),
        a(BIDRATE).toInt,
        a(OPENBID).toFloat,
        a(PRICE).toFloat,
        a(ITEMTYPE),
        a(DTL).toInt))
    val auctionsDF = auctionsRDD.toDF()  // <--- line 52 causing the error.
  }
}
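The per-field parsing in the map step can be exercised without a cluster. A minimal sketch that reuses the same case class on one sample CSV line (the sample values are made up for illustration):

```scala
case class Auctions(
  aucid: String, bid: Float, bidtime: Float, bidder: String,
  bidrate: Int, openbid: Float, price: Float, itemtype: String, dtl: Int)

object ParseCheck {
  // Mirrors the inputRDD.map step: split one CSV line and build the case class.
  def parse(line: String): Auctions = {
    val a = line.split(",")
    Auctions(a(0), a(1).toFloat, a(2).toFloat, a(3),
      a(4).toInt, a(5).toFloat, a(6).toFloat, a(7), a(8).toInt)
  }

  def main(args: Array[String]): Unit = {
    val rec = parse("8213034705,95.0,2.927373,jake7870,0,95.0,117.5,xbox,3")
    println(rec.bidder)  // prints "jake7870"
  }
}
```

If this parses cleanly, the failure is not in the data mapping but in the reflection-based toDF call, which is where the Scala binary version matters.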

build.sbt

name := "Auction Project"

version := "1.0"

scalaVersion := "2.11.8"
//scalaVersion := "2.10.6"

/* 
libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2",
    "org.apache.spark" %% "spark-sql" % "1.6.2",
    "org.apache.spark" %% "spark-mllib" % "1.6.2"
)
*/

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)
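Note that sbt's %% operator appends the scalaVersion binary suffix to the artifact name, which is why a 2.10 build pulls different Spark jars than a 2.11 build. A toy illustration of the naming rule (this is not sbt itself, just a sketch of the convention):

```scala
object CrossVersionDemo {
  // sbt's %% roughly expands "spark-sql" to "spark-sql_<scala binary version>",
  // where the binary version is the first two components of scalaVersion.
  def crossName(artifact: String, scalaVersion: String): String = {
    val binary = scalaVersion.split("\\.").take(2).mkString(".")
    s"${artifact}_$binary"
  }

  def main(args: Array[String]): Unit = {
    println(crossName("spark-sql", "2.11.8"))  // prints "spark-sql_2.11"
  }
}
```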

Environment

Spark on Ubuntu 14.04:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.2
      /_/

Using Scala version 2.11.7 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_92)

sbt on Windows:

D:\>sbt sbtVersion
[info] Set current project to root (in build file:/D:/)
[info] 0.13.12

Research

Investigated similar questions that point to a mismatch with the Scala version Spark was compiled against:

Spark 1.4 RDD to DF fails with toDF()
Spark MLlib example, NoSuchMethodError

So I changed the Scala version in build.sbt to 2.10, which produced a 2.10 jar, but the error persisted. Marking the dependencies as "provided" or not made no difference.

scalaVersion := "2.10.6"

Comments:

Possible duplicate of Exception in thread "main" java.lang.NoSuchMethodError: scala.reflect.api.JavaUniverse.runtimeMirror
It still looks like a version issue. Double-check what versions are being used everywhere...
@TzachZohar, thanks for the comment, but I changed the Scala version to "2.10.6", ran "sbt clean" and "sbt package" again, and it did not solve the problem. Could you be more specific about how the linked post resolves this?

Answer 1:

Cause

Spark 1.6.2 on the cluster was compiled from source with Scala 2.11. However, spark-1.6.2-bin-without-hadoop.tgz had been downloaded and placed in the lib/ directory of the sbt project.

I believe that because spark-1.6.2-bin-without-hadoop.tgz was compiled with Scala 2.10, it caused the compatibility problem.
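One way to narrow down this kind of mismatch is to print the Scala library actually loaded at runtime, since it can differ from the version sbt compiled against. A small sketch using only the standard library (no Spark required), which could be run through spark-submit on the cluster:

```scala
object ScalaVersionCheck {
  def main(args: Array[String]): Unit = {
    // Reports the version of the scala-library jar on the classpath,
    // e.g. "2.10.5" or "2.11.8" -- compare it with the _2.xx suffix
    // of the application jar produced by sbt package.
    println(scala.util.Properties.versionNumberString)
  }
}
```

If the printed version and the jar's cross-version suffix disagree, reflection-based calls such as toDF are a likely place for NoSuchMethodError to surface.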

Fix

Remove spark-1.6.2-bin-without-hadoop.tgz from the lib directory and run "sbt package" with the following library dependencies:

libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-sql" % "1.6.2" % "provided",
    "org.apache.spark" %% "spark-mllib" % "1.6.2" % "provided"
)

