Spark/Scala - Project runs fine from IntelliJ but throws error with SBT
Posted: 2018-05-27 12:50:05

[Problem description]:
I have a Spark project that I run locally from IntelliJ, and it works fine when run from there. The project is very simple and is currently just a toy example. Here is the code:
package mls.main

import org.apache.spark.SparkContext._
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.{SparkConf, SparkContext}
import java.nio.file.{Paths, Files}
import scala.io.Source

object Main {

  def main(args: Array[String]) {
    import org.apache.log4j.Logger
    import org.apache.log4j.Level

    print("HELLO WORLD!")
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

    // fire up spark
    val sc = createContext
    val sqlContext = new SQLContext(sc)
    loadAHSData(List("x"), sqlContext)
  }

  def loadAHSData(years: List[String], sqlContext: SQLContext): Unit = {
    // load the column names that exist in all 3 datasets
    val columns = sqlContext.sparkContext
      .textFile("data/common_columns.txt")
      .collect()
      .toSeq

    columns.foreach(println)
  }

  def createContext(appName: String, masterUrl: String): SparkContext = {
    val conf = new SparkConf().setAppName(appName).setMaster(masterUrl)
    new SparkContext(conf)
  }

  def createContext(appName: String): SparkContext = createContext(appName, "local")

  def createContext: SparkContext = createContext("Data Application", "local")
}
When I run it through IntelliJ I get the correct output: the handful of columns from the specified text file. However, when I cd into the correct directory and then run sbt run, I see the "HELLO WORLD!" output, but it then fails with the following stack trace:
java.lang.ClassNotFoundException: scala.None$
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1826)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:309)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/12/13 09:52:14 WARN FileSystem: exception in the cleaner thread but it will continue to run
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
at org.apache.hadoop.fs.FileSystem$Statistics$StatisticsDataReferenceCleaner.run(FileSystem.java:2989)
at java.lang.Thread.run(Thread.java:748)
17/12/13 09:52:14 ERROR Utils: uncaught error in thread SparkListenerBus, stopping SparkContext
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
17/12/13 09:52:14 ERROR Utils: throw uncaught fatal error in thread SparkListenerBus
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
at java.util.concurrent.Semaphore.acquire(Semaphore.java:312)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:80)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
17/12/13 09:52:14 ERROR ContextCleaner: Error in cleaning thread
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
at org.apache.spark.ContextCleaner$$anonfun$org$apache$spark$ContextCleaner$$keepCleaning$1.apply$mcV$sp(ContextCleaner.scala:181)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1279)
at org.apache.spark.ContextCleaner.org$apache$spark$ContextCleaner$$keepCleaning(ContextCleaner.scala:178)
at org.apache.spark.ContextCleaner$$anon$1.run(ContextCleaner.scala:73)
And my build.sbt looks like:
name := "MLS_scala"
version := "0.1"
scalaVersion := "2.11.1"
resolvers ++= Seq(
Resolver.sonatypeRepo("releases"),
Resolver.sonatypeRepo("snapshots")
)
val sparkVersion = "2.2.0"
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-mllib" % sparkVersion,
"org.apache.spark" %% "spark-streaming" % sparkVersion,
"org.apache.spark" %% "spark-hive" % sparkVersion
)
I can't figure out why it runs perfectly from IntelliJ but hits this error from sbt. Please let me know if there are any steps I can take to fix this. Thanks!
[Question discussion]:
What version of sbt are you using? Do you have a project/build.properties file? What's in it?
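(For context, project/build.properties normally contains just a single line pinning the sbt version used to launch the build. The example below is hypothetical, since the asker's actual file and sbt version aren't shown in the question.)

# project/build.properties -- hypothetical example; the real sbt version for this project is unknown
sbt.version=0.13.16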
[Answer 1]:
It's probably related to Scala library versioning; try adding this to your build.sbt:
fork := true
Or, for the run task only:
fork in run := true
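(As a minimal sketch of how this could sit in the asker's build.sbt: only the fork line comes from this answer; the heap and output settings below are extra assumptions on my part.)

// build.sbt -- sketch only; everything except `fork in run := true` is an assumption
fork in run := true                   // run the app in a forked JVM instead of sbt's own JVM
javaOptions in run ++= Seq("-Xmx2g")  // hypothetical: give the forked JVM a larger heap for Spark
outputStrategy := Some(StdoutOutput)  // forward the forked process's output to the sbt console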
[Discussion]:
This works, but can someone explain why? @Thusitha It means that sbt runs your application in a separate JVM instead of inside the JVM that runs sbt itself, which is the default behaviour. With a framework like Spark, which pushes JVM tuning to its limits, it's easy to see how that avoids trouble. If I'm not being precise enough, see e.g. here.

[Answer 2]:
You have to compile the project with SBT before running it, using sbt compile,
and make sure the project builds successfully with SBT before sbt run. Guide: https://alvinalexander.com/scala/sbt-how-to-compile-run-package-scala-project
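(As a sketch, the sequence from this answer, run from the project root, might look like the following; the clean step is an extra assumption and not part of the answer.)

sbt clean      # optional: remove stale compiled classes (assumption, not in the answer)
sbt compile    # compile the project and surface any build errors first
sbt run        # then run the main class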