Zeppelin Spark RDD commands fail yet work in spark-shell
Posted: 2016-01-13 11:58:33

Question: I have set up a standalone single-node "cluster" running the following:
Cassandra 2.2.2
Spark 1.5.1
A fat jar compiled for Spark-Cassandra-Connector 1.5.0-M2
A Zeppelin 0.6 snapshot, compiled with: mvn -Pspark-1.5 -Dspark.version=1.5.1 -Dhadoop.version=2.6.0 -Phadoop-2.4 -DskipTests clean package

Retrieving data from Cassandra via the spark-shell works perfectly.
I have changed zeppelin-env.sh as follows:
export MASTER=spark://localhost:7077
export SPARK_HOME=/root/spark-1.5.1-bin-hadoop2.6/
export ZEPPELIN_PORT=8880
export ZEPPELIN_JAVA_OPTS="-Dspark.jars=/opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar -Dspark.cassandra.connection.host=localhost"
export ZEPPELIN_NOTEBOOK_DIR="/root/gowalla-spark-demo/notebooks/zeppelin"
export SPARK_SUBMIT_OPTIONS="--jars /opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar --deploy-mode cluster"
export ZEPPELIN_INTP_JAVA_OPTS=$ZEPPELIN_JAVA_OPTS
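With jars arriving via spark.jars, --jars, Spark's own lib directory, and Zeppelin's interpreter directory, the same library can easily end up on the classpath more than once. A quick sanity check is to scan those install trees for a given library's jars. A minimal Python sketch; the directory layout below is fabricated for the demo, not the real install paths:

```python
import pathlib
import tempfile

def jackson_jars(root):
    """Return the name of every jackson*.jar found anywhere under the given directory tree."""
    return sorted(p.name for p in pathlib.Path(root).rglob("jackson*.jar"))

# Throwaway layout standing in for $SPARK_HOME and the Zeppelin install
root = pathlib.Path(tempfile.mkdtemp())
for rel in [
    "spark/lib/jackson-databind-2.4.4.jar",
    "zeppelin/interpreter/spark/jackson-databind-2.6.1.jar",
]:
    jar = root / rel
    jar.parent.mkdir(parents=True, exist_ok=True)
    jar.touch()

print(jackson_jars(root))  # two different jackson-databind versions found
```

Run against the real $SPARK_HOME and Zeppelin directories, this shows at a glance whether the two components ship conflicting copies of the same jar.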
I then started adding paragraphs to a notebook, importing the following first:
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql._
import com.datastax.spark.connector.rdd.CassandraRDD
import org.apache.spark.rdd.RDD
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
I am not sure whether all of these are necessary. This paragraph runs fine.
I then execute the following:
val checkins = sc.cassandraTable("lbsn", "checkins")
This runs fine and returns:
checkins: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
The next paragraph runs the following two statements; the first succeeds and the second fails:
checkins.count
checkins.first
Result:
res13: Long = 138449
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: "id":"4","name":"first"; line: 1, column: 1]
at com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at com.fasterxml.jackson.databind.DeserializationContext.mappingException(DeserializationContext.java:843)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.addBeanProps(BeanDeserializerFactory.java:533)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.buildBeanDeserializer(BeanDeserializerFactory.java:220)
at com.fasterxml.jackson.databind.deser.BeanDeserializerFactory.createBeanDeserializer(BeanDeserializerFactory.java:143)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:358)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:265)
at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:245)
at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:143)
at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:439)
at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:3666)
at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3558)
at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2578)
at org.apache.spark.rdd.RDDOperationScope$.fromJson(RDDOperationScope.scala:82)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
at org.apache.spark.rdd.RDD$$anonfun$34.apply(RDD.scala:1582)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.RDD.<init>(RDD.scala:1582)
at com.datastax.spark.connector.rdd.CassandraRDD.<init>(CassandraRDD.scala:15)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.<init>(CassandraTableScanRDD.scala:59)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:92)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.copy(CassandraTableScanRDD.scala:59)
at com.datastax.spark.connector.rdd.CassandraRDD.limit(CassandraRDD.scala:103)
at com.datastax.spark.connector.rdd.CassandraRDD.take(CassandraRDD.scala:122)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1312)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.first(RDD.scala:1311)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:53)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:55)
at $iwC$$iwC$$iwC.<init>(<console>:57)
at $iwC$$iwC.<init>(<console>:59)
at $iwC.<init>(<console>:61)
at <init>(<console>:63)
at .<init>(<console>:67)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1340)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.zeppelin.spark.SparkInterpreter.interpretInput(SparkInterpreter.java:655)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:620)
at org.apache.zeppelin.spark.SparkInterpreter.interpret(SparkInterpreter.java:613)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.interpret(ClassloaderInterpreter.java:57)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:93)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:276)
at org.apache.zeppelin.scheduler.Job.run(Job.java:170)
at org.apache.zeppelin.scheduler.FIFOScheduler$1.run(FIFOScheduler.java:118)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Why does the call to first fail? Calls such as sc.fromTextFile also fail.
The following also works:
checkins.where("year = 2010 and month=2 and day>12 and day<15").count()
But this does not:
checkins.where("year = 2010 and month=2 and day>12 and day<15").first()
Please help, because this is driving me crazy. Especially since spark-shell works, yet Zeppelin does not, or at least appears to be partially broken.
Thanks
Answer 1:
com.fasterxml.jackson.databind.JsonMappingException: Could not find creator property with name 'id' (in class org.apache.spark.rdd.RDDOperationScope)
at [Source: "id":"4","name":"first"; line: 1, column: 1]
This exception occurs when two or more versions of the Jackson library are on the classpath.
Make sure your Spark interpreter process has only one version of the Jackson library on its classpath.
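One way to spot such a conflict is to group the jar names on the interpreter's classpath by artifact and flag any artifact that appears with more than one version. A minimal sketch; the jar paths below are hypothetical stand-ins, not taken from the question:

```python
import re
from collections import defaultdict

def find_duplicate_artifacts(classpath_jars):
    """Group jar files by artifact name and report artifacts that
    appear with more than one version on the classpath."""
    versions = defaultdict(set)
    # Matches names like "jackson-databind-2.4.4.jar" -> ("jackson-databind", "2.4.4")
    pattern = re.compile(r"([A-Za-z][\w.-]*?)-(\d[\w.-]*)\.jar$")
    for path in classpath_jars:
        name = path.rsplit("/", 1)[-1]
        m = pattern.match(name)
        if m:
            versions[m.group(1)].add(m.group(2))
    return {art: sorted(v) for art, v in versions.items() if len(v) > 1}

# Hypothetical classpath resembling the setup in the question
jars = [
    "/root/spark-1.5.1-bin-hadoop2.6/lib/jackson-databind-2.4.4.jar",
    "/opt/sparkconnector/jackson-databind-2.6.1.jar",
    "/opt/sparkconnector/spark-cassandra-connector-assembly-1.5.0-M2-SNAPSHOT.jar",
]
print(find_duplicate_artifacts(jars))  # -> {'jackson-databind': ['2.4.4', '2.6.1']}
```

Any artifact reported here with two versions is a candidate for exclusion from the fat jar or removal from one of the install directories.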
Comments:
Here is an example of this problem occurring when aws-java-sdk is included; the solution in that case is similar: ***.com/questions/45511804/…