Apache Spark 1.5 with Cassandra: Class cast exception

Posted: 2015-09-16 10:55:16

I am using the following software:

    Cassandra 2.1.9
    Spark 1.5
    Java, with the Cassandra driver provided by DataStax
    Ubuntu 12.04

The program works fine when I run Spark locally with local[8], and the data is saved to Cassandra. However, when I submit the job to the Spark cluster, the following exception is thrown:

16 Sep 2015 03:08:58,808  WARN [task-result-getter-0] (Logging.scala:71) TaskSetManager - Lost task 3.0 in stage 0.0 (TID 3,
192.168.50.131): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.HashMap$SerializationProxy to field scala.collection.Map$WithDefault.underlying of type scala.collection.Map in instance of scala.collection.immutable.Map$WithDefault
        at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2083)
        at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1261)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1996)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:72)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:98)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:88)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

I have no idea how to fix this error. I am using only the following two dependencies:

    spark-assembly-1.5.0-hadoop2.6.0.jar --> ships with the Spark download
    spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar --> built from Git using sbt (see the sketch below)
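
For anyone reproducing the setup, the connector assembly above is typically produced with sbt's assembly task from the connector's Git repository. This is a sketch, not from the original post; the exact branch and output path depend on the connector version:

    git clone https://github.com/datastax/spark-cassandra-connector.git
    cd spark-cassandra-connector
    sbt assembly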

I have also exported the bundled application jar onto the Spark classpath. Please help, as I am not sure whether this is an application-specific error or a problem with the Spark distribution itself.
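
For context, a cluster submission along these lines would reproduce the setup described above. The main class, master host, and paths are illustrative placeholders, not from the original post:

    spark-submit \
        --class com.example.MyApp \
        --master spark://<master-host>:7077 \
        --jars /path/to/spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar \
        /path/to/my-app-fat.jar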

Answer 1:

I finally figured out the problem.

The problem was that I was adding only the bundled application jar (the fat jar) to the Spark context and excluding both of the following jars:

1. spark-assembly-1.5.0-hadoop2.6.0.jar

2. spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar

It turns out that I should also add spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar to the Spark context, and exclude only spark-assembly-1.5.0-hadoop2.6.0.jar, as sketched below.
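
In code, the fix boils down to distributing the connector assembly (and the application jar) with the job while leaving the Spark assembly off the list. A minimal sketch, assuming hypothetical paths, app name, and master URL:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class ConnectorSubmitSketch {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("MyCassandraApp")             // hypothetical app name
                    .setMaster("spark://<master-host>:7077")  // hypothetical master URL
                    .set("spark.cassandra.connection.host", "127.0.0.1");

            // Ship the application fat jar and the connector assembly to the
            // executors. The Spark assembly jar is deliberately NOT listed:
            // it is already on every worker's classpath, which is the fix
            // described above.
            conf.setJars(new String[] {
                    "/path/to/my-app-fat.jar",  // hypothetical path
                    "/path/to/spark-cassandra-connector-java-assembly-1.5.0-M1-SNAPSHOT.jar"
            });

            JavaSparkContext sc = new JavaSparkContext(conf);
            // ... build RDDs and save to Cassandra here ...
            sc.stop();
        }
    }

Equivalently, the connector jar can be passed on the command line via spark-submit's --jars option, which avoids hard-coding paths in the application.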

Comments:

I am running into the same problem as you, but I can't understand your solution. I define my SparkConf as follows:

    SparkConf sparkConf = new SparkConf().setAppName(new String("New app"));
    sparkConf.setMaster("spark://xxx:7077");
    sparkConf.setAppName("Streambase backend API");
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");
    JavaSparkContext sparkContext = new JavaSparkContext(sparkConf);

Should I add something like this?

    String[] jars = { "/opt/spark-1.6.1-bin-hadoop2.6/spark-assembly-1.6.1-hadoop2.6.0.jar" };
    sparkConf.setJars(jars);
