Spark / Scala - scala.collection.convert.Wrappers$MutableSetWrapper - no valid constructor

Posted by BIT_666


Contents

I. Introduction

II. Problem Analysis and Localization

1. Problem Description

2. Tracing the Code

2.1 asJava

2.2 Decorators

2.3 mutableSetAsJavaSetConverter

2.4 MutableSetWrapper

III. Attempted Fixes

1. Add a constructor ❌

2. Nest the Wrapper in a serializable class ❌

3. JavaConversions ❌

4. Plain java.util.Set conversion 👍

IV. Summary


I. Introduction

A Spark project of mine needs Google Guava's utility library. Guava is written in Java, so Scala collections have to be converted to their Java counterparts, and converting a Scala mutable.HashSet[T] to a java.util.Set[T] failed with java.io.InvalidClassException: scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor. What follows is the familiar exercise of stepping into the pit and climbing back out.
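For context, the kind of Guava call that forces the conversion looks roughly like this; the Sets.intersection call and the sample data are only illustrative, not the actual project code:

import com.google.common.collect.Sets
import scala.collection.JavaConverters._
import scala.collection.mutable

// Guava's Sets utilities only accept java.util.Set, so Scala sets have to be
// handed over as their Java view first (illustrative example only).
val left: java.util.Set[String] = mutable.HashSet("a", "b", "c").asJava
val right: java.util.Set[String] = mutable.HashSet("b", "c", "d").asJava
println(Sets.intersection(left, right)) // contains b and c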

II. Problem Analysis and Localization

1. Problem Description

To use Guava's com.google.common.collect.Sets utilities, Scala Array, Set, mutable.Set, and mutable.HashSet values all need to be converted to java.util.Set, so I created a static helper object:

import scala.collection.JavaConverters._

object converUtil {

  def converToUtilSet(array: Array[String]): java.util.Set[String] = {
    collection.mutable.Set.apply(array: _*).asJava // convert to java.util.Set
  }
}

In an ordinary local test the method works fine, but calling it inside an RDD fails with:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2.0 (TID 3, localhost, executor driver): java.io.InvalidClassException: scala.collection.convert.Wrappers$MutableSetWrapper; no valid constructor
	at java.io.ObjectStreamClass$ExceptionInfo.newInvalidClassException(ObjectStreamClass.java:169)
	at java.io.ObjectStreamClass.checkDeserialize(ObjectStreamClass.java:874)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2043)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at scala.collection.mutable.HashMap$$anonfun$readObject$1.apply(HashMap.scala:143)
	at scala.collection.mutable.HashMap$$anonfun$readObject$1.apply(HashMap.scala:143)
	at scala.collection.mutable.HashTable$class.init(HashTable.scala:106)
	at scala.collection.mutable.HashMap.init(HashMap.scala:40)
	at scala.collection.mutable.HashMap.readObject(HashMap.scala:143)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at scala.collection.immutable.HashMap$SerializationProxy.readObject(HashMap.scala:582)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1170)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2178)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2287)
	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2211)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2069)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1573)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:431)
	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
	at org.apache.spark.scheduler.Task.run(Task.scala:123)
	at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:446)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:452)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The stack trace is dominated by deserialize and readSerialData frames. Combined with the fact that the "After MapPartition" log line was never printed and the job died right after reaching mapPartitions, the exception is very likely serialization-related.
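To make that guess concrete, here is a plausible minimal reproduction; the variable names, the sample data, and the existing SparkContext sc are assumptions for illustration. A driver-side mutable.HashMap whose values are the asJava-wrapped sets gets captured in the task closure, so the executors must deserialize MutableSetWrapper:

// Hypothetical reproduction, not the original job code.
// `sc` is assumed to be an existing SparkContext.
val dict = scala.collection.mutable.HashMap(
  "groupA" -> converUtil.converToUtilSet(Array("a", "b")),
  "groupB" -> converUtil.converToUtilSet(Array("c"))
)

sc.parallelize(Seq("groupA", "groupB"))
  .mapPartitions { iter => iter.map(key => dict(key).size) } // dict is captured in the closure
  .collect() // task deserialization on the executor fails with "no valid constructor"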

 

2. Tracing the Code

java.io.InvalidClassException: scala.collection.convert.Wrappers$MutableSetWrapper; 
no valid constructor

The stack trace is very long, but the message itself is clear: MutableSetWrapper has no valid constructor. So let's trace how asJava ends up at MutableSetWrapper.

2.1 asJava

The asJava method is defined in the Decorators trait:

private[collection] trait Decorators {
  /** Generic class containing the `asJava` converter method */
  class AsJava[A](op: => A) {
    /** Converts a Scala collection to the corresponding Java collection */
    def asJava: A = op
  }
}

2.2 Decorators

Converting a Scala Set to a Java Set goes through the implicit mutableSetAsJavaSetConverter, which hands the conversion to an AsJava decorator:

  /**
   * Adds an `asJava` method that implicitly converts a Scala mutable `Set`
   * to a Java `Set`.
   *
   * The returned Java `Set` is backed by the provided Scala `Set` and any
   * side-effects of using it via the Java interface will be visible via
   * the Scala interface and vice versa.
   *
   * If the Scala `Set` was previously obtained from an implicit or explicit
   * call of `asSet(java.util.Set)` then the original Java `Set` will be
   * returned.
   *
   * @param s The `Set` to be converted.
   * @return An object with an `asJava` method that returns a Java `Set` view
   *         of the argument.
   */
  implicit def mutableSetAsJavaSetConverter[A](s : mutable.Set[A]): AsJava[ju.Set[A]] =
    new AsJava(mutableSetAsJavaSet(s))
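So the familiar `set.asJava` is just syntactic sugar: the implicit conversion wraps the set in an AsJava decorator, and calling asJava evaluates the stored conversion. Written out explicitly, purely for illustration:

import scala.collection.JavaConverters._
import scala.collection.mutable

val s = mutable.HashSet("a", "b")

// These two lines are equivalent; the first is what the implicit conversion expands to.
val explicit: java.util.Set[String] =
  scala.collection.JavaConverters.mutableSetAsJavaSetConverter(s).asJava
val sugared: java.util.Set[String] = s.asJava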

2.3 mutableSetAsJavaSetConverter

Following the call chain further, we arrive at the WrapAsJava trait and finally meet the protagonist, MutableSetWrapper:

  /**
   * Implicitly converts a Scala mutable Set to a Java Set.
   * The returned Java Set is backed by the provided Scala
   * Set and any side-effects of using it via the Java interface will
   * be visible via the Scala interface and vice versa.
   *
   * If the Scala Set was previously obtained from an implicit or
   * explicit call of `asSet(java.util.Set)` then the original
   * Java Set will be returned.
   *
   * @param s The Set to be converted.
   * @return A Java Set view of the argument.
   */
  implicit def mutableSetAsJavaSet[A](s: mutable.Set[A]): ju.Set[A] = s match {
    case JSetWrapper(wrapped) => wrapped
    case _ => new MutableSetWrapper(s)
  }
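The first case is exactly the "original Java Set will be returned" behavior from the doc comment: a set that already came from Java is unwrapped instead of being wrapped again. A quick sketch of that round trip:

import scala.collection.JavaConverters._

val original = new java.util.HashSet[String]()
original.add("x")

// java -> scala yields a JSetWrapper; scala -> java unwraps it again,
// so we get back the very same instance rather than a new MutableSetWrapper.
val roundTripped: java.util.Set[String] = original.asScala.asJava
println(roundTripped eq original) // true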
  

2.4 MutableSetWrapper

MutableSetWrapper is implemented as a case class with no explicit no-argument constructor. (The snippet below actually shows its sibling MutableSeqWrapper from the same Wrappers object; MutableSetWrapper is declared in the same style.)

  case class MutableSeqWrapper[A](underlying: mutable.Seq[A]) extends ju.AbstractList[A] with IterableWrapperTrait[A] {
    def get(i: Int) = underlying(i)
    override def set(i: Int, elem: A) = {
      val p = underlying(i)
      underlying(i) = elem
      p
    }
  }
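Before trying any fixes, note that nothing here is Spark-specific: a plain Java serialization round trip of the wrapper should reproduce the same error. A sketch, assuming the same Scala version as the failing job:

import java.io._
import scala.collection.JavaConverters._

// Writing succeeds, because the case class itself is Serializable;
// reading back should fail with InvalidClassException ... no valid constructor,
// exactly like the Spark executor did.
val wrapped: java.util.Set[String] = scala.collection.mutable.HashSet("a", "b").asJava

val buffer = new ByteArrayOutputStream()
val out = new ObjectOutputStream(buffer)
out.writeObject(wrapped)
out.close()

val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
in.readObject() // expected: java.io.InvalidClassException: ... no valid constructor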
    
  

III. Attempted Fixes

1. Add a constructor ❌

Since the complaint is "no valid constructor", and many people online solve this kind of error by adding an empty constructor to the parent class and implementing java.io.Serializable, that would be the obvious fix. Here, however, MutableSetWrapper is part of the Scala standard library source, so we cannot change it. Abandoned.

2. Nest the Wrapper in a serializable class ❌

Another suggestion is to nest the conversion inside a custom class that extends Serializable, so that the MutableSetWrapper is produced from within a serializable wrapper:

import scala.collection.JavaConverters._

class MySerializableClass extends Serializable {

  // Scala Set to Java Set converter
  def scalaToJavaSetConverter(arr: Array[String]): java.util.Set[String] = {
    collection.mutable.Set.apply(arr: _*).asJava
  }
}

The result is the same as calling the plain object directly: no valid constructor.

3. JavaConversions ❌

Besides scala.collection.JavaConverters._, scala.collection.JavaConversions can also convert Scala collections to their Java counterparts, so let's swap the method inside the object:

  def converToUtilSet(array: Array[String]): java.util.Set[String] = {
    val mutableSet = collection.mutable.Set.apply(array: _*)
    scala.collection.JavaConversions.setAsJavaSet(mutableSet)
  }

Alas, same old tune: it still fails to deserialize.

Skipping the intermediate steps and jumping straight to the class that does the work: SetWrapper is in fact the parent class of MutableSetWrapper, so JavaConverters and JavaConversions share essentially the same problem.

  class SetWrapper[A](underlying: Set[A]) extends ju.AbstractSet[A] {
    self =>
    override def contains(o: Object): Boolean = {
      try { underlying.contains(o.asInstanceOf[A]) }
      catch { case cce: ClassCastException => false }
    }
    override def isEmpty = underlying.isEmpty
    def size = underlying.size
    def iterator = new ju.Iterator[A] {
      val ui = underlying.iterator
      var prev: Option[A] = None
      def hasNext = ui.hasNext
      def next() = { val e = ui.next(); prev = Some(e); e }
      def remove() = prev match {
        case Some(e) =>
          underlying match {
            case ms: mutable.Set[a] =>
              ms remove e
              prev = None
            case _ =>
              throw new UnsupportedOperationException("remove")
          }
        case _ =>
          throw new IllegalStateException("next must be called at least once before remove")
      }
    }
  }

 

4. Plain java.util.Set conversion 👍

Having tried every approach above without success, the only option left is the most primitive one: create a java.util.Set ourselves and add the elements one by one:

  def converToUtilSet(array: Array[String]): java.util.Set[String] = {
    val javaSet = new java.util.HashSet[String]()
    array.foreach(javaSet.add)
    javaSet
  }

It looks less elegant than the previous approaches, but it actually solves the problem, which is elegant enough!
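If the conversion is needed in more places, the same trick generalizes into a small helper. The sketch below, including the names ConvertUtil and toJavaSet, is mine and not part of the original project:

object ConvertUtil {

  // java.util.HashSet is itself Serializable, so the result survives Spark's
  // Java serialization without any scala.collection.convert wrapper classes.
  def toJavaSet[T](elements: TraversableOnce[T]): java.util.Set[T] = {
    val javaSet = new java.util.HashSet[T]()
    elements.foreach(e => javaSet.add(e))
    javaSet
  }
}

Both ConvertUtil.toJavaSet(Array("a", "b")) and ConvertUtil.toJavaSet(collection.mutable.HashSet("a", "b")) then yield a plain java.util.HashSet.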

IV. Summary

Why bother with asJava or setAsJavaSet instead of the most basic new + add in the first place? Let's run a quick benchmark: each of the four ways of converting a Scala Array to a Java Set is executed 1000 times and timed:

  // imports needed for the Converter / AsJava branches
  import scala.collection.JavaConverters
  import scala.collection.JavaConverters._

  def convertSet(arr: Array[String], format: String): Unit = {
    val st = System.currentTimeMillis()
    var epoch = 0

    while (epoch < 1000) {
      if (format.equals("Conversion")) {
        val mutableSet = collection.mutable.Set.apply(arr: _*)
        scala.collection.JavaConversions.setAsJavaSet(mutableSet)
      } else if (format.equals("Converter")) {
        val mutableSet = collection.mutable.Set.apply(arr: _*)
        JavaConverters.mutableSetAsJavaSetConverter(mutableSet).asJava
      } else if (format.equals("AsJava")) {
        collection.mutable.Set.apply(arr: _*).asJava
      } else {
        val javaSet = new java.util.HashSet[String]()
        arr.foreach(javaSet.add)
      }
      epoch += 1
    }

    val end = System.currentTimeMillis()
    println(s"Epoch: $epoch Format: $format Cost: ${end - st}")
  }
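A small driver along these lines, with test strings made up for illustration, produces the timings below:

  // Hypothetical driver: exercise each conversion style with a short and a long array.
  def main(args: Array[String]): Unit = {
    val shortArray = (0 until 5).map(i => s"item_$i").toArray
    val longArray = (0 until 500).map(i => s"item_$i").toArray

    Seq("Conversion", "Converter", "AsJava", "Common").foreach { format =>
      convertSet(shortArray, format)
      convertSet(longArray, format)
    }
  }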
  

Testing with an Array[String] of length 5 (short) and length 500 (long):

Cost / ms      Short Array    Long Array
Conversion     58             162
Converter      19             78
AsJava         1              56
Common         8              75

The benchmark shows asJava to be the fastest, although that is really only a difference in syntax, since asJava ultimately goes through JavaConverters as well. So the problem is solved, but it still seems curious that a case class without an explicit constructor cannot be deserialized. The likely explanation: Java deserialization requires the closest non-serializable superclass to have an accessible no-argument constructor; MutableSetWrapper itself is Serializable (it is a case class), but its parent SetWrapper is not and only has a one-argument constructor, hence "no valid constructor".
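As a sanity check, the same plain Java round trip from section 2.4 succeeds once the value is a real java.util.HashSet; again just a sketch:

import java.io._

// Same round trip as before, but with the plain java.util.HashSet produced by
// the new converToUtilSet: both writing and reading back succeed.
val fixed = converUtil.converToUtilSet(Array("a", "b"))

val buffer = new ByteArrayOutputStream()
val out = new ObjectOutputStream(buffer)
out.writeObject(fixed)
out.close()

val in = new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray))
println(in.readObject()) // e.g. [a, b]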
