org.apache.spark.SparkException: Task not serializable java

Asked: 2016-09-20 07:29:23

Question:

I am trying to write results to MySQL from foreachPartition, but I get the error org.apache.spark.SparkException: Task not serializable.

public class Insert implements Serializable {

    transient static JavaSparkContext spc;

    public static void main(String gg[]) {

        Map<String, String> options = new HashMap<String, String>();
        options.put("url", "jdbc:mysql://localhost:3306/testing?user=root&password=pwd");
        options.put("dbtable", "rtl");
        SparkConf ss = new SparkConf().setAppName("insert").setMaster("local");

        spc = new JavaSparkContext(ss);

        JavaRDD<String> rbm = spc.textFile(path);
        // DataFrame jdbcDF = sqlContext.jdbc(options.get("url"),options.get("dbtable"));

        // System.out.println("Data------------------->" + jdbcDF.toJSON().first());

        JavaRDD<String> file = rbm.flatMap(new FlatMapFunction<String, String>() {
            NotSerializableException nn = new NotSerializableException();

            public Iterable<String> call(String x) {
                // TODO Auto-generated method stub
                return Arrays.asList(x.split("  ")[0]);
            }
        });

        try {
            file.foreachPartition(new VoidFunction<Iterator<String>>() {
                Connection conn = (Connection) DriverManager.getConnection("jdbc:mysql://localhost/testing", "root", "amd@123");

                PreparedStatement del = (PreparedStatement) conn.prepareStatement("INSERT INTO rtl (rtl_s) VALUES (?) ");
                NotSerializableException nn = new NotSerializableException();

                public void call(Iterator<String> x) throws Exception {
                    // TODO Auto-generated method stub
                    while (x.hasNext()) {
                        String y = x.toString();
                        del.setString(1, y);
                        del.executeUpdate();
                    }
                }
            });
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
    }
}

I get this error:

6/09/20 12:37:58 INFO SparkContext: Created broadcast 0 from textFile at Insert.java:41
org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:919)
    at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
    at org.apache.spark.api.java.JavaRDDLike$class.foreachPartition(JavaRDDLike.scala:225)
    at org.apache.spark.api.java.AbstractJavaRDDLike.foreachPartition(JavaRDDLike.scala:46)
    at final_file.Insert.main(Insert.java:59)
Caused by: java.io.NotSerializableException: java.lang.Object
Serialization stack:
    - object not serializable (class: java.lang.Object, value: java.lang.Object@4395342)
    - writeObject data (class: java.util.HashMap)
    - object (class java.util.HashMap, UTF-8=java.lang.Object@4395342, WINDOWS-1252=com.mysql.jdbc.SingleByteCharsetConverter@72ffabab, US-ASCII=com.mysql.jdbc.SingleByteCharsetConverter@6f5fa288)
    - field (class: com.mysql.jdbc.ConnectionImpl, name: charsetConverterMap, type: interface java.util.Map)
    - object (class com.mysql.jdbc.JDBC4Connection, com.mysql.jdbc.JDBC4Connection@6761e52a)
    - field (class: final_file.Insert$2, name: conn, type: interface com.mysql.jdbc.Connection)
    - object (class final_file.Insert$2, final_file.Insert$2@45436e66)
    - field (class: org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1, name: f$12, type: interface org.apache.spark.api.java.function.VoidFunction)
    - object (class org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 12 more

I get the above error when trying to write the results to MySQL.

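Comments:

What does DriverManager contain? It looks like it cannot be serialized.

Actually, it holds the MySQL properties: the username, the password and the database name.

Answer 1: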

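When you pass a function to a Spark operation (a transformation such as map or flatMap, or an action such as foreachPartition), Spark serializes that function, along with every field and object it references, so that it can be shipped to the executors.

Methods and fields cannot be serialized on their own, so the entire object they belong to is serialized.

If that object's class, or the class of any field it holds, does not implement java.io.Serializable, this exception is thrown. You can find where the NotSerializableException was hit by reading the serialization stack in the error.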
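To see the mechanism outside Spark, here is a minimal plain-JDK sketch. `ClosureCheck`, `Task` and the two factory methods are hypothetical names used only for illustration: `Task` stands in for a Spark function interface such as `VoidFunction`, and a bare `Object` stands in for a non-serializable resource such as a JDBC connection.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCheck {

    /** Hypothetical stand-in for a Spark function interface (assumption, not Spark API). */
    public interface Task extends Serializable {
        void call(String row);
    }

    /** Returns null if obj serializes, otherwise the message of the NotSerializableException. */
    public static String firstNonSerializable(Object obj) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** The resource is a field, so it is part of the task's serialized state. */
    public static Task taskWithResourceField() {
        return new Task() {
            final Object resource = new Object(); // java.lang.Object is not Serializable

            public void call(String row) {
                resource.hashCode();
            }
        };
    }

    /** The resource is created inside call(), so the task itself carries no state. */
    public static Task taskWithLocalResource() {
        return new Task() {
            public void call(String row) {
                Object resource = new Object(); // created where the task runs
                resource.hashCode();
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(firstNonSerializable(taskWithResourceField()));
        System.out.println(firstNonSerializable(taskWithLocalResource()));
    }
}
```

With default JDK settings, the first task should be rejected and report the class of the offending field, while the second should serialize cleanly because the resource only exists inside `call()`.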

In your case, look here:

Caused by: java.io.NotSerializableException: java.lang.Object
Serialization stack:
    - object not serializable (class: java.lang.Object, value: java.lang.Object@4395342)
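Further down, the same serialization stack names the field `conn` of `final_file.Insert$2`, which is the anonymous `VoidFunction`: the MySQL connection is created as a field on the driver and then has to be serialized with the function. One possible fix, sketched below, is to keep only the connection settings in the object and open the connection inside `call()`, so it is created on the executor once per partition. `PartitionWriter` and `serializedSize` are hypothetical names, the password is a placeholder, and the class is written against the JDK only, so the Spark hookup appears as a comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;

/** Hypothetical helper: holds only serializable settings, never a live connection. */
public class PartitionWriter implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String url;
    private final String user;
    private final String password;

    public PartitionWriter(String url, String user, String password) {
        this.url = url;
        this.user = user;
        this.password = password;
    }

    /** Same shape as VoidFunction<Iterator<String>>.call; intended to run once per partition. */
    public void call(Iterator<String> rows) throws Exception {
        // Opened where the task runs, so the connection is never serialized.
        try (Connection conn = DriverManager.getConnection(url, user, password);
             PreparedStatement insert =
                     conn.prepareStatement("INSERT INTO rtl (rtl_s) VALUES (?)")) {
            while (rows.hasNext()) {
                insert.setString(1, rows.next()); // next() advances the iterator; toString() does not
                insert.executeUpdate();
            }
        }
    }

    /** Driver-side sanity check: size in bytes of the Java-serialized form. */
    public static int serializedSize(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(obj);
            out.flush();
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PartitionWriter writer =
                new PartitionWriter("jdbc:mysql://localhost/testing", "root", "<password>");
        System.out.println("serialized bytes: " + serializedSize(writer));
        // Assumed usage with Spark on the classpath and Java 8 or later:
        // file.foreachPartition(writer::call);
    }
}
```

On Java 7, the same idea should work with the original anonymous `VoidFunction`, provided the `DriverManager.getConnection` and `prepareStatement` calls are moved from the field initializers into the body of `call()`. Note that the credentials still travel inside the serialized task, which may or may not be acceptable in your setup.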
