I can't write a Spark DataFrame to database with jdbc

Posted: 2017-07-12 13:23:29

Problem description:

I'm trying to write a simple DataFrame to an Oracle database, but I get an error. I build my DataFrame from a case class and a List, and I found that data can be inserted into my Oracle database with the jdbc method on write. I tried this code:

// toDF() on a local collection needs the SparkSession implicits in scope
// (assuming the session is named spark, as in spark-shell):
import spark.implicits._

case class MyClass(A: String, B: Int)
val MyClass_List = List(MyClass("att1", 1), MyClass("att2", 2))

val MyClass_df = MyClass_List.toDF()

MyClass_df.write
  .mode("append")
  .jdbc(url, tableTest, prop)
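
For reference, url, tableTest, and prop are used above but never defined in the question; a typical setup for an Oracle target would look roughly like this (all names and values below are hypothetical placeholders, not taken from the question):

import java.util.Properties

// Hypothetical reconstruction -- the question does not show these definitions.
val url = "jdbc:oracle:thin:@//dbhost:1521/service"   // placeholder connection string
val tableTest = "TEST_TABLE"                          // placeholder target table
val prop = new Properties()
prop.setProperty("user", "username")      // placeholder
prop.setProperty("password", "password")  // placeholder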

But I get the following error:

17/07/12 14:57:04 ERROR JobScheduler: Error running job streaming job 1499864218000 ms.0
java.lang.NullPointerException
        at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
        at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
        at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
        at Test$$anonfun$1.apply(Test.scala:177)
        at Test$$anonfun$1.apply(Test.scala:117)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
        at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
        at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
        at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
        (same stack trace as above)

I'm using Spark 2.1.0, and my database table has two columns, A and B, of types varchar and number respectively.

Do you have any ideas?


Answer 1:

It turns out I was using the MySQL driver even though I have the Oracle one. I should have used:

prop.setProperty("driver", "oracle.jdbc.driver.OracleDriver")

instead of:

prop.setProperty("driver", "com.mysql.jdbc.Driver")


Answer 2:

It should be "oracle.jdbc.OracleDriver", because the class in the driver package (oracle.jdbc.driver.OracleDriver, used above) is deprecated:

prop.setProperty("driver", "oracle.jdbc.OracleDriver")

Comments:

@a.moussa, can you check this one and provide a solution? ***.com/questions/56151363/…
