我无法使用 jdbc 将 Spark DataFrame 写入数据库
Posted
技术标签:
【中文标题】我无法使用 jdbc 将 Spark DataFrame 写入数据库【英文标题】:I can't write a Spark DataFrame to database with jdbc 【发布时间】:2017-07-12 13:23:29 【问题描述】:我正在尝试将一个简单的数据帧写入 oracle 数据库,但我收到一条错误消息。我使用案例类和列表来构建我的数据框。我发现我们可以在写入后使用 jdbc 方法将数据插入我的 oracle 数据库。 我试过这段代码:
case class MyClass(A: String, B: Int)
val MyClass_List = List(MyClass("att1", 1), MyClass("att2", 2))
val MyClass_df = MyClass_List.toDF()
MyClass_df.write
.mode("append")
.jdbc(url, tableTest, prop)
但我收到以下错误:
17/07/12 14:57:04 ERROR JobScheduler: Error running job streaming job 1499864218000 ms.0
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
at Test$$anonfun$1.apply(Test.scala:177)
at Test$$anonfun$1.apply(Test.scala:117)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Exception in thread "main" java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:93)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:426)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.jdbc(DataFrameWriter.scala:446)
at Test$$anonfun$1.apply(Test.scala:177)
at Test$$anonfun$1.apply(Test.scala:117)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.DStream$$anonfun$foreachRDD$1$$anonfun$apply$mcV$sp$3.apply(DStream.scala:627)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1$$anonfun$apply$mcV$sp$1.apply(ForEachDStream.scala:51)
at org.apache.spark.streaming.dstream.DStream.createRDDWithLocalProperties(DStream.scala:415)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply$mcV$sp(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at org.apache.spark.streaming.dstream.ForEachDStream$$anonfun$1.apply(ForEachDStream.scala:50)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.streaming.scheduler.Job.run(Job.scala:39)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply$mcV$sp(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler$$anonfun$run$1.apply(JobScheduler.scala:254)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at org.apache.spark.streaming.scheduler.JobScheduler$JobHandler.run(JobScheduler.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
我使用 spark 版本 2.1.0 和我的数据库作为两列 A 和 B,分别键入 varchar 和 number。
你有什么想法吗?
【问题讨论】:
【参考方案1】:其实我用的是mysql的驱动,尽管有oracle的驱动。 我应该使用
prop.setProperty("driver", "oracle.jdbc.driver.OracleDriver")
而不是
prop.setProperty("driver", "com.mysql.jdbc.Driver")
【讨论】:
【参考方案2】:它应该是“oracle.jdbc.OracleDriver”,因为驱动程序包中的那个已被弃用。
prop.setProperty("driver", "oracle.jdbc.OracleDriver")
【讨论】:
@a.moussa 。你能检查一下并提供解决方案吗***.com/questions/56151363/…以上是关于我无法使用 jdbc 将 Spark DataFrame 写入数据库的主要内容,如果未能解决你的问题,请参考以下文章
使用 java Spark DataFrame 通过 jdbc 访问 Oracle
无法使用 JDBC 连接到 Spark thriftserver
在 Spark SQL 中使用 Presto JDBC 时无法识别的连接属性“url”
无法使用 jdbc 和 spark 连接器从 databricks 集群连接到 Azure 数据库 for MySQL 服务器
无法在 ipython 中正确创建 spark 上下文以链接到 MySQL - com.mysql.jdbc.Driver