java.lang.NullPointerException while reading data from MSSQL server with Spark
【Posted】: 2017-05-24 08:44:49
【Question】: I am having trouble reading data from an MSSQL server with Cloudera Spark. I am not sure where the problem lies or what is causing it.
Here is my build.sbt:
val sparkversion = "1.6.0-cdh5.10.1"
name := "SimpleSpark"
organization := "com.huff.spark"
version := "1.0"
scalaVersion := "2.10.5"
mainClass in Compile := Some("com.huff.spark.example.SimpleSpark")
assemblyJarName in assembly := "mssql.jar"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
  "org.apache.spark" % "spark-core_2.10" % sparkversion % "provided", // to test in cluster
  "org.apache.spark" % "spark-sql_2.10" % sparkversion % "provided" // to test in cluster
)
resolvers += "Confluent IO" at "http://packages.confluent.io/maven"
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos"
Here is my Scala source code:
package com.huff.spark.example

import org.apache.spark.sql._
import java.sql.{Connection, DriverManager}
import java.util.Properties
import org.apache.spark.{SparkContext, SparkConf}

object SimpleSpark {
  def main(args: Array[String]) {
    val sourceProp = new java.util.Properties
    val conf = new SparkConf().setAppName("SimpleSpark").setMaster("yarn-cluster") // to test in cluster
    val sc = new SparkContext(conf)
    var SqlContext = new SQLContext(sc)
    val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()
    jdbcDF.show(5)
  }
}
Here is the error I see:
17/05/24 04:35:20 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:155)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
at com.huff.spark.example.SimpleSpark$.main(SimpleSpark.scala:16)
at com.huff.spark.example.SimpleSpark.main(SimpleSpark.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
17/05/24 04:35:20 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NullPointerException)
I know the problem is on line 16, which is:
val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()
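But I cannot pin down what exactly is wrong. Is it related to access permissions (which I doubt), a problem with the connection parameters (in which case I would expect the error message to say so), or something else I am not aware of? Thanks in advance :-)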
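【Comments】:
This looks like a possible duplicate of ***.com/questions/39318667/… (How to connect (Py)Spark to Postgres database using JDBC).
【Answer 1】: If you are using Azure SQL Server, copy the JDBC connection string from the Azure portal. I tried this and it worked for me.
Using Azure Databricks in Scala mode: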
import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.sql.DriverManager
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._
// MS SQL JDBC Connection String ...
val jdbcSqlConn = "jdbc:sqlserver://***.database.windows.net:1433;database=**;user=***;password=****;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"
// Loading the ms sql table via spark context into dataframe
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> jdbcSqlConn,
"driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"dbtable" -> "***")).load()
// Register a temp table so that we can run SQL queries against it
jdbcDF.registerTempTable("yourtablename")
// selecting only top 10 rows here but you can use any sql statement
val yourdata = sqlContext.sql("SELECT * FROM yourtablename LIMIT 10")
// display the data
yourdata.show()
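【Discussion】:
【Answer 2】: The NPE occurs when trying to close the database Connection, which indicates that the system could not obtain a proper connector through JdbcUtils.createConnectionFactory. You should check your connection URL and the logs for the failure.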
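One way to act on the advice to check the connection URL is to try the same driver class and URL with plain JDBC before handing them to Spark. The following is only a minimal sketch under stated assumptions: JdbcPreflight and check are hypothetical names made up for illustration (they are not part of Spark or of the SQL Server driver), and the helper relies on nothing beyond java.sql and the Scala standard library.

object JdbcPreflight {
  import java.sql.DriverManager
  import scala.util.{Failure, Success, Try}

  // Hypothetical helper: returns Right(()) if a connection could be opened
  // and closed again, or Left(reason) describing which step failed.
  def check(driverClass: String, url: String): Either[String, Unit] =
    Try(Class.forName(driverClass)) match {
      case Failure(e) =>
        // The driver jar is most likely missing from the classpath.
        Left("driver class not loadable: " + driverClass + " (" + e + ")")
      case Success(_) =>
        Try(DriverManager.getConnection(url)) match {
          case Failure(e) =>
            // Covers a URL that no registered driver accepts, an unreachable host, or a bad login.
            Left("could not connect: " + e.getMessage)
          case Success(conn) =>
            conn.close()
            Right(())
        }
    }
}

Calling JdbcPreflight.check(driver, url) at the top of main, with the same two values that are passed to the jdbc reader, and logging a Left result should surface a readable reason instead of a bare NullPointerException. In yarn-cluster mode the check runs inside the ApplicationMaster, so it also shows whether the driver jar actually reached the cluster.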
【Discussion】: