java.lang.NullPointerException while reading data from MSSQL server with Spark

【Title】:java.lang.NullPointerException while reading data from MSSQL server with spark 【Posted】:2017-05-24 08:44:49 【Question】:

I am having trouble reading data from an MSSQL server with Cloudera Spark. I am not sure where the problem is or what is causing it.

Here is my build.sbt:

val sparkversion = "1.6.0-cdh5.10.1"
name := "SimpleSpark"
organization := "com.huff.spark"
version := "1.0"
scalaVersion := "2.10.5"
mainClass in Compile := Some("com.huff.spark.example.SimpleSpark")
assemblyJarName in assembly := "mssql.jar"


libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-streaming-kafka" % "1.6.0" % "provided",
    "org.apache.spark" %% "spark-streaming" % "1.6.0" % "provided",
    "org.apache.spark" % "spark-core_2.10" % sparkversion  % "provided", // to test in cluster
    "org.apache.spark" % "spark-sql_2.10" % sparkversion % "provided" // to test in cluster
)

resolvers += "Confluent IO" at "http://packages.confluent.io/maven"
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos"

Here is my Scala source code:

package com.huff.spark.example

import org.apache.spark.sql._
import java.sql.{Connection, DriverManager}
import java.util.Properties
import org.apache.spark.{SparkContext, SparkConf}

object SimpleSpark {
    def main(args: Array[String]) {
        val sourceProp = new java.util.Properties
        val conf = new SparkConf().setAppName("SimpleSpark").setMaster("yarn-cluster")  //to test in cluster
        val sc = new SparkContext(conf)
        var SqlContext = new SQLContext(sc)
        val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"

        val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()

        jdbcDF.show(5)
    }
}

Here is the error I see:

17/05/24 04:35:20 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:155)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:222)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:146)
    at com.huff.spark.example.SimpleSpark$.main(SimpleSpark.scala:16)
    at com.huff.spark.example.SimpleSpark.main(SimpleSpark.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:552)
17/05/24 04:35:20 INFO ApplicationMaster: Final app status: FAILED, exitCode: 15, (reason: User class threw exception: java.lang.NullPointerException)

I know the problem is on line 16, which is:

val jdbcDF = SqlContext.read.format("jdbc").options(Map("url" -> "jdbc:sqlserver://sqltestsrver;databaseName=LEh;user=sparkaetl;password=sparkaetl","driver" -> driver,"dbtable" -> "StgS")).load()

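But I cannot work out what exactly the problem is. Is it related to access (which is doubtful), a problem with the connection parameters (in which case I would expect the error message to say so), or something else I am not aware of? Thanks in advance :-)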

【Comments】:

Looks like a duplicate of ***.com/questions/39318667/…

Possible duplicate of How to connect (Py)Spark to Postgres database using JDBC

【Answer 1】:

If you are using Azure SQL Server, copy the JDBC connection string from the Azure portal. I tried that and it worked for me.

Azure Databricks, using a Scala notebook:

import com.microsoft.sqlserver.jdbc.SQLServerDriver
import java.sql.DriverManager
import org.apache.spark.sql.SQLContext
import sqlContext.implicits._

// MS SQL JDBC Connection String ... 
val jdbcSqlConn = "jdbc:sqlserver://***.database.windows.net:1433;database=**;user=***;password=****;encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=30;"

// Loading the ms sql table via spark context into dataframe
val jdbcDF = sqlContext.read.format("jdbc").options(
Map("url" -> jdbcSqlConn,
"driver" -> "com.microsoft.sqlserver.jdbc.SQLServerDriver",
"dbtable" -> "***")).load()

// Registering a temp table so that we can run SQL queries against it
jdbcDF.registerTempTable("yourtablename")
// selecting only top 10 rows here but you can use any sql statement
val yourdata = sqlContext.sql("SELECT * FROM yourtablename LIMIT 10")
// display the data 
yourdata.show()
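
As a variation on the snippet above, the credentials and driver class can be kept out of the URL and passed as connection properties instead. This is only a sketch and not part of the original answer: `SqlServerJdbc`, `url` and `props` are hypothetical helper names, the explicit port is an assumption (the question's URL has none), and whether the `driver` property is honoured should be checked against the Spark version in use.

```scala
import java.util.Properties

// Hypothetical helpers for assembling SQL Server JDBC settings.
object SqlServerJdbc {
  // Builds a URL of the form jdbc:sqlserver://host:port;databaseName=db
  def url(host: String, port: Int, database: String): String =
    s"jdbc:sqlserver://$host:$port;databaseName=$database"

  // Puts user, password and the driver class name into connection properties.
  def props(user: String, password: String,
            driver: String = "com.microsoft.sqlserver.jdbc.SQLServerDriver"): Properties = {
    val p = new Properties()
    p.setProperty("user", user)
    p.setProperty("password", password)
    p.setProperty("driver", driver)
    p
  }
}
```

With a `sqlContext` in scope, the two values could then be handed to `sqlContext.read.jdbc(SqlServerJdbc.url(host, 1433, db), "yourtablename", SqlServerJdbc.props(user, password))`, where `host`, `db`, `user` and `password` stand for your own settings.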

【Comments】:

【Answer 2】:

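The NPE occurs when trying to close the database Connection, which suggests that the system could not obtain a proper connector via JdbcUtils.createConnectionFactory. You should check your connection URL and the logs around the failure.

【Comments】: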
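
One way to check this without involving Spark is to ask the driver class directly whether it can be loaded and whether it recognises the URL. The following is a minimal sketch under that assumption, not a confirmed diagnosis of the question's failure; `JdbcPreflight` and `driverAccepts` are hypothetical names, and the only JDBC call used is the standard `java.sql.Driver.acceptsURL`.

```scala
import java.sql.Driver
import scala.util.{Failure, Success, Try}

// Hypothetical pre-flight check to run before handing the URL to Spark.
object JdbcPreflight {
  // Right(true)  : driver class loaded and it accepts the URL
  // Right(false) : driver class loaded but it rejects the URL
  // Left(reason) : driver class could not be loaded or instantiated
  def driverAccepts(driverClass: String, url: String): Either[String, Boolean] =
    Try {
      val driver = Class.forName(driverClass)
        .getDeclaredConstructor()
        .newInstance()
        .asInstanceOf[Driver]
      driver.acceptsURL(url)
    } match {
      case Success(accepted) => Right(accepted)
      case Failure(e)        => Left(s"$driverClass: $e")
    }
}
```

Printing `JdbcPreflight.driverAccepts(driver, url)` at the top of `main` would show in the YARN application log whether the driver jar is visible to the job (a `Left` mentioning `ClassNotFoundException` means it is not) and whether the URL is in a form the driver accepts.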
