Hive JDBC connection via Spark (NullPointerException)

[Posted] 2019-05-27 10:54:24

[Question]:

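I am trying to read a Hive table from Spark over a Hive JDBC connection and I get a NullPointerException. The exact same command works fine on my other cluster.

I am running it in spark-shell: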

val jdbcDF = spark.read.format("jdbc").option("url", "jdbc:hive2://bl.com:10000").option("dbtable", "cds.txn_fact").option("user", "user").option("password", "pwd").option("fetchsize","20").load()

This is the error I get:

org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NullPointerException
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:255)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:241)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:254)
at org.apache.hive.jdbc.HiveStatement.executeQuery(HiveStatement.java:392)
at org.apache.hive.jdbc.HivePreparedStatement.executeQuery(HivePreparedStatement.java:109)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:60)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:115)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:52)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:340)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:239)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:227)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:164)
... 51 elided
Caused by: org.apache.hive.service.cli.HiveSQLException: Error running query: java.lang.NullPointerException
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:180)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:228)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:479)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:466)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
at com.sun.proxy.$Proxy47.executeStatementAsync(Unknown Source)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1377)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1362)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: null
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1227)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:146)
... 27 more

[Answer 1]:

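I am using HDP2, and Hive on HDP2 does not support concurrent JDBC calls from Spark, which is what causes the NullPointerException.

This is fixed in HDP3, and I can now run the code above without any errors.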
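To check whether the failure comes from HiveServer2 itself rather than from Spark, the schema lookup can be reproduced over plain JDBC. The following is a minimal sketch, not a confirmed reproduction: it assumes the Hive JDBC driver is visible to `DriverManager` in the spark-shell session, and that Spark resolves a JDBC table's schema with a zero-row query of the form `SELECT * FROM <table> WHERE 1=0`, which is my understanding of the default dialect in Spark 2.x. The helper names `schemaProbeQuery` and `probeSchema` are hypothetical; the URL, credentials, and table name are the ones from the question.

```scala
import java.sql.DriverManager
import scala.util.{Failure, Success, Try}

// Hypothetical helper: the zero-row query assumed to match Spark's schema lookup.
def schemaProbeQuery(table: String): String = s"SELECT * FROM $table WHERE 1=0"

// Hypothetical helper: runs the probe over plain JDBC and returns the column
// names, or the exception wrapped in a Failure.
def probeSchema(url: String, user: String, password: String, table: String): Try[Seq[String]] = Try {
  val conn = DriverManager.getConnection(url, user, password)
  try {
    val stmt = conn.createStatement()
    try {
      val meta = stmt.executeQuery(schemaProbeQuery(table)).getMetaData
      (1 to meta.getColumnCount).map(i => meta.getColumnName(i))
    } finally stmt.close()
  } finally conn.close()
}

// Connection details taken from the question.
probeSchema("jdbc:hive2://bl.com:10000", "user", "pwd", "cds.txn_fact") match {
  case Success(cols) => println(s"Schema resolved: ${cols.mkString(", ")}")
  case Failure(e)    => println(s"Probe failed outside Spark as well: $e")
}
```

If the probe fails with the same HiveSQLException, the problem is on the Hive side, which would be consistent with the server-side frames (`Driver.compileInternal`) in the stack trace. If it succeeds, the difference lies in how Spark issues its calls.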
