ClassCastException on Drop table query in Apache Spark Hive

Posted: 2016-07-26 10:23:56

I am running the following Hive query:

this.queryExecutor.executeQuery("Drop table user")

and I get the following exception:

java.lang.LinkageError: ClassCastException: attempting to cast jar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.class to jar:file:/usr/hdp/2.4.2.0-258/spark/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar!/javax/ws/rs/ext/RuntimeDelegate.class
at javax.ws.rs.ext.RuntimeDelegate.findDelegate(RuntimeDelegate.java:116)
    at javax.ws.rs.ext.RuntimeDelegate.getInstance(RuntimeDelegate.java:91)
    at javax.ws.rs.core.MediaType.<clinit>(MediaType.java:44)
    at com.sun.jersey.core.header.MediaTypes.<clinit>(MediaTypes.java:64)
    at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:182)
    at com.sun.jersey.core.spi.factory.MessageBodyFactory.initReaders(MessageBodyFactory.java:175)
    at com.sun.jersey.core.spi.factory.MessageBodyFactory.init(MessageBodyFactory.java:162)
    at com.sun.jersey.api.client.Client.init(Client.java:342)
    at com.sun.jersey.api.client.Client.access$000(Client.java:118)
    at com.sun.jersey.api.client.Client$1.f(Client.java:191)
    at com.sun.jersey.api.client.Client$1.f(Client.java:187)
    at com.sun.jersey.spi.inject.Errors.processWithErrors(Errors.java:193)
    at com.sun.jersey.api.client.Client.<init>(Client.java:187)
    at com.sun.jersey.api.client.Client.<init>(Client.java:170)
    at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.serviceInit(TimelineClientImpl.java:340)
    at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
    at org.apache.hadoop.hive.ql.hooks.ATSHook.<init>(ATSHook.java:67)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at java.lang.Class.newInstance(Class.java:442)
    at org.apache.hadoop.hive.ql.hooks.HookUtils.getHooks(HookUtils.java:60)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1309)
    at org.apache.hadoop.hive.ql.Driver.getHooks(Driver.java:1293)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1347)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:495)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:484)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:290)
    at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:237)
    at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:236)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:279)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:484)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:474)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:613)
    at org.apache.spark.sql.hive.execution.DropTable.run(commands.scala:89)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:58)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:56)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:70)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:132)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$5.apply(SparkPlan.scala:130)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:130)
    at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:55)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:145)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:130)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
    at com.accenture.aa.dmah.spark.core.QueryExecutor.executeQuery(QueryExecutor.scala:35)
    at com.accenture.aa.dmah.attribution.transformer.MulltipleUserJourneyTransformer.transform(MulltipleUserJourneyTransformer.scala:32)
    at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder$$anonfun$buildUserJourney$1.apply$mcVI$sp(UserJourneyBuilder.scala:31)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
    at com.accenture.aa.dmah.attribution.userjourney.UserJourneyBuilder.buildUserJourney(UserJourneyBuilder.scala:29)
    at com.accenture.aa.dmah.attribution.core.AttributionHub.executeAttribution(AttributionHub.scala:47)
    at com.accenture.aa.dmah.attribution.jobs.AttributionJob.process(AttributionJob.scala:33)
    at com.accenture.aa.dmah.core.DMAHJob.processJob(DMAHJob.scala:73)
    at com.accenture.aa.dmah.core.DMAHJob.execute(DMAHJob.scala:27)
    at com.accenture.aa.dmah.core.JobRunner.<init>(JobRunner.scala:17)
    at com.accenture.aa.dmah.core.ApplicationInstance.initilize(ApplicationInstance.scala:48)
    at com.accenture.aa.dmah.core.Bootstrap.boot(Bootstrap.scala:112)
    at com.accenture.aa.dmah.core.BootstrapObj$.main(Bootstrap.scala:134)
    at com.accenture.aa.dmah.core.BootstrapObj.main(Bootstrap.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at scala.tools.nsc.util.ScalaClassLoader$$anonfun$run$1.apply(ScalaClassLoader.scala:71)
    at scala.tools.nsc.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.asContext(ScalaClassLoader.scala:139)
    at scala.tools.nsc.util.ScalaClassLoader$class.run(ScalaClassLoader.scala:71)
    at scala.tools.nsc.util.ScalaClassLoader$URLClassLoader.run(ScalaClassLoader.scala:139)
    at scala.tools.nsc.CommonRunner$class.run(ObjectRunner.scala:28)
    at scala.tools.nsc.ObjectRunner$.run(ObjectRunner.scala:45)
    at scala.tools.nsc.CommonRunner$class.runAndCatch(ObjectRunner.scala:35)
    at scala.tools.nsc.ObjectRunner$.runAndCatch(ObjectRunner.scala:45)
    at scala.tools.nsc.MainGenericRunner.runTarget$1(MainGenericRunner.scala:74)
    at scala.tools.nsc.MainGenericRunner.process(MainGenericRunner.scala:96)
    at scala.tools.nsc.MainGenericRunner$.main(MainGenericRunner.scala:105)
    at scala.tools.nsc.MainGenericRunner.main(MainGenericRunner.scala)

I have seen similar posts here and here, but they have received no responses so far. I also looked here, but I don't think that is a valid course of action.

Interestingly, this happens specifically when we try a drop table (or drop table if exists) query.

Hoping to find a solution for this.

Comments:

Did you ever get an answer to this question?
Did you solve this?
@Edge7 Hi, no... we were not able to resolve this. As it turned out, the requirement to drop the table was itself dropped, so we could not investigate it any further.
Very strange error! I hit it with Spark 1.6, but the error disappeared with Spark 2.0!

Answer 1:

From what I can see, the error above is likely caused by the same class with the same package structure, i.e. "javax.ws.rs.ext.RuntimeDelegate", being present in different JARs. The class object is created and cast at runtime, so the code path that triggers the DROP statement most likely uses this class and breaks because it is found more than once on the classpath.
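One quick way to check whether the class really is visible more than once is to ask the classloader for every resource that matches it. This is only a minimal sketch (not part of the original post) that can be pasted into the same spark-shell or driver code; it assumes the thread context classloader is the one that ends up loading the Hive hook classes:

// Minimal sketch: list every classpath location that provides RuntimeDelegate.
// If more than one URL is printed, the class is duplicated on the classpath.
import scala.collection.JavaConverters._

val resource = "javax/ws/rs/ext/RuntimeDelegate.class"
val loader = Thread.currentThread().getContextClassLoader
val locations = loader.getResources(resource).asScala.toList

locations.foreach(url => println(url))
println("Found " + locations.size + " copies of " + resource)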

I have tried both DROP and DROP IF EXISTS on CDH 5 and they work fine; below are the details of my runs:

First run - Hadoop 2.6, Hive 1.1.0 and Spark 1.3.1 (Hive libraries included in the Spark lib)
Second run - Hadoop 2.6, Hive 1.1.0 and Spark 1.6.1
Run mode - CLI

scala> sqlContext.sql("DROP TABLE SAMPLE");
16/08/04 11:31:39 INFO parse.ParseDriver: Parsing command: DROP TABLE SAMPLE
16/08/04 11:31:39 INFO parse.ParseDriver: Parse Completed
......
scala> sqlContext.sql("DROP TABLE IF EXISTS SAMPLE");
16/08/04 11:40:34 INFO parse.ParseDriver: Parsing command: DROP TABLE IF EXISTS SAMPLE
16/08/04 11:40:35 INFO parse.ParseDriver: Parse Completed
.....

If possible, verify the DROP command against a different version of the Spark libraries to narrow down the problem.

In the meantime, I am analyzing the JARs to find out how the same class "RuntimeDelegate" can be present twice, and will report back whether removing a JAR resolves the issue and whether adding it back reproduces it.
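If you want to run the same check directly against the JAR files on disk, a small sketch along these lines can scan a lib directory for every JAR that bundles the class. The directory below is only an example taken from the path in the stack trace, so adjust it to your environment:

// Sketch: find every JAR under a directory that contains RuntimeDelegate.
// The lib path is an assumption based on the HDP install shown in the stack trace.
import java.io.File
import java.util.jar.JarFile

val libDir = new File("/usr/hdp/2.4.2.0-258/spark/lib")
val entry  = "javax/ws/rs/ext/RuntimeDelegate.class"

val jars = Option(libDir.listFiles()).getOrElse(Array.empty[File])
  .filter(_.getName.endsWith(".jar"))

jars.foreach { jar =>
  val jf = new JarFile(jar)
  try {
    if (jf.getEntry(entry) != null) println(entry + " found in " + jar.getName)
  } finally {
    jf.close()
  }
}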

Comments:

Also, if possible, list the Jersey and javax.ws.rs-api JAR references present in your environment, to rule them out.
