ClassNotFoundException when setting hive.exec.pre.hooks

【Posted】: 2018-08-12 10:08:30 【Question】:

I am following this guide to write a Hive hook:

http://dharmeshkakadia.github.io/hive-hook/

But when I run show tables, I get this error:

2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: hive.exec.pre.hooks Class not found: HiveExampleHook
2018-08-12 09:57:38,122 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
java.lang.ClassNotFoundException: HiveExampleHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=Driver.execute start=1534067858120 end=1534067858122 duration=2 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-315]: Completed executing command(queryId=hive_20180812095757_e6516d83-ddc9-4f82-8151-def7e7f1eb37); Time taken: 0.002 seconds
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,122 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-315]: </PERFLOG method=releaseLocks start=1534067858122 end=1534067858122 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 09:57:38,130 ERROR org.apache.hive.service.cli.operation.Operation: [HiveServer2-Background-Pool: Thread-315]: Error running hive query:
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: FAILED: Hive Internal Error: java.lang.ClassNotFoundException(HiveExampleHook)
    at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:238)
    at org.apache.hive.service.cli.operation.SQLOperation.access$300(SQLOperation.java:89)
    at org.apache.hive.service.cli.operation.SQLOperation$3$1.run(SQLOperation.java:301)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
    at org.apache.hive.service.cli.operation.SQLOperation$3.run(SQLOperation.java:314)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: HiveExampleHook
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:270)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:100)
    at org.apache.hadoop.hive.ql.hooks.HooksLoader.getHooks(HooksLoader.java:64)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1674)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1501)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1280)
    at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:236)
    ... 11 more

I am sure the last step, add jar target/Hive-hook-example-1.0.jar;, is what is wrong.

I tried the following:

    I put the jar file on HDFS under /user/hive/ :

    add jar hdfs:///user/hive/Hive-hook-example-1.0.jar;

    I also set the "Hive Auxiliary JARs Directory" to /home/centos/HiveExampleHook/target/Hive-hook-example-1.0.jar on the HiveServer2 node and restarted Hive plus beeline.

    Copied the jar file to /opt/cloudera/parcels/CDH/jars/

    Copied the jar file to /opt/cloudera/parcels/CDH/lib/hive/lib/

Nothing helped.

Any ideas?

Update 1:

If I run LIST JARS; it shows

+----------------------------------------------------+--+
|                      resource                      |
+----------------------------------------------------+--+
| /tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar |
+----------------------------------------------------+--+

I also tried CREATE FUNCTION in two ways:

CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook';
INFO  : Compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.002 seconds
INFO  : Executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook'
INFO  : Starting task [Stage-0:FUNC] in serial mode
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO  : Completed executing command(queryId=hive_20180812153838_47589f9d-eaeb-410d-80b0-9cf414ae557f); Time taken: 0.003 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

And...

CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar';
INFO  : Compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO  : Semantic Analysis Completed
INFO  : Returning Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.004 seconds
INFO  : Executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401): CREATE TEMPORARY FUNCTION test1 AS 'HiveExampleHook' USING JAR 'hdfs:///user/hive/Hive-hook-example-1.0.jar'
INFO  : Starting task [Stage-0:FUNC] in serial mode
INFO  : converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
INFO  : Added [/tmp/3fe67bb1-5cfd-427f-8faa-cab6524afeb3_resources/Hive-hook-example-1.0.jar] to class path
INFO  : Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
ERROR : FAILED: Class HiveExampleHook not found
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask
INFO  : Completed executing command(queryId=hive_20180812153939_cf1f31c9-0361-47dc-8903-78221bd12401); Time taken: 0.03 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

So it clearly can find the jar but cannot find the class name. Am I right?

Update 2:

I tried:

[Hive-hook-example]# java -cp `pwd`/target/Hive-hook-example-1.0.jar HiveExampleHook

and still got this:

Error: Could not find or load main class HiveExampleHook

I believe this is a silly mistake on my part.
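One way to check whether the jar really contains the class under the exact name passed to hive.exec.pre.hooks is to list its entries. The helper below is only a sketch and not part of the guide: the class JarClassLister and its methods are hypothetical names made up for illustration. It turns each .class entry into a fully qualified class name, so a hook compiled into a package would show up as, say, com.example.HiveExampleHook rather than HiveExampleHook. As far as I know, on this Java version the java -cp test above can also print "Could not find or load main class" when the class is present but the classes it depends on (such as Hive's hook interfaces) are missing from the classpath, so that test alone is not conclusive.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: lists the fully qualified class names inside a jar.
class JarClassLister {

    // Convert an entry path such as "com/example/Foo.class" to "com.example.Foo".
    static String toClassName(String entryName) {
        String withoutSuffix =
            entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    // Collect the names of all .class entries in the given jar file.
    static List<String> listClasses(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    names.add(toClassName(name));
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: java JarClassLister <jar> <class-name>");
            return;
        }
        List<String> classes = listClasses(args[0]);
        for (String c : classes) {
            System.out.println(c);
        }
        System.out.println(classes.contains(args[1])
            ? "found " + args[1]
            : args[1] + " is not in the jar under that exact name");
    }
}
```

It could be run as java JarClassLister target/Hive-hook-example-1.0.jar HiveExampleHook; if the class is listed with a package prefix, that full name is what the hook setting would need.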

Update 3:

OK, I figured it out. You have to use the hive CLI instead of beeline for this to work.

hive> add jar hdfs:///user/hive/Hive-hook-example-1.0.jar;
add jar hdfs:///user/hive/Hive-hook-example-1.0.jar
converting to local hdfs:///user/hive/Hive-hook-example-1.0.jar
Added [/tmp/0a90132d-70cd-4ef0-b4cd-e75dc823e5ca_resources/Hive-hook-example-1.0.jar] to class path
Added resources: [hdfs:///user/hive/Hive-hook-example-1.0.jar]
hive> set hive.exec.pre.hooks=HiveExampleHook;
set hive.exec.pre.hooks=HiveExampleHook
hive> show tables;
show tables
Hello from the hook !!
OK
test1
Time taken: 0.023 seconds, Fetched: 5 row(s)

So the question is how to make this work in beeline, since the hive CLI is deprecated.

Update 4:

I decided to do this:

With beeline, I see this:

2018-08-12 16:39:13,286 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: <PERFLOG method=PreHook.HiveExampleHook from=org.apache.hadoop.hive.ql.Driver>
2018-08-12 16:39:13,286 INFO  org.apache.hadoop.hive.ql.log.PerfLogger: [HiveServer2-Background-Pool: Thread-60]: </PERFLOG method=PreHook.HiveExampleHook start=1534091953286 end=1534091953286 duration=0 from=org.apache.hadoop.hive.ql.Driver>

This is some progress, although I am not sure what it means or whether the class was actually loaded, because I do not see any output.

【Comments】:

FWIW... I vaguely remember cursing loudly, a few years ago, about the (poorly documented) differences between add jar and add jars.

【Answer 1】:

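With beeline, you have to use an HDFS path when adding the jar. Remember that beeline is just a JDBC CLI, so when you use add jar with a local path, it refers to a path on your local machine, which the Hive session running on the cluster cannot access.

(Thanks to https://twitter.com/quanghoc/status/1028671393376874496 for the help. I am the author of the blog you mentioned.)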
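Because beeline is a JDBC client, the same sequence can be sketched as a small JDBC program, which makes it clearer that every statement is executed by HiveServer2 and not on the client machine. This is only an illustration under assumptions: the class name BeelineHookSession is hypothetical, the URL jdbc:hive2://localhost:10000/default assumes an unsecured HiveServer2 on the default port, and the Hive JDBC driver must be on the classpath. The jar path and hook name are the ones from the question. Note that the asker reports this HDFS-path approach still failing in their setup, so the sketch shows the advice, not a confirmed fix.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

// Hypothetical example: run the hook setup through HiveServer2 over JDBC.
class BeelineHookSession {

    // Build the statements typed at the prompt in Update 3 (no trailing ';' over JDBC).
    static List<String> setupStatements(String hdfsJar, String hookClass) {
        return Arrays.asList(
            "add jar " + hdfsJar,
            "set hive.exec.pre.hooks=" + hookClass);
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: HiveServer2 on localhost:10000, no authentication.
        String url = args.length > 0 ? args[0] : "jdbc:hive2://localhost:10000/default";
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement()) {
            // The jar must be reachable by the server, hence the HDFS path.
            for (String sql : setupStatements(
                    "hdfs:///user/hive/Hive-hook-example-1.0.jar", "HiveExampleHook")) {
                stmt.execute(sql);
            }
            // Any query now triggers the pre-execution hook on the server side.
            try (ResultSet rs = stmt.executeQuery("show tables")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}
```

Since the hook runs inside the HiveServer2 process in this setup, anything it prints would be expected in the server's stdout or log, not in the client's output.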

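【Discussion】:

I used the hdfs path. But it does not work with beeline, only with the Hive CLI. Why is that? Please see my updates.

In which update did you use an HDFS path with beeline? Sorry, it is a bit confusing with so many updates.

My original UPDATE 0 used the hdfs path. I never used a local path.

I think setting the local Aux jar path may have messed things up. Just put the jar on HDFS, run add jar, and check with list jars.

The jar is in HDFS. So should I empty the Aux jar path?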
