How to use a subquery in Hive

【Title】: How to use subquery in Hive 【Posted】: 2020-01-14 21:54:32 【Question】:

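I can run the following subquery in MySQL, but it does not work in Hive. Is Hive's subquery syntax different from MySQL's?

Problem:

Write a Hive query that shows all orders placed on one particular day: the day on which the most orders were placed. Select the data from orders_sqoop.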

select * 
from orders_sqoop 
where order_date = (select order_date from orders_sqoop 
                    group by order_date 
                    order by count(order_id) desc 
                    limit 1);

I see the following output in Hive (Cloudera CDH):

NoViableAltException(226@[400:1: precedenceEqualExpression : ( (left= precedenceBitwiseOrExpression -> $left) ( ( KW_NOT precedenceEqualNegatableOperator notExpr= precedenceBitwiseOrExpression ) 
   -> ^( KW_NOT ^( precedenceEqualNegatableOperator $precedenceEqualExpression $notExpr) ) | ( precedenceEqualOperator equalExpr= precedenceBitwiseOrExpression ) 
   -> ^( precedenceEqualOperator $precedenceEqualExpression $equalExpr) | ( KW_NOT KW_IN LPAREN KW_SELECT )=> ( KW_NOT KW_IN subQueryExpression ) 
   -> ^( KW_NOT ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) ) | ( KW_NOT KW_IN expressions ) 
   -> ^( KW_NOT ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) ) | ( KW_IN LPAREN KW_SELECT )=> ( KW_IN subQueryExpression ) 
   -> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) | ( KW_IN expressions ) 
   -> ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) | ( KW_NOT KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) ) 
   -> ^( TOK_FUNCTION Identifier["between"] KW_TRUE $left $min $max) | ( KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) ) 
   -> ^( TOK_FUNCTION Identifier["between"] KW_FALSE $left $min $max) )* | ( KW_EXISTS LPAREN KW_SELECT )=> ( KW_EXISTS subQueryExpression ) 
   -> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
        at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
        at org.antlr.runtime.DFA.predict(DFA.java:116)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8668)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9690)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:9809)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:9968)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6584)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6808)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6879)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7264)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7324)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceBitwiseXorExpression(HiveParser_IdentifiersParser.java:7508)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceStarExpression(HiveParser_IdentifiersParser.java:7668)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedencePlusExpression(HiveParser_IdentifiersParser.java:7828)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAmpersandExpression(HiveParser_IdentifiersParser.java:7988)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceBitwiseOrExpression(HiveParser_IdentifiersParser.java:8147)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8803)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9690)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:9809)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:9968)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6584)
        at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:44932)
        at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.searchCondition(HiveParser_FromClauseParser.java:6530)
        at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.whereClause(HiveParser_FromClauseParser.java:6438)
        at org.apache.hadoop.hive.ql.parse.HiveParser.whereClause(HiveParser.java:44974)
        at org.apache.hadoop.hive.ql.parse.HiveParser.singleSelectStatement(HiveParser.java:42062)
        at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41720)
        at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41657)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40710)
        at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40586)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1529)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1065)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1473)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1275)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:226)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:175)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:389)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:47 cannot recognize input near 'select' 'order_date' 'from' in expression specification

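【Comments】:

Are you getting an error? Wrong results? Please describe what "not working" means.
I just updated the question with the error. Please check, thanks.
cwiki.apache.org/confluence/display/Hive/… ?

【Answer 1】:

Just change = to the IN operator.

IN is a membership test: it lets you check one value against several candidates compactly in a single statement.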

select * 
  from orders_sqoop 
 where order_date in  (select order_date from orders_sqoop group by order_date order by count(order_id) desc limit 1);
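
As a minimal sketch of what "membership" means here, IN with a literal list behaves like a chain of equality tests joined by OR. The date literals below are hypothetical, and the sketch assumes order_date can be compared against string literals in this table.

```sql
-- Hypothetical date values, for illustration only.
SELECT *
  FROM orders_sqoop
 WHERE order_date IN ('2013-07-25', '2013-07-26');

-- The same filter written as explicit comparisons:
SELECT *
  FROM orders_sqoop
 WHERE order_date = '2013-07-25'
    OR order_date = '2013-07-26';
```

With a subquery on the right-hand side, IN tests membership in the set of rows the subquery returns rather than in a literal list.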

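【Comments】:

FAILED: SemanticException [Error 10249]: Line 1:33 Unsupported SubQuery Expression 'order_date': Correlating expression cannot contain unqualified column references.
Can you order by count(id) if it is not included in the select list?

【Answer 2】:

Use count() + rank():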

select * --list all columns here
from
(
select s.*, rank() over(order by cnt desc) rnk 
from
(
select s.*, 
       count(order_id) over(partition by order_date) cnt 
  from orders_sqoop s
) s 
) s where rnk=1 

Use a subquery in WHERE:

select o.* 
  from orders_sqoop o
  where order_date in (select s.order_date from orders_sqoop s group by s.order_date order by count(s.order_id) desc limit 1);
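
A third option, not proposed in the answers, is to move the subquery into FROM and join against it, which avoids a subquery in WHERE altogether. This is a sketch under assumptions, not a tested solution for the asker's cluster: the aliases o and top_day and the column alias cnt are hypothetical names chosen for illustration.

```sql
-- Find the busiest order_date in a derived table, then join back to the orders.
SELECT o.*
  FROM orders_sqoop o
  JOIN (
        SELECT order_date,
               count(order_id) AS cnt   -- orders per day
          FROM orders_sqoop
         GROUP BY order_date
         ORDER BY cnt DESC
         LIMIT 1                        -- keep only the busiest day
       ) top_day
    ON o.order_date = top_day.order_date;
```

Note that LIMIT 1 keeps a single date even when several days tie for the highest count, whereas the rank() query returns the orders of every tied day.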

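【Comments】:

I used the subquery in WHERE, but now I see the following error: FAILED: SemanticException [Error 10249]: Line 1:33 Unsupported SubQuery Expression 'order_date': Correlating expression cannot contain unqualified column references.
@user2774120 I added aliases. Try it. Still, the first method is the better choice; it should perform much better anyway.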
