How to use subquery in Hive
【Posted】: 2020-01-14 21:54:32
【Question】: I can run the following subquery in MySQL, but it does not work in Hive. Is Hive's subquery syntax different from MySQL's?
Problem:
Write a Hive query that shows all orders placed on one particular day: the day on which the most orders were placed. Select the data from orders_sqoop.
```sql
select *
from orders_sqoop
where order_date = (select order_date from orders_sqoop
                    group by order_date
                    order by count(order_id) desc
                    limit 1);
```
In Hive (Cloudera CDH) I see the following output:
```
NoViableAltException(226@[400:1: precedenceEqualExpression : ( (left= precedenceBitwiseOrExpression -> $left) ( ( KW_NOT precedenceEqualNegatableOperator notExpr= precedenceBitwiseOrExpression )
-> ^( KW_NOT ^( precedenceEqualNegatableOperator $precedenceEqualExpression $notExpr) ) | ( precedenceEqualOperator equalExpr= precedenceBitwiseOrExpression )
-> ^( precedenceEqualOperator $precedenceEqualExpression $equalExpr) | ( KW_NOT KW_IN LPAREN KW_SELECT )=> ( KW_NOT KW_IN subQueryExpression )
-> ^( KW_NOT ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) ) | ( KW_NOT KW_IN expressions )
-> ^( KW_NOT ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) ) | ( KW_IN LPAREN KW_SELECT )=> ( KW_IN subQueryExpression )
-> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_IN ) subQueryExpression $precedenceEqualExpression) | ( KW_IN expressions )
-> ^( TOK_FUNCTION KW_IN $precedenceEqualExpression expressions ) | ( KW_NOT KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) )
-> ^( TOK_FUNCTION Identifier["between"] KW_TRUE $left $min $max) | ( KW_BETWEEN (min= precedenceBitwiseOrExpression ) KW_AND (max= precedenceBitwiseOrExpression ) )
-> ^( TOK_FUNCTION Identifier["between"] KW_FALSE $left $min $max) )* | ( KW_EXISTS LPAREN KW_SELECT )=> ( KW_EXISTS subQueryExpression )
-> ^( TOK_SUBQUERY_EXPR ^( TOK_SUBQUERY_OP KW_EXISTS ) subQueryExpression ) );])
    at org.antlr.runtime.DFA.noViableAlt(DFA.java:158)
    at org.antlr.runtime.DFA.predict(DFA.java:116)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8668)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9690)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:9809)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:9968)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6584)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.atomExpression(HiveParser_IdentifiersParser.java:6808)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceFieldExpression(HiveParser_IdentifiersParser.java:6879)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnaryPrefixExpression(HiveParser_IdentifiersParser.java:7264)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceUnarySuffixExpression(HiveParser_IdentifiersParser.java:7324)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceBitwiseXorExpression(HiveParser_IdentifiersParser.java:7508)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceStarExpression(HiveParser_IdentifiersParser.java:7668)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedencePlusExpression(HiveParser_IdentifiersParser.java:7828)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAmpersandExpression(HiveParser_IdentifiersParser.java:7988)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceBitwiseOrExpression(HiveParser_IdentifiersParser.java:8147)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8803)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9690)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:9809)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:9968)
    at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.expression(HiveParser_IdentifiersParser.java:6584)
    at org.apache.hadoop.hive.ql.parse.HiveParser.expression(HiveParser.java:44932)
    at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.searchCondition(HiveParser_FromClauseParser.java:6530)
    at org.apache.hadoop.hive.ql.parse.HiveParser_FromClauseParser.whereClause(HiveParser_FromClauseParser.java:6438)
    at org.apache.hadoop.hive.ql.parse.HiveParser.whereClause(HiveParser.java:44974)
    at org.apache.hadoop.hive.ql.parse.HiveParser.singleSelectStatement(HiveParser.java:42062)
    at org.apache.hadoop.hive.ql.parse.HiveParser.selectStatement(HiveParser.java:41720)
    at org.apache.hadoop.hive.ql.parse.HiveParser.regularBody(HiveParser.java:41657)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpressionBody(HiveParser.java:40710)
    at org.apache.hadoop.hive.ql.parse.HiveParser.queryStatementExpression(HiveParser.java:40586)
    at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1529)
    at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1065)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:201)
    at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:166)
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:522)
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1356)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1473)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1285)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1275)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:226)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:175)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:389)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:781)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:699)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:634)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
FAILED: ParseException line 1:47 cannot recognize input near 'select' 'order_date' 'from' in expression specification
```
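The `ParseException` points at the `select` that opens the subquery, and the grammar rule printed in the exception only offers a subquery alternative after `IN`, `NOT IN` and `EXISTS`, not after `=`. So the rejection comes from this Hive version's parser, not from the query logic. As a local sanity check of that logic only, here is a minimal sketch that runs the question's query unchanged on an in-memory SQLite database through Python's standard `sqlite3` module. SQLite stands in for MySQL here; the two-column `orders_sqoop` schema and the rows are hypothetical sample data.

```python
import sqlite3

# Hypothetical sample rows (order_id, order_date); 2020-01-02 has the most orders.
ROWS = [
    (1, "2020-01-01"),
    (2, "2020-01-02"),
    (3, "2020-01-02"),
    (4, "2020-01-02"),
    (5, "2020-01-03"),
    (6, "2020-01-03"),
]

# The question's query, unchanged: a scalar subquery on the right-hand side of '='.
QUERY = """
select *
from orders_sqoop
where order_date = (select order_date from orders_sqoop
                    group by order_date
                    order by count(order_id) desc
                    limit 1)
"""


def busiest_day_orders(rows):
    """Load rows into a throwaway in-memory table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table orders_sqoop (order_id integer, order_date text)")
        conn.executemany("insert into orders_sqoop values (?, ?)", rows)
        # Sort in Python so the result order is deterministic.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(busiest_day_orders(ROWS))
```

Running it prints the three orders dated 2020-01-02. This only shows that the SQL is sound in an engine that accepts scalar subqueries after `=`; it says nothing about what a given Hive release accepts.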
【Comments】:
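Are you getting an error? Wrong results? Please describe what "doesn't work" means.
I've updated the question again; I do get an error. Please check. Thanks.
cwiki.apache.org/confluence/display/Hive/… ?
【Answer 1】: Just change `=` to the `IN` operator.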
`IN` is a membership operator: it lets you test a value against several candidate matches compactly in a single expression.
```sql
select *
from orders_sqoop
where order_date in (select order_date from orders_sqoop group by order_date order by count(order_id) desc limit 1);
```
【Comments】:
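FAILED: SemanticException [Error 10249]: Line 1:33 Unsupported SubQuery Expression 'order_date': Correlating expression cannot contain unqualified column references.
Can you order by count(id) if it isn't added to the SELECT list?
【Answer 2】: Use count() + rank():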
```sql
select * --list all columns here
from
(
  select s.*, rank() over(order by cnt desc) rnk
  from
  (
    select s.*,
           count(order_id) over(partition by order_date) cnt
    from orders_sqoop s
  ) s
) s where rnk=1
```
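A minimal sketch of the count() + rank() approach above, run on an in-memory SQLite database through Python's standard `sqlite3` module as a local stand-in for Hive. It assumes the bundled SQLite is 3.25 or newer (the release that added window functions). The outer query lists `order_id, order_date` explicitly, as the "list all columns here" comment suggests, so the helper columns `cnt` and `rnk` are not returned. The sample rows are hypothetical and deliberately contain a tie for the busiest day.

```python
import sqlite3

# Hypothetical sample rows: 2020-01-01 and 2020-01-02 tie with 2 orders each.
ROWS = [
    (1, "2020-01-01"),
    (2, "2020-01-01"),
    (3, "2020-01-02"),
    (4, "2020-01-02"),
    (5, "2020-01-03"),
]

# Same shape as the answer: count orders per date with a window function,
# rank every row by that count, then keep the rows ranked 1.
QUERY = """
select order_id, order_date
from (
    select s.*, rank() over (order by cnt desc) rnk
    from (
        select s.*,
               count(order_id) over (partition by order_date) cnt
        from orders_sqoop s
    ) s
) s
where rnk = 1
"""


def busiest_day_orders(rows):
    """Load rows into a throwaway in-memory table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table orders_sqoop (order_id integer, order_date text)")
        conn.executemany("insert into orders_sqoop values (?, ?)", rows)
        # Sort in Python so the result order is deterministic.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(busiest_day_orders(ROWS))
```

Because both tied dates share rank 1, all four of their orders come back. The `order by ... limit 1` subqueries in the question and in Answer 1 would keep only one of the tied dates, and which one is not specified. This version also needs no subquery in `WHERE` at all, so the parser and correlation restrictions reported in this thread do not come into play.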
Using a subquery in WHERE:
```sql
select o.*
from orders_sqoop o
where order_date in (select s.order_date from orders_sqoop s group by s.order_date order by count(s.order_id) desc limit 1);
```
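A minimal sketch that checks the result of the aliased `IN` query above on an in-memory SQLite database through Python's standard `sqlite3` module. The query text is taken from the answer as-is; the sample rows are hypothetical. SQLite does not perform Hive's correlation check, so this confirms only what the query returns, not whether a particular Hive version accepts it. In the query, every column inside the subquery is qualified with `s.`, which is what the answer changed in response to the "unqualified column references" error.

```python
import sqlite3

# Hypothetical sample rows (order_id, order_date); 2020-01-02 has the most orders.
ROWS = [
    (1, "2020-01-01"),
    (2, "2020-01-02"),
    (3, "2020-01-02"),
    (4, "2020-01-02"),
    (5, "2020-01-03"),
    (6, "2020-01-03"),
]

# The answer's aliased query: outer table is 'o', subquery table is 's'.
QUERY = """
select o.*
from orders_sqoop o
where order_date in (select s.order_date from orders_sqoop s
                     group by s.order_date
                     order by count(s.order_id) desc
                     limit 1)
"""


def busiest_day_orders(rows):
    """Load rows into a throwaway in-memory table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table orders_sqoop (order_id integer, order_date text)")
        conn.executemany("insert into orders_sqoop values (?, ?)", rows)
        # Sort in Python so the result order is deterministic.
        return sorted(conn.execute(QUERY).fetchall())
    finally:
        conn.close()


print(busiest_day_orders(ROWS))
```

With these rows the `IN` list holds the single date 2020-01-02, so the three orders from that day are returned, the same result as the question's `=` version.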
【Comments】:
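I used the subquery in WHERE, but now I see the following error: FAILED: SemanticException [Error 10249]: Line 1:33 Unsupported SubQuery Expression 'order_date': Correlating expression cannot contain unqualified column references.
@user2774120 I've added aliases; give it a try. The first approach is still better, though; it will perform much better anyway.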