HIVE-----count(distinct ) over() 无法使用解决办法
Posted luckyfruit
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了HIVE-----count(distinct ) over() 无法使用解决办法相关的知识,希望对你有一定的参考价值。
HIVE-----count(distinct ) over() 无法使用解决办法
在使用hive时发现count(distinct ) over() 报错
hive> with da as ( > select 1 a, ‘a‘ b union all > select 1 a, ‘a‘ b union all > select 2 a, ‘a‘ b union all > select 2 a, ‘a‘ b union all > select 2 a, ‘a‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b union all > select 3 a, ‘b‘ b > ) > select > a > ,b > ,sum(a) over(partition by b) > , count(distinct a) over(partition by b) > from da; FAILED: SemanticException Failed to breakup Windowing invocations into Groups. At least 1 group must only depend on input columns. Also check for circular dependencies. Underlying error: org.apache.hadoop.hive.ql.parse.SemanticException: Line 18:26 Expression not in GROUP BY key ‘b‘
经过测试将
with da as ( select 1 a, ‘a‘ b union all select 1 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b ) select count(distinct a) over(partition by b) from da
当且仅当至于count(distinct ) over()一个时段时能够使用,原因可能时内部实现distinct出错 不知道是否和版本有关 使用版本为Hive version 1.1.0
解决办法:如下使用collect_set(a) over(partition by b)函数将合并成一个分好组的集合 然后求出集合的值个数
因为collect_set()不能放入重复函数所以使用size()求集合元素数量时能达到count(distinct )的效果
with da as ( select 1 a, ‘a‘ b union all select 1 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 2 a, ‘a‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b union all select 3 a, ‘b‘ b ) select a ,b ,sum(a) over(partition by b) ,size(collect_set(a) over(partition by b)) from da
结果
以上是关于HIVE-----count(distinct ) over() 无法使用解决办法的主要内容,如果未能解决你的问题,请参考以下文章
Hive count(distinct c1) ,不包含NULL的总数