Comparing counts between two Hive tables
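【Title】: To compare count between two Hive tables 【Posted】: 2018-08-14 07:18:16 【Question】: I am trying to compare row counts between two tables. Because the minus operator does not work in Hive, my query fails. Could you suggest a simple way to compare the counts of two tables?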
select 'Call Detail - Hive T1 to HDFS Staging - Data Compare',
case when cnt>0 then 'Fail' Else 'Pass' end
from
(select count(*) cnt from (
(select
count(*) from students1 s1)-
(select count(*) from students2 s2)
) as tbl1
) as tbl2;
It throws this error:
FAILED: ParseException line 81:0 cannot identify input near '(' '(' 'select' in from source
【Answer 1】: If you have no group-by columns, use a cross join. In that case it produces a single row containing both counts:
select s.cnt-s1.cnt diff, case when abs(s.cnt-s1.cnt) > 0 then 'Fail' Else 'Pass' end result
from
(select count(*) cnt from students1 s1) s
cross join
(select count(*) cnt from students2 s2) s1
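As a sketch of how the same cross join could carry the label column from the question, the check might be written as below. The column aliases `check_name` and `result` are illustrative names, not part of the original query.

```sql
-- Sketch: the question's Pass/Fail check rebuilt on the cross join above.
-- check_name and result are illustrative aliases (assumptions).
select 'Call Detail - Hive T1 to HDFS Staging - Data Compare' as check_name,
       case when s.cnt <> s1.cnt then 'Fail' else 'Pass' end as result
from (select count(*) as cnt from students1) s
cross join (select count(*) as cnt from students2) s1;
```

Each subquery returns exactly one row, so the cross join yields a single row and the comparison happens in the outer select rather than between two subqueries.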
If you add group-by columns to compare at a finer granularity, use a FULL JOIN on those columns:
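-- coalesce treats a group that is missing from one table as a count of 0,
-- so unmatched groups are reported as 'Fail' instead of comparing against NULL
select s.col1 s_col1, s1.col1 s1_col1,
       coalesce(s.cnt, 0) - coalesce(s1.cnt, 0) diff,
       case when coalesce(s.cnt, 0) <> coalesce(s1.cnt, 0) then 'Fail' else 'Pass' end result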
from
(select count(*) cnt, col1 from students1 s1 group by col1) s
full join
(select count(*) cnt, col1 from students2 s2 group by col1) s1
on s.col1 = s1.col1
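This query returns the joined rows with the computed difference, and it also returns the rows from either table that have no match in the other.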
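If only the mismatching groups are of interest, one possible variation is to filter on the counts. This is a sketch under the same assumptions as the query above (`col1` stands in for whatever grouping column applies); the aliases `students1_cnt` and `students2_cnt` are illustrative.

```sql
-- Sketch: list only the groups whose counts differ between the two tables.
-- coalesce(..., 0) treats a group absent from one table as a count of 0.
select coalesce(s.col1, s1.col1) as col1,
       coalesce(s.cnt, 0)  as students1_cnt,
       coalesce(s1.cnt, 0) as students2_cnt
from (select col1, count(*) as cnt from students1 group by col1) s
full join (select col1, count(*) as cnt from students2 group by col1) s1
  on s.col1 = s1.col1
where coalesce(s.cnt, 0) <> coalesce(s1.cnt, 0);
```

An empty result means every group has the same count in both tables. Note that a group whose `col1` is NULL never satisfies `s.col1 = s1.col1`, so it shows up as unmatched on both sides.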
【Answer 2】: Check the query below. It runs fine locally on my system. Let me know if it helps.
select 'Call Detail - Hive T1 to HDFS Staging - Data Compare',
case
when sum(cnt1) <> sum(cnt2)
then 'FAIL'
else 'PASS'
end as count_records
from (select count(*) as cnt1, 0 as cnt2 from students1
union all
select 0 as cnt1, count(*) as cnt2 from students2 ) tbl;
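【Comments】:
I ran your query, but I still get this error. Error while compiling statement: FAILED: ParseException line 92:8 cannot identify input near '-' '(' 'select' in subquery source
Hi, I have modified the query. Please check now.
Hi, it still throws this error. Please check. Error while compiling statement: FAILED: ParseException line 98:8 cannot identify input near 'select' 'count' '(' in expression specification
Hi @Kumar, check my current query and let me know whether it works for you.
Hi @Kumar, if the query worked for you, could you post an update?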