Hive full join: reconciling two datasets and tracking down anomalous rows

Posted by 二十六画生的博客


select 
t1.*,
t2.*
from (
select  1 as c1 , 'a' as c2 
union all 
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd' as c2 
)t1 
full join 
(
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd4' as c2 
union all 
select  5 as c1 , 'e' as c2 
)t2 
on t1.c1 = t2.c1 
order by coalesce(t1.c1 , t2.c1 ) ; -- 5 rows: every key from either side (c1 = 1 to 5)


select 
t1.*,
t2.*
from (
select  1 as c1 , 'a' as c2 
union all 
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd' as c2 
)t1 
full join 
(
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd4' as c2 
union all 
select  5 as c1 , 'e' as c2 
)t2 
on t1.c1 = t2.c1 
where  t1.c1 is null or  t2.c1 is null or (coalesce(t1.c2,'') <> coalesce(t2.c2,''))
order by coalesce(t1.c1 , t2.c1 ) ; -- 3 rows: c1 = 1 (only in t1), 4 (c2 differs), 5 (only in t2)


select 
t1.*,
t2.*
from (
select  1 as c1 , 'a' as c2 
union all 
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd' as c2 
)t1 
full join 
(
select  2 as c1 , 'b' as c2 
union all 
select  3 as c1 , null as c2 
union all 
select  4 as c1 , 'd4' as c2 
union all 
select  5 as c1 , 'e' as c2 
)t2 
on t1.c1 = t2.c1 
where  (coalesce(t1.c2,'') <> coalesce(t2.c2,''))
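order by coalesce(t1.c1 , t2.c1 ) ; -- 3 rows, the same rows the second query returns.
-- With this sample data the [t1.c1 is null or t2.c1 is null] condition can be left out,
-- because every unmatched row has a non-empty c2 on the side that exists.
-- This does not hold in general: an unmatched row whose c2 is null or '' compares as
-- equal under coalesce(...,'') and is silently dropped, so keep the null checks
-- when reconciling real tables.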
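
Whether the null checks can be dropped depends on the data. The following is a minimal sketch, assuming two hypothetical extra keys that are not in the original sample: c1 = 6 exists only in t1, c1 = 7 exists only in t2, and c2 is null for both. It also swaps the coalesce comparison for Hive's null-safe equality operator `<=>`, which, unlike coalesce(c2,''), does not treat null and the empty string as the same value.

```sql
-- Hypothetical data: keys 6 and 7 are unmatched and have a null c2.
select
t1.*,
t2.*
from (
select  4 as c1 , 'd' as c2
union all
select  6 as c1 , cast(null as string) as c2
)t1
full join
(
select  4 as c1 , 'd4' as c2
union all
select  7 as c1 , cast(null as string) as c2
)t2
on t1.c1 = t2.c1
-- The null checks catch keys present on one side only;
-- not (a <=> b) is true when the two values differ, with null handled as a value.
where  t1.c1 is null or  t2.c1 is null or not (t1.c2 <=> t2.c2)
order by coalesce(t1.c1 , t2.c1 ) ;
```

With the full condition this returns the rows for c1 = 4, 6 and 7. If the where clause is reduced to coalesce(t1.c2,'') <> coalesce(t2.c2,'') alone, only c1 = 4 comes back, because for keys 6 and 7 both sides collapse to ''.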

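Result of the first query:

t1.c1   t1.c2   t2.c1   t2.c2
1       a       NULL    NULL
2       b       2       b
3       NULL    3       NULL
4       d       4       d4
NULL    NULL    5       e

Result of the second and third queries:

t1.c1   t1.c2   t2.c1   t2.c2
1       a       NULL    NULL
4       d       4       d4
NULL    NULL    5       e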
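
When there are many mismatched rows, it can help to label why each one was flagged. The query below is one possible sketch on the same sample data; the column names key_c1, diff_type, t1_c2 and t2_c2 are illustrative assumptions, not part of the original queries.

```sql
select
coalesce(t1.c1 , t2.c1 ) as key_c1,
case
  when t2.c1 is null then 'only in t1'   -- key missing from t2
  when t1.c1 is null then 'only in t2'   -- key missing from t1
  else 'c2 differs'                      -- key on both sides, values disagree
end as diff_type,
t1.c2 as t1_c2,
t2.c2 as t2_c2
from (
select  1 as c1 , 'a' as c2
union all
select  2 as c1 , 'b' as c2
union all
select  3 as c1 , null as c2
union all
select  4 as c1 , 'd' as c2
)t1
full join
(
select  2 as c1 , 'b' as c2
union all
select  3 as c1 , null as c2
union all
select  4 as c1 , 'd4' as c2
union all
select  5 as c1 , 'e' as c2
)t2
on t1.c1 = t2.c1
where  t1.c1 is null or  t2.c1 is null or (coalesce(t1.c2,'') <> coalesce(t2.c2,''))
order by key_c1 ;
```

On this data the three flagged keys should come back labelled as 1 'only in t1', 4 'c2 differs' and 5 'only in t2'.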
