Big Data with Hive: with tmp1 as ()
Posted 浊酒南街
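1. with tmp1 as ()
Meaning: the result of the query inside the parentheses is given the name tmp1, and the rest of the same statement can reference tmp1 as if it were a table.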
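As a minimal sketch of the syntax, the statement below names one aggregate query tmp1 and then filters it. It reuses the dwd_start_log table and dt partition from the example later in this post; the login_count > 1 filter is only an illustrative assumption.

```sql
-- Name the result of the parenthesized query tmp1,
-- then query it like an ordinary table within the same statement.
with tmp1 as (
    select
        user_id,
        count(*) as login_count
    from dwd_start_log
    where dt = '2021-03-20'
    group by user_id
)
select
    user_id,
    login_count
from tmp1
where login_count > 1;  -- illustrative filter, not from the original post
```

The name tmp1 exists only for the duration of this one statement; a later statement cannot see it.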
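2. Benefits
A with clause (common table expression, CTE) is essentially a named subquery. Writing queries this way has two benefits:
- The SQL is clearer in structure, both to write and to read.
- A subquery that is used in several places is written once and referenced by name. Whether it is also computed only once depends on the Hive version and settings: by default Hive expands the CTE at each reference, so it may be recomputed each time, just as with repeated inline subqueries. Newer versions (Hive 2.1 and later) can materialize a CTE that is referenced often enough, controlled by hive.optimize.cte.materialize.threshold; the materialized result is written to a temporary table rather than cached in memory.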
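The reuse case can be sketched as follows. This assumes a Hive version that supports hive.optimize.cte.materialize.threshold (Hive 2.1 or later); the threshold value 2, the metric labels, and the two summary queries are illustrative assumptions, and the default for this property varies by version (it has commonly been -1, meaning disabled).

```sql
-- Assumption: Hive 2.1+; ask Hive to materialize a CTE that is
-- referenced often enough to meet this threshold instead of
-- expanding it at every reference.
set hive.optimize.cte.materialize.threshold=2;

with tmp_login as (
    select
        user_id,
        count(*) as login_count
    from dwd_start_log
    where dt = '2021-03-20'
      and user_id is not null
    group by user_id
)
-- tmp_login is referenced twice below but written only once.
select 'all_users' as metric, count(*) as user_count
from tmp_login
union all
select 'repeat_users' as metric, count(*) as user_count
from tmp_login
where login_count > 1;
```

Running explain on the statement with and without the setting is a reasonable way to check whether the CTE is actually materialized on a given cluster.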
3. Example
with
-- Login count per user for the day
tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_start_log
    where dt='2021-03-20'
    and user_id is not null
    group by user_id
),
-- Add-to-cart count per user
tmp_cart as
(
    select
        user_id,
        count(*) cart_count
    from dwd_action_log
    where dt='2021-03-20'
    and user_id is not null
    and action_id='cart_add'
    group by user_id
),
-- Order count and order amount per user
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(final_total_amount) order_amount
    from dwd_fact_order_info
    where dt='2021-03-20'
    group by user_id
),
-- Payment count and payment amount per user
tmp_payment as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_fact_payment_info
    where dt='2021-03-20'
    group by user_id
),
-- Per-user set of per-SKU order statistics
tmp_order_detail as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'order_amount',order_amount)) order_stats
    from
    (
        -- Aggregate order details to one row per user and SKU
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(final_amount_d) as decimal(20,2)) order_amount
        from dwd_fact_order_detail
        where dt='2021-03-20'
        group by user_id,sku_id
    ) tmp
    group by user_id
)
-- Join every CTE onto tmp_login; nvl fills in 0 where a user has no rows.
-- Users with no login that day do not appear, because tmp_login drives the left joins.
insert overwrite table dws_user_action_daycount partition(dt='2021-03-20')
select
    tmp_login.user_id,
    login_count,
    nvl(cart_count,0),
    nvl(order_count,0),
    nvl(order_amount,0.0),
    nvl(payment_count,0),
    nvl(payment_amount,0.0),
    order_stats
from tmp_login
left join tmp_cart on tmp_login.user_id=tmp_cart.user_id
left join tmp_order on tmp_login.user_id=tmp_order.user_id
left join tmp_payment on tmp_login.user_id=tmp_payment.user_id
left join tmp_order_detail on tmp_login.user_id=tmp_order_detail.user_id;