Big Data with Hive: with tmp1 as ()

Posted by 浊酒南街


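1. with tmp1 as ()

Semantics: the result of the query inside the parentheses is given the name tmp1, and the rest of the statement can then reference tmp1 as if it were a table.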
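
As a minimal sketch of this form, the query below names a subquery tmp1 and then selects from it. It borrows the dwd_start_log table and the dt partition from the example in section 3; the cutoff of 3 logins is an arbitrary value chosen only for illustration.

```sql
-- tmp1 names the result of the query in parentheses
with tmp1 as
(
    select
        user_id,
        count(*) login_count
    from dwd_start_log
    where dt='2021-03-20'
    group by user_id
)
-- the main query reads tmp1 as if it were a table
select
    user_id,
    login_count
from tmp1
where login_count >= 3;   -- 3 is an arbitrary illustrative cutoff
```

The name tmp1 exists only for the duration of this one statement; nothing is created in the metastore.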

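2. Benefits

A with clause (a common table expression, CTE) is essentially a named subquery. Writing queries this way has two benefits:

  1. The SQL has a clearer structure, both when writing it and when reading it;
  2. A subquery that is needed several times is written once and referenced by name. Whether it is also computed only once depends on the engine and its settings. By default Hive inlines the CTE at each reference, so it is evaluated as many times as it is used, the same as with hand-copied subqueries. In Hive 2.1 and later, the property hive.optimize.cte.materialize.threshold (default -1, disabled) can make Hive materialize a CTE that is referenced often enough, so that it is computed once; the materialized result is stored as a temporary table rather than cached in memory.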
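
The reuse case can be sketched as follows: tmp_login is referenced twice, once for the per-user rows and once for the overall average. The `set` line is an assumption about your environment: `hive.optimize.cte.materialize.threshold` is available in Hive 2.1 and later and is off (-1) by default, in which case Hive will typically expand tmp_login at both references. Table and column names are borrowed from the example in section 3.

```sql
-- Assumption: Hive 2.1+. A threshold of 1 is low enough that tmp_login,
-- which is referenced twice below, qualifies for materialization.
-- With the default (-1) the CTE is inlined at every reference instead.
set hive.optimize.cte.materialize.threshold=1;

with tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_start_log
    where dt='2021-03-20'
    and user_id is not null
    group by user_id
)
select
    l.user_id,
    l.login_count,
    a.avg_login_count
from tmp_login l                 -- first reference
cross join
(
    select avg(login_count) avg_login_count
    from tmp_login               -- second reference
) a
where l.login_count > a.avg_login_count;
```

Comparing the output of `explain` with and without the `set` line is one way to check whether dwd_start_log is scanned once or twice on your cluster.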

3. Example

-- Number of logins per user on the given day
with
tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_start_log
    where dt='2021-03-20'
    and user_id is not null
    group by user_id
),
-- Number of add-to-cart actions per user
tmp_cart as
(
    select
        user_id,
        count(*) cart_count
    from dwd_action_log
    where dt='2021-03-20'
    and user_id is not null
    and action_id='cart_add'
    group by user_id
),
-- Order count and order amount per user
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(final_total_amount) order_amount
    from dwd_fact_order_info
    where dt='2021-03-20'
    group by user_id
),
-- Payment count and payment amount per user
tmp_payment as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_fact_payment_info
    where dt='2021-03-20'
    group by user_id
),
-- Per-SKU order statistics, collected into one array of structs per user
tmp_order_detail as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'order_amount',order_amount)) order_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(final_amount_d) as decimal(20,2)) order_amount
        from dwd_fact_order_detail
        where dt='2021-03-20'
        group by user_id,sku_id
    )tmp
    group by user_id
)

-- Left joins keep every logged-in user; nvl() replaces missing counts and amounts with 0
insert overwrite table dws_user_action_daycount partition(dt='2021-03-20')
select
    tmp_login.user_id,
    login_count,
    nvl(cart_count,0),
    nvl(order_count,0),
    nvl(order_amount,0.0),
    nvl(payment_count,0),
    nvl(payment_amount,0.0),
    order_stats
from tmp_login
left join tmp_cart on tmp_login.user_id=tmp_cart.user_id
left join tmp_order on tmp_login.user_id=tmp_order.user_id
left join tmp_payment on tmp_login.user_id=tmp_payment.user_id
left join tmp_order_detail on tmp_login.user_id=tmp_order_detail.user_id;
