数据仓库 DWS层之用户行为宽表
Posted noyouth
tags:
篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了数据仓库 DWS层之用户行为宽表相关的知识,希望对你有一定的参考价值。
为什么需要用户行为宽表?把每个用户单日的行为聚合起来组成一张多列宽表,以便之后关联用户维度信息后,进行不同角度的统计分析。
数据来源:DWD层相关的业务数据表
创建用户行为宽表:
这张宽表整合了下单、支付和评论3种行为。
drop table if exists dws_user_action; create external table dws_user_action ( user_id string comment ‘用户 id‘, order_count bigint comment ‘下单次数 ‘, order_amount decimal(16,2) comment ‘下单金额 ‘, payment_count bigint comment ‘支付次数‘, payment_amount decimal(16,2) comment ‘支付金额 ‘, comment_count bigint comment ‘评论次数‘ ) COMMENT ‘每日用户行为宽表‘ PARTITIONED BY (`dt` string) stored as parquet location ‘/warehouse/gmall/dws/dws_user_action/‘ tblproperties ("parquet.compression"="snappy");
数据导入脚本:
with as基本语法为如下,作用是定义一个临时表,可以在后续的语句中多次使用,提高sql可读性。注意多个临时表之间用逗号,而最后一个临时表和查询语句之间没有符号。
WITH t1 AS ( SELECT * FROM carinfo ), t2 AS ( SELECT * FROM car_blacklist ) SELECT * FROM t1, t2
#!/bin/bash # 定义变量方便修改 APP=gmall hive=/opt/module/hive/bin/hive # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$1" ] ;then do_date=$1 else do_date=`date -d "-1 day" +%F` fi sql=" with tmp_order as ( select user_id, sum(oi.total_amount) order_amount, count(*) order_count from "$APP".dwd_order_info oi where date_format(oi.create_time,‘yyyy-MM-dd‘)=‘$do_date‘ group by user_id ) , tmp_payment as ( select user_id, sum(pi.total_amount) payment_amount, count(*) payment_count from "$APP".dwd_payment_info pi where date_format(pi.payment_time,‘yyyy-MM-dd‘)=‘$do_date‘ group by user_id ), tmp_comment as ( select user_id, count(*) comment_count from "$APP".dwd_comment_log c where date_format(c.dt,‘yyyy-MM-dd‘)=‘$do_date‘ group by user_id ) Insert overwrite table "$APP".dws_user_action partition(dt=‘$do_date‘) select user_actions.user_id, sum(user_actions.order_count), sum(user_actions.order_amount), sum(user_actions.payment_count), sum(user_actions.payment_amount), sum(user_actions.comment_count) from ( select user_id, order_count, order_amount, 0 payment_count, 0 payment_amount, 0 comment_count from tmp_order union all select user_id, 0, 0, payment_count, payment_amount, 0 from tmp_payment union all select user_id, 0, 0, 0, 0, comment_count from tmp_comment ) user_actions group by user_id; " $hive -e "$sql"
以上是关于数据仓库 DWS层之用户行为宽表的主要内容,如果未能解决你的问题,请参考以下文章