Hive Consumer Profiles
Posted 小基基o_O
Overview
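- A consumer profile uses the consumer ID (usually the user ID) as its unique key and aggregates a set of metrics for each consumer.
- Business databases usually have no dedicated consumer table, only a user information table. A registered user has not necessarily made a purchase, and the share of users who have purchased may be small.
- Building consumer profiles therefore relies on the user dimension table and the sub-order detail table.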
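- Common metrics
  - Cumulative amount and recent amounts (last 1, 7, and 30 days)
  - Cumulative order count and recent order counts
  - Cumulative item count and recent item counts
  - Time of the most recent purchase and of the earliest purchase
  - Address of the most recent purchase
  - Purchase regions (1 to n)
  - Brands purchased (category)
  - Categories purchased (list)
  - Purchase interval
  - Average amount per order = cumulative amount / cumulative order count
  - ……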
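As a rough illustration of the cumulative metrics listed above, the following sketch aggregates a handful of inline rows per user. The CTE name `order_detail` and all of its columns are assumptions made for this example; they stand in for a real sub-order detail table, and the join to the user dimension table is left out.

```sql
WITH
-- Hypothetical sub-order detail rows (names and columns are assumed, not from the post)
order_detail AS (
    SELECT 'u1' AS uid,'o1' AS order_id,2 AS sku_num,20 AS amount,'2022-01-01' AS order_date UNION ALL
    SELECT 'u1' AS uid,'o1' AS order_id,1 AS sku_num,30 AS amount,'2022-01-01' AS order_date UNION ALL
    SELECT 'u1' AS uid,'o2' AS order_id,1 AS sku_num,10 AS amount,'2022-01-05' AS order_date UNION ALL
    SELECT 'u2' AS uid,'o3' AS order_id,4 AS sku_num,40 AS amount,'2022-01-03' AS order_date
)
SELECT
    uid
    ,SUM(amount) AS total_amount                                   -- cumulative amount
    ,COUNT(DISTINCT order_id) AS total_order_count                 -- cumulative order count
    ,SUM(sku_num) AS total_sku_num                                 -- cumulative item count
    ,MIN(order_date) AS first_order_date                           -- earliest purchase
    ,MAX(order_date) AS last_order_date                            -- most recent purchase
    ,SUM(amount) / COUNT(DISTINCT order_id) AS avg_amount_per_order
FROM order_detail
GROUP BY uid;
```

Because one order can span several sub-order rows, the order count uses `COUNT(DISTINCT order_id)` rather than `COUNT(*)`.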
SQL
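- The shipping address of the most recent purchase is computed with a window function.
- If the cumulative consumer table (dwt) were rebuilt every day by merging all of the daily consumer data (dws), the computation would be very expensive.
- Improvement: FULL OUTER JOIN yesterday's partition of the dws daily consumer table with the day-before-yesterday partition of the dwt cumulative consumer table.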
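A minimal sketch of that incremental merge might look like the following. The CTE names `dws_yesterday` and `dwt_before` and their columns are hypothetical stand-ins for the two partitions; the sketch also assumes that every order in yesterday's partition is newer than anything already in the cumulative table.

```sql
WITH
-- Hypothetical: yesterday's partition of the daily (dws) consumer table
dws_yesterday AS (
    SELECT 'u2' AS uid,30 AS amount,'2022-01-03' AS order_date UNION ALL
    SELECT 'u3' AS uid,40 AS amount,'2022-01-03' AS order_date
),
-- Hypothetical: the day-before-yesterday partition of the cumulative (dwt) table
dwt_before AS (
    SELECT 'u1' AS uid,10 AS total_amount,'2022-01-01' AS first_order_date,'2022-01-01' AS last_order_date UNION ALL
    SELECT 'u2' AS uid,20 AS total_amount,'2022-01-02' AS first_order_date,'2022-01-02' AS last_order_date
)
SELECT
    NVL(t.uid,d.uid) AS uid
    -- add yesterday's amount to the running total; a missing side counts as 0
    ,NVL(t.total_amount,0)+NVL(d.amount,0) AS total_amount
    -- keep the old first date when the user already existed
    ,COALESCE(t.first_order_date,d.order_date) AS first_order_date
    -- yesterday's order, if any, is assumed to be the newest
    ,COALESCE(d.order_date,t.last_order_date) AS last_order_date
FROM dwt_before t
FULL OUTER JOIN dws_yesterday d ON t.uid=d.uid;
```

Each daily run then reads only two partitions instead of rescanning the whole history. In a real pipeline the two CTEs would presumably be replaced by partition filters on the actual tables.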
Window functions
ROW_NUMBER
WITH
-- Raw data
t AS (
    SELECT 'u1' AS uid,'佛山' AS address,'2022-01-01' AS order_date UNION ALL
    SELECT 'u2' AS uid,'深圳' AS address,'2022-01-02' AS order_date UNION ALL
    SELECT 'u2' AS uid,'广州' AS address,'2022-01-03' AS order_date UNION ALL
    SELECT 'u3' AS uid,'佛山' AS address,'2022-01-04' AS order_date
),
-- Window: partition by user ID, order by date descending within each partition
t1 AS (
    SELECT
        uid
        ,address
        ,order_date
        ,ROW_NUMBER() OVER(PARTITION BY uid ORDER BY order_date DESC) AS a_row_number
    FROM t
)
-- Filter: keep only each user's most recent row
SELECT
    uid
    ,address AS last_address
    ,order_date AS last_order_date
FROM t1
WHERE a_row_number=1;
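The same partitioning idea can also cover the purchase-interval metric from the list of common metrics. The sketch below is one possible approach, not the author's: it uses `LAG` to fetch each user's previous order date and `DATEDIFF` to measure the gap in days. The sample rows and the output column `avg_interval_days` are made up for illustration.

```sql
WITH
-- Sample data (made up for this sketch)
t AS (
    SELECT 'u1' AS uid,'2022-01-01' AS order_date UNION ALL
    SELECT 'u2' AS uid,'2022-01-02' AS order_date UNION ALL
    SELECT 'u2' AS uid,'2022-01-05' AS order_date UNION ALL
    SELECT 'u2' AS uid,'2022-01-06' AS order_date
),
-- Window: for each order, look up the same user's previous order date
t1 AS (
    SELECT
        uid
        ,order_date
        ,LAG(order_date) OVER(PARTITION BY uid ORDER BY order_date) AS prev_order_date
    FROM t
)
-- Average gap in days between consecutive orders; NULL gaps are ignored by AVG
SELECT
    uid
    ,AVG(DATEDIFF(order_date,prev_order_date)) AS avg_interval_days
FROM t1
GROUP BY uid;
```

With this sample data, u1 has a single order, so its interval is NULL; u2's gaps are 3 days and 1 day.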
Full outer join
FULL OUTER JOIN
WITH
t1 AS (
    SELECT 'u1' AS uid,10 AS amount UNION ALL
    SELECT 'u2' AS uid,20 AS amount
),
t2 AS (
    SELECT 'u2' AS uid,30 AS amount UNION ALL
    SELECT 'u3' AS uid,40 AS amount
)
SELECT
    NVL(t1.uid,t2.uid) AS uid,
    NVL(t1.amount,0)+NVL(t2.amount,0) AS amount
FROM t1
FULL OUTER JOIN t2 ON t1.uid=t2.uid;
Last 7 days
WITH a AS (
    SELECT '2022-08-01' AS y UNION ALL
    SELECT '2022-08-02' AS y UNION ALL
    SELECT '2022-08-03' AS y UNION ALL
    SELECT '2022-08-04' AS y UNION ALL
    SELECT '2022-08-05' AS y UNION ALL
    SELECT '2022-08-06' AS y UNION ALL
    SELECT '2022-08-07' AS y UNION ALL
    SELECT '2022-08-08' AS y UNION ALL
    SELECT '2022-08-09' AS y UNION ALL
    SELECT '2022-08-10' AS y UNION ALL
    SELECT '2022-08-11' AS y UNION ALL
    SELECT '2022-08-12' AS y
)
SELECT y FROM a
WHERE y>DATE_SUB('2022-08-10',7) AND y<='2022-08-10'
ORDER BY y;
num | y |
---|---|
1 | 2022-08-04 |
2 | 2022-08-05 |
3 | 2022-08-06 |
4 | 2022-08-07 |
5 | 2022-08-08 |
6 | 2022-08-09 |
7 | 2022-08-10 |
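The boundaries are easy to get wrong by one day, so it can help to print the day offset next to each date. The sketch below is an illustration only: it uses `DATEDIFF` to show that a 7-day window ending on 2022-08-10 corresponds to offsets 0 through 6. The column names `days_ago` and `in_last_7_days` are invented for this example.

```sql
WITH a AS (
    SELECT '2022-08-03' AS y UNION ALL
    SELECT '2022-08-04' AS y UNION ALL
    SELECT '2022-08-10' AS y UNION ALL
    SELECT '2022-08-11' AS y
)
SELECT
    y
    -- days between y and the reference date; negative means y is in the future
    ,DATEDIFF('2022-08-10',y) AS days_ago
    -- true for offsets 0..6, i.e. the same 7 days as y>DATE_SUB(...,7) AND y<=...
    ,(DATEDIFF('2022-08-10',y) BETWEEN 0 AND 6) AS in_last_7_days
FROM a
ORDER BY y;
```

Here 2022-08-03 is 7 days back and falls just outside the window, while 2022-08-04 (6 days back) and 2022-08-10 (0 days back) fall inside it.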
WITH a AS (
    SELECT '2022-08-01' AS y UNION ALL
    SELECT '2022-08-02' AS y UNION ALL
    SELECT '2022-08-03' AS y UNION ALL
    SELECT '2022-08-04' AS y UNION ALL
    SELECT '2022-08-05' AS y UNION ALL
    SELECT '2022-08-06' AS y UNION ALL
    SELECT '2022-08-07' AS y UNION ALL
    SELECT '2022-08-08' AS y UNION ALL
    SELECT '2022-08-09' AS y UNION ALL
    SELECT '2022-08-10' AS y UNION ALL
    SELECT '2022-08-11' AS y
)
SELECT
    -- count for the last 1 day
    COUNT(IF(y='2022-08-10',y,NULL)) AS `近1天计数`,
    -- count for the last 3 days
    COUNT(IF(y>DATE_SUB('2022-08-10',3),y,NULL)) AS `近3天计数`,
    -- count for the last 7 days (every row that passes the WHERE filter)
    COUNT(y) AS `近7天计数`
FROM a
WHERE y>DATE_SUB('2022-08-10',7) AND y<='2022-08-10';
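-- Last 7 days, counting from today
WHERE y>DATE_SUB(CURRENT_DATE(),7) AND y<=CURRENT_DATE()
-- Last 7 days, counting from yesterday
WHERE y>=DATE_SUB(CURRENT_DATE(),7) AND y<=DATE_SUB(CURRENT_DATE(),1)
-- In a script ("ymd" stands for the date string that the script substitutes)
SELECT
    -- count for the last 1 day
    COUNT(IF(y="ymd",y,NULL)),
    -- count for the last 7 days
    COUNT(IF(y>DATE_SUB("ymd",7),y,NULL)),
    -- count for the last 30 days (every row that passes the WHERE filter)
    COUNT(y)
FROM a
WHERE y>DATE_SUB("ymd",30) AND y<="ymd";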
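The same conditional-aggregation pattern can be applied to amounts to get the recent-amount metrics per user. This is a sketch under assumptions: the sample rows, the fixed reference date 2022-08-10, and the output names `amount_1d`, `amount_7d`, and `amount_30d` are all made up for illustration.

```sql
WITH
-- Sample data (made up for this sketch)
t AS (
    SELECT 'u1' AS uid,10 AS amount,'2022-08-10' AS order_date UNION ALL
    SELECT 'u1' AS uid,20 AS amount,'2022-08-05' AS order_date UNION ALL
    SELECT 'u1' AS uid,30 AS amount,'2022-07-20' AS order_date UNION ALL
    SELECT 'u2' AS uid,40 AS amount,'2022-08-03' AS order_date
)
SELECT
    uid
    -- amount on the reference date only
    ,SUM(IF(order_date='2022-08-10',amount,0)) AS amount_1d
    -- amount in the 7 days ending on the reference date
    ,SUM(IF(order_date>DATE_SUB('2022-08-10',7),amount,0)) AS amount_7d
    -- amount in the 30 days ending on the reference date (the WHERE filter)
    ,SUM(amount) AS amount_30d
FROM t
WHERE order_date>DATE_SUB('2022-08-10',30) AND order_date<='2022-08-10'
GROUP BY uid;
```

Note that u2's only order is dated 2022-08-03, exactly 7 days before the reference date, so it counts toward the 30-day amount but not the 7-day amount. Users with no orders in the last 30 days are filtered out by the `WHERE` clause and do not appear in the result at all.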