HIVE Consumer Profile

Posted by 小基基o_O

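Preface: this article was compiled by the editors of 小常识网 (cha138.com). It introduces the basics of building a consumer profile in Hive, and we hope it is a useful reference.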

Overview

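  • A consumer profile uses the consumer ID (usually the user ID) as its unique key and aggregates each consumer's metrics.
  • Business databases usually have no table dedicated to consumers, only a user information table.
    A user who registers does not necessarily buy anything, and paying users may be only a small fraction of all users.
  • Building the consumer profile therefore relies on the user dimension table and the sub-order (order line) detail table.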
Common metrics
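  • Cumulative amount, recent amount (last 1, 7 and 30 days)
  • Cumulative order count, recent order count
  • Cumulative item count, recent item count
  • Time of the most recent purchase, time of the earliest purchase
  • Shipping address of the most recent purchase
  • Purchase regions (1 to n)
  • Brands purchased (categories)
  • Categories purchased (list)
  • Purchase interval
  • Average amount per order = cumulative amount / cumulative order count
  • …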
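
To make a few of these definitions concrete, here is a minimal Python sketch that computes them for a single consumer from in-memory order lines. The `OrderLine` fields and the `profile_metrics` function are hypothetical stand-ins for the sub-order detail table; in the warehouse the same aggregations would be written in HiveQL. The order count is taken as the number of distinct order IDs, because one order can span several order lines.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class OrderLine:
    # Hypothetical shape of one row of the sub-order (order line) detail table
    order_id: str
    order_date: date
    amount: float
    quantity: int


def profile_metrics(lines: list[OrderLine]) -> dict:
    """Compute a few cumulative metrics for one consumer."""
    order_count = len({line.order_id for line in lines})  # several lines can share one order
    total_amount = sum(line.amount for line in lines)
    dates = [line.order_date for line in lines]
    return {
        "total_amount": total_amount,
        "order_count": order_count,
        "item_count": sum(line.quantity for line in lines),
        "first_order_date": min(dates),
        "last_order_date": max(dates),
        # Average amount per order = cumulative amount / cumulative order count
        "avg_amount_per_order": total_amount / order_count,
    }


lines = [
    OrderLine("o1", date(2022, 8, 1), 30.0, 1),
    OrderLine("o1", date(2022, 8, 1), 20.0, 2),
    OrderLine("o2", date(2022, 8, 5), 50.0, 1),
]
print(profile_metrics(lines))
```

Here two lines belong to order o1, so the consumer has 2 orders, 4 items and an average of 50.0 per order.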

SQL

With the user ID as the unique key, merge the dwd daily orders, the dws daily consumers and the dwt cumulative consumers.

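The shipping address of the most recent purchase is computed with a window function.
If the dwt cumulative consumer table were rebuilt every day by merging all of the dws daily consumer partitions, the amount of computation would be very large.
Improvement: FULL OUTER JOIN yesterday's dws consumer partition with the dwt cumulative consumer partition from the day before yesterday, producing the new cumulative partition.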
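
The reasoning behind this improvement can be checked with a small Python sketch: folding each day's partition into the previous cumulative state (a full-outer-join style merge per user) gives the same totals as re-aggregating every daily partition, while reading only one new day of data per run. Plain dicts stand in for the dws and dwt partitions, the function names are hypothetical, and only one additive metric (amount) is merged; a real dwt table would merge every column with its own rule, for example the earlier of the two dates for the earliest purchase.

```python
from __future__ import annotations


def merge_day(cumulative: dict[str, float], daily: dict[str, float]) -> dict[str, float]:
    """Fold one daily partition into the previous cumulative state.
    Users from either side survive (FULL OUTER JOIN); a missing side counts as 0."""
    users = cumulative.keys() | daily.keys()
    return {u: cumulative.get(u, 0) + daily.get(u, 0) for u in users}


def recompute(daily_partitions: list[dict[str, float]]) -> dict[str, float]:
    """Naive alternative: re-aggregate every daily partition from scratch."""
    totals: dict[str, float] = {}
    for part in daily_partitions:
        for user, amount in part.items():
            totals[user] = totals.get(user, 0) + amount
    return totals


# Hypothetical dws daily partitions: user ID -> amount spent that day
days = [
    {"u1": 10, "u2": 20},
    {"u2": 5},
    {"u3": 8, "u1": 2},
]

cumulative: dict[str, float] = {}
for day in days:  # each step reads only one new day of data
    cumulative = merge_day(cumulative, day)

print(cumulative == recompute(days))  # True
```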

Windowing

ROW_NUMBER

WITH
-- Sample data
t AS (
  SELECT 'u1' AS uid,'佛山' AS address,'2022-01-01' AS order_date UNION ALL
  SELECT 'u2' AS uid,'深圳' AS address,'2022-01-02' AS order_date UNION ALL
  SELECT 'u2' AS uid,'广州' AS address,'2022-01-03' AS order_date UNION ALL
  SELECT 'u3' AS uid,'佛山' AS address,'2022-01-04' AS order_date
),
-- Window: partition by user ID, sort each partition by date in descending order
t1 AS (
  SELECT
    uid
    ,address
    ,order_date
    ,ROW_NUMBER() OVER(PARTITION BY uid ORDER BY order_date desc) AS a_row_number
  FROM t
)
-- Filter: keep row 1, i.e. the latest order of each user
SELECT
  uid
  ,address AS last_address
  ,order_date AS last_order_date
FROM t1
WHERE a_row_number=1;
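
To check the window logic outside Hive, this Python sketch mimics `ROW_NUMBER() OVER(PARTITION BY uid ORDER BY order_date DESC)` followed by the `= 1` filter on the same sample rows. It only illustrates the semantics; it is not how Hive executes the query, and the function name is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Same sample data as the query: (uid, address, order_date)
rows = [
    ("u1", "佛山", "2022-01-01"),
    ("u2", "深圳", "2022-01-02"),
    ("u2", "广州", "2022-01-03"),
    ("u3", "佛山", "2022-01-04"),
]


def latest_per_user(rows):
    """Emulate ROW_NUMBER() ... ORDER BY order_date DESC, then keep row_number = 1."""
    result = {}
    # PARTITION BY uid: sort by uid so groupby sees each user's rows together
    for uid, group in groupby(sorted(rows, key=itemgetter(0)), key=itemgetter(0)):
        # ORDER BY order_date DESC within the partition
        ranked = sorted(group, key=itemgetter(2), reverse=True)
        _, address, order_date = ranked[0]  # row_number = 1
        result[uid] = (address, order_date)
    return result


for uid, (last_address, last_order_date) in latest_per_user(rows).items():
    print(uid, last_address, last_order_date)
```

It prints u1 佛山 2022-01-01, u2 广州 2022-01-03 and u3 佛山 2022-01-04, the same three rows the query returns. Note that in Hive, if a user has two orders on the same date, which one gets row number 1 is not determined by `order_date` alone; adding a tie-breaker column to the ORDER BY makes the result deterministic.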

Full outer join

FULL OUTER JOIN

WITH
t1 AS (
  SELECT 'u1' AS uid,10 AS amount UNION ALL
  SELECT 'u2' AS uid,20 AS amount
),
t2 AS (
  SELECT 'u2' AS uid,30 AS amount UNION ALL
  SELECT 'u3' AS uid,40 AS amount
)
SELECT
  NVL(t1.uid,t2.uid) AS uid,
  NVL(t1.amount,0)+NVL(t2.amount,0) AS amount
FROM t1
FULL OUTER JOIN t2 ON t1.uid=t2.uid;
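
The NVL calls matter because the unmatched side of a FULL OUTER JOIN is NULL, and in SQL NULL + 30 is NULL. The following Python sketch, with None standing in for NULL and hypothetical helper names, reproduces the intermediate join rows and the effect of the two NVL calls on the same sample data.

```python
def full_outer_join(left, right):
    """Rows of `left FULL OUTER JOIN right ON uid` as
    (left.uid, left.amount, right.uid, right.amount); unmatched columns are None."""
    rows = []
    for uid in sorted(left.keys() | right.keys()):
        rows.append((
            uid if uid in left else None,
            left.get(uid),
            uid if uid in right else None,
            right.get(uid),
        ))
    return rows


def nvl(value, default):
    """Same rule as Hive's NVL: return default when value is NULL (None)."""
    return default if value is None else value


t1 = {"u1": 10, "u2": 20}
t2 = {"u2": 30, "u3": 40}

joined = full_outer_join(t1, t2)
print(joined)  # [('u1', 10, None, None), ('u2', 20, 'u2', 30), (None, None, 'u3', 40)]

# NVL(t1.uid, t2.uid) and NVL(t1.amount, 0) + NVL(t2.amount, 0)
result = [(nvl(l_uid, r_uid), nvl(l_amt, 0) + nvl(r_amt, 0))
          for l_uid, l_amt, r_uid, r_amt in joined]
print(result)  # [('u1', 10), ('u2', 50), ('u3', 40)]
```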

Last 7 days

WITH a AS (
  SELECT '2022-08-01' AS y UNION ALL
  SELECT '2022-08-02' AS y UNION ALL
  SELECT '2022-08-03' AS y UNION ALL
  SELECT '2022-08-04' AS y UNION ALL
  SELECT '2022-08-05' AS y UNION ALL
  SELECT '2022-08-06' AS y UNION ALL
  SELECT '2022-08-07' AS y UNION ALL
  SELECT '2022-08-08' AS y UNION ALL
  SELECT '2022-08-09' AS y UNION ALL
  SELECT '2022-08-10' AS y UNION ALL
  SELECT '2022-08-11' AS y UNION ALL
  SELECT '2022-08-12' AS y
)
SELECT y FROM a
WHERE y>DATE_SUB('2022-08-10',7) AND y<='2022-08-10'
ORDER BY y;
Result:

num  y
1    2022-08-04
2    2022-08-05
3    2022-08-06
4    2022-08-07
5    2022-08-08
6    2022-08-09
7    2022-08-10
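
Why exactly 7 rows: DATE_SUB('2022-08-10', 7) is 2022-08-03, and the strict `>` excludes that day, so the window is 2022-08-04 through 2022-08-10 inclusive. A minimal Python sketch of the same half-open window (the function name is hypothetical):

```python
from datetime import date, timedelta


def last_n_days(dates, end, n):
    """Keep dates d with end - n < d <= end, i.e. exactly n calendar days ending at `end`.
    Same bounds as: y > DATE_SUB(end, n) AND y <= end."""
    start_exclusive = end - timedelta(days=n)
    return [d for d in dates if start_exclusive < d <= end]


all_days = [date(2022, 8, 1) + timedelta(days=i) for i in range(12)]  # 08-01 .. 08-12
window = last_n_days(all_days, date(2022, 8, 10), 7)
print([d.isoformat() for d in window])
```

The printed dates match the seven rows in the result above.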

Counting several windows in one query (last 1, 3 and 7 days):

WITH a AS (
  SELECT '2022-08-01' AS y UNION ALL
  SELECT '2022-08-02' AS y UNION ALL
  SELECT '2022-08-03' AS y UNION ALL
  SELECT '2022-08-04' AS y UNION ALL
  SELECT '2022-08-05' AS y UNION ALL
  SELECT '2022-08-06' AS y UNION ALL
  SELECT '2022-08-07' AS y UNION ALL
  SELECT '2022-08-08' AS y UNION ALL
  SELECT '2022-08-09' AS y UNION ALL
  SELECT '2022-08-10' AS y UNION ALL
  SELECT '2022-08-11' AS y
)
SELECT
  COUNT(IF(y='2022-08-10',y,NULL))             AS `近1天计数`,
  COUNT(IF(y>DATE_SUB('2022-08-10',3),y,NULL)) AS `近3天计数`,
  COUNT(y)                                     AS `近7天计数`
FROM a
WHERE y>DATE_SUB('2022-08-10',7) AND y<='2022-08-10';
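
The pattern here is that the WHERE clause fetches the widest window once, and each narrower window is a conditional count: `IF` returns NULL outside the window and `COUNT` skips NULLs. A Python sketch of the same idea, with a hypothetical function name and one counter per window in a single pass:

```python
from datetime import date, timedelta


def recent_counts(dates, end, windows=(1, 3, 7)):
    """One pass over the rows, one counter per window, mirroring
    COUNT(IF(y > DATE_SUB(end, n), y, NULL)) for each n."""
    counts = {n: 0 for n in windows}
    for d in dates:
        for n in windows:
            if end - timedelta(days=n) < d <= end:
                counts[n] += 1
    return counts


all_days = [date(2022, 8, 1) + timedelta(days=i) for i in range(11)]  # 08-01 .. 08-11
print(recent_counts(all_days, date(2022, 8, 10)))  # {1: 1, 3: 3, 7: 7}
```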

-- Last 7 days counted back from today (today included)
WHERE y>DATE_SUB(CURRENT_DATE(),7) AND y<=CURRENT_DATE()
-- Last 7 days counted back from yesterday (today excluded)
WHERE y>=DATE_SUB(CURRENT_DATE(),7) AND y<=DATE_SUB(CURRENT_DATE(),1)
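
Both conditions cover 7 calendar days; the yesterday-based one shifts both bounds back by one day, which is why its lower bound becomes `>=`. This fits a daily batch job that runs after the previous day has closed. A quick Python check, with a fixed date standing in for CURRENT_DATE() (an assumption for reproducibility):

```python
from datetime import date, timedelta

today = date(2022, 8, 10)  # stand-in for CURRENT_DATE()
candidates = [today - timedelta(days=i) for i in range(10)]  # 08-10 back to 08-01

# y > DATE_SUB(CURRENT_DATE(), 7) AND y <= CURRENT_DATE()
from_today = sorted(d for d in candidates
                    if today - timedelta(days=7) < d <= today)

# y >= DATE_SUB(CURRENT_DATE(), 7) AND y <= DATE_SUB(CURRENT_DATE(), 1)
from_yesterday = sorted(d for d in candidates
                        if today - timedelta(days=7) <= d <= today - timedelta(days=1))

print(len(from_today), from_today[0], from_today[-1])              # 7 2022-08-04 2022-08-10
print(len(from_yesterday), from_yesterday[0], from_yesterday[-1])  # 7 2022-08-03 2022-08-09
```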
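
-- In a script, where ymd stands for the business date (yyyy-MM-dd) supplied by the script
SELECT
  COUNT(IF(y="ymd",y,NULL)),              -- last 1 day
  COUNT(IF(y>DATE_SUB("ymd",7),y,NULL)),  -- last 7 days
  COUNT(y)                                -- last 30 days (bounded by the WHERE clause)
FROM a
WHERE y>DATE_SUB("ymd",30) AND y<="ymd";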
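
How `ymd` gets filled in is not shown here. One common pattern is for the scheduling script to substitute the business date into the query text before submitting it. A minimal sketch, assuming a `$ymd` placeholder and Python's `string.Template`; both the template and the `render` function are hypothetical choices, not the author's script.

```python
from datetime import date, timedelta
from string import Template

# Hypothetical query template; $ymd marks every place the business date goes
QUERY = Template("""SELECT
  COUNT(IF(y="$ymd",y,NULL)),
  COUNT(IF(y>DATE_SUB("$ymd",7),y,NULL)),
  COUNT(y)
FROM a
WHERE y>DATE_SUB("$ymd",30) AND y<="$ymd";""")


def render(ymd=None):
    """Fill in the business date, defaulting to yesterday for a daily run."""
    if ymd is None:
        ymd = (date.today() - timedelta(days=1)).isoformat()
    return QUERY.substitute(ymd=ymd)


print(render("2022-08-10"))
```

Passing the date explicitly, rather than relying on CURRENT_DATE() inside the query, lets the same script re-run any past day.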

Eat fewer snacks, read more books.
