05 - Data Warehouse Modeling Example
Posted by lihaozong2013
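1. Motivation
After I summarized my takeaways in the previous posts, a friend asked me to share a concrete example so that readers could get a more intuitive feel for modeling. So I put together a simple order system myself.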
2. MySQL tables
Assume the MySQL source database contains eight tables.
3. ODS
The ODS layer should ideally stay consistent with the source data:
-- Create the user table
drop table if exists ods_user_info;
create table ods_user_info(
    `id` string COMMENT 'User ID',
    `name` string COMMENT 'Name',
    `birthday` string COMMENT 'Birthday',
    `gender` string COMMENT 'Gender',
    `email` string COMMENT 'Email',
    `user_level` string COMMENT 'User level',
    `create_time` string COMMENT 'Creation time'
) COMMENT 'User information'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_user_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the order table
drop table if exists ods_order_info;
create table ods_order_info(
    `id` string COMMENT 'Order ID',
    `total_amount` decimal(10,2) COMMENT 'Order amount',
    `order_status` string COMMENT 'Order status',
    `user_id` string COMMENT 'User ID',
    `payment_way` string COMMENT 'Payment method',
    `out_trade_no` string COMMENT 'Payment transaction number',
    `create_time` string COMMENT 'Creation time',
    `operate_time` string COMMENT 'Operation time'
) COMMENT 'Order table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the order detail table
drop table if exists ods_order_detail;
create table ods_order_detail(
    `id` string COMMENT 'Order detail ID',
    `order_id` string COMMENT 'Order ID',
    `user_id` string COMMENT 'User ID',
    `sku_id` string COMMENT 'Product (SKU) ID',
    `sku_name` string COMMENT 'Product name',
    `order_price` string COMMENT 'Price at order time',
    `sku_num` string COMMENT 'Product quantity',
    `create_time` string COMMENT 'Creation time'
) COMMENT 'Order detail table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_order_detail/'
tblproperties ("parquet.compression"="snappy");

-- Create the payment record table
drop table if exists `ods_payment_info`;
create table `ods_payment_info`(
    `id` bigint COMMENT 'ID',
    `out_trade_no` string COMMENT 'External business number',
    `order_id` string COMMENT 'Order ID',
    `user_id` string COMMENT 'User ID',
    `alipay_trade_no` string COMMENT 'Alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'Payment amount',
    `subject` string COMMENT 'Transaction subject',
    `payment_type` string COMMENT 'Payment type',
    `payment_time` string COMMENT 'Payment time'
) COMMENT 'Payment record table'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_payment_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the level-1, level-2, and level-3 product category tables
drop table if exists ods_base_category1;
create table ods_base_category1(
    `id` string COMMENT 'ID',
    `name` string COMMENT 'Name'
) COMMENT 'Level-1 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category1/'
tblproperties ("parquet.compression"="snappy");

drop table if exists ods_base_category2;
create external table ods_base_category2(
    `id` string COMMENT 'ID',
    `name` string COMMENT 'Name',
    category1_id string COMMENT 'Level-1 category ID'
) COMMENT 'Level-2 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category2/'
tblproperties ("parquet.compression"="snappy");

drop table if exists ods_base_category3;
create table ods_base_category3(
    `id` string COMMENT 'ID',
    `name` string COMMENT 'Name',
    category2_id string COMMENT 'Level-2 category ID'
) COMMENT 'Level-3 product category'
PARTITIONED BY (`dt` string)
row format delimited fields terminated by '\t'
location '/warehouse/gmall/ods/ods_base_category3/'
tblproperties ("parquet.compression"="snappy");
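Because the ODS tables are partitioned by dt, each daily export from MySQL would typically be loaded into its own partition. The following is only a rough sketch: the HDFS staging path and the date 2019-02-10 are hypothetical values that do not appear in this article, and it assumes the export job writes delimited text files that match the table's row format.

```sql
-- Hypothetical staging path and load date; adjust both to your own import job.
load data inpath '/origin_data/gmall/db/order_info/2019-02-10'
overwrite into table ods_order_info partition (dt = '2019-02-10');

-- Confirm that the partition is registered and the rows are readable.
show partitions ods_order_info;
select * from ods_order_info where dt = '2019-02-10' limit 5;
```

The other ODS tables would be loaded the same way, one statement per table and day.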
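4. DWD
① Filter out null records from the ODS data.
② Change the compression algorithm and the file storage format.
③ Denormalize the product category tables (dimension degeneration).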
-- Create the user table
drop table if exists dwd_user_info;
create external table dwd_user_info(
    `id` string COMMENT 'User ID',
    `name` string COMMENT 'Name',
    `birthday` string COMMENT 'Birthday',
    `gender` string COMMENT 'Gender',
    `email` string COMMENT 'Email',
    `user_level` string COMMENT 'User level',
    `create_time` string COMMENT 'Creation time'
) COMMENT 'User information'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_user_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the order table
drop table if exists dwd_order_info;
create external table dwd_order_info(
    `id` string COMMENT 'Order ID',
    `total_amount` decimal(10,2) COMMENT 'Order amount',
    `order_status` string COMMENT 'Order status (codes 1 2 3 4 5)',
    `user_id` string COMMENT 'User ID',
    `payment_way` string COMMENT 'Payment method',
    `out_trade_no` string COMMENT 'Payment transaction number',
    `create_time` string COMMENT 'Creation time',
    `operate_time` string COMMENT 'Operation time'
) COMMENT 'Order table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the order detail table
drop table if exists dwd_order_detail;
create external table dwd_order_detail(
    `id` string COMMENT 'Order detail ID',
    `order_id` string COMMENT 'Order ID',
    `user_id` string COMMENT 'User ID',
    `sku_id` string COMMENT 'Product (SKU) ID',
    `sku_name` string COMMENT 'Product name',
    `order_price` string COMMENT 'Price at order time',
    `sku_num` string COMMENT 'Product quantity',
    `create_time` string COMMENT 'Creation time'
) COMMENT 'Order detail table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_order_detail/'
tblproperties ("parquet.compression"="snappy");

-- Create the payment record table
drop table if exists `dwd_payment_info`;
create external table `dwd_payment_info`(
    `id` bigint COMMENT 'ID',
    `out_trade_no` string COMMENT 'External business number',
    `order_id` string COMMENT 'Order ID',
    `user_id` string COMMENT 'User ID',
    `alipay_trade_no` string COMMENT 'Alipay transaction number',
    `total_amount` decimal(16,2) COMMENT 'Payment amount',
    `subject` string COMMENT 'Transaction subject',
    `payment_type` string COMMENT 'Payment type',
    `payment_time` string COMMENT 'Payment time'
) COMMENT 'Payment record table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_payment_info/'
tblproperties ("parquet.compression"="snappy");

-- Create the product table
drop table if exists dwd_sku_info;
create external table dwd_sku_info(
    `id` string COMMENT 'SKU ID',
    `spu_id` string COMMENT 'SPU ID',
    `price` decimal(10,2) COMMENT 'Price',
    `sku_name` string COMMENT 'Product name',
    `sku_desc` string COMMENT 'Product description',
    `weight` string COMMENT 'Weight',
    `tm_id` string COMMENT 'Brand ID',
    `category3_id` string COMMENT 'Level-3 category ID',
    `category2_id` string COMMENT 'Level-2 category ID',
    `category1_id` string COMMENT 'Level-1 category ID',
    `category3_name` string COMMENT 'Level-3 category name',
    `category2_name` string COMMENT 'Level-2 category name',
    `category1_name` string COMMENT 'Level-1 category name',
    `create_time` string COMMENT 'Creation time'
) COMMENT 'Product table'
PARTITIONED BY (`dt` string)
stored as parquet
location '/warehouse/gmall/dwd/dwd_sku_info/'
tblproperties ("parquet.compression"="snappy");
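Steps ① and ② can be illustrated with a single load statement. This is a minimal sketch, not the author's actual job: the date 2019-02-10 is a hypothetical load date, and filtering on id is just one possible null check. Since dwd_order_info is declared as a Parquet table with Snappy compression, writing into it is what changes the storage format and compression.

```sql
-- Let the partition value come from the last selected column.
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_order_info partition (dt)
select
    id,
    total_amount,
    order_status,
    user_id,
    payment_way,
    out_trade_no,
    create_time,
    operate_time,
    dt
from ods_order_info
where dt = '2019-02-10'   -- hypothetical load date
  and id is not null;     -- drop rows that lack a primary key
```

The user, order detail, and payment tables could be loaded with the same pattern, each selecting its own column list.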
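Note: as the DDL shows, the DWD layer merges dimensions on top of ODS. The three category tables are folded into dwd_sku_info as category ID and category name columns.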
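The dimension merge could be loaded roughly as follows. Treat this as a sketch under assumptions: the article does not show an ODS product table, so ods_sku_info and its columns (id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, create_time, dt) are hypothetical, as is the date 2019-02-10.

```sql
-- Let the partition value come from the last selected column.
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table dwd_sku_info partition (dt)
select
    sku.id,
    sku.spu_id,
    sku.price,
    sku.sku_name,
    sku.sku_desc,
    sku.weight,
    sku.tm_id,
    sku.category3_id,
    c2.id   as category2_id,
    c1.id   as category1_id,
    c3.name as category3_name,
    c2.name as category2_name,
    c1.name as category1_name,
    sku.create_time,
    sku.dt
from ods_sku_info sku   -- assumed table, not defined in this article
join ods_base_category3 c3 on sku.category3_id = c3.id and c3.dt = sku.dt
join ods_base_category2 c2 on c3.category2_id = c2.id and c2.dt = sku.dt
join ods_base_category1 c1 on c2.category1_id = c1.id and c1.dt = sku.dt
where sku.dt = '2019-02-10'   -- hypothetical load date
  and sku.id is not null;
```

The inner joins drop any product whose category chain is incomplete. Left joins would keep such products with null category columns, which may be preferable depending on how strict the layer should be.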