Initializing Historical Partitions of a Hive Partitioned Table

Posted by shujuxiong


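Foreword: this article was compiled by the editors of 小常识网 (cha138.com). It covers how to initialize the historical partitions of a Hive partitioned table, and we hope you find it a useful reference.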

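After you create a new partitioned table, or change the schema of an existing one, you usually want to keep the data of the old partitions. To do that, the new table has to be initialized by recomputing (backfilling) its historical partitions.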

 

1. Initialization and backfill: method 1

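Read both the fact tables and the dimension tables from their latest partition, and use the date on which each business event in the fact table occurred as the partition value of its historical partition. Because every source is read from its latest snapshot (dt = ${dt}), dimension attributes such as office, branch, and agent names reflect their current values, not their values on each historical date. See the Hive script below: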

-- Enable dynamic partitioning (nonstrict mode lets dt be the only, fully dynamic, partition column)
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
-- One partition is created per day of the backfill range and the per-node default is 100,
-- so this assumed value is raised to cover a long range (adjust as needed)
set hive.exec.max.dynamic.partitions.pernode=1000;

insert overwrite table edw_agents.adm_xf_edw_agents_performance_daily_report_edw_001_di_new partition(dt)
select
     t1.statistic_date                                                     as statistic_date
    ,t1.xf_agent_id                                                        as xf_agent_id
    -- nvl() fills in a default when a dimension lookup finds no match
    ,nvl(t3.agent_true_name,'?')                                           as xf_agent_true_name
    ,nvl(t3.xf_agent_cellphone_text,'?')                                   as xf_agent_cellphone_text
    ,nvl(t1.xf_agent_office_id,0)                                          as xf_agent_office_id
    ,nvl(t2.xf_agent_office_name,'?')                                      as xf_agent_office_name
    ,nvl(t2.xf_agent_office_broker_user_id,0)                              as xf_agent_office_broker_user_id
    ,nvl(t2.xf_agent_office_broker_user_name,'?')                          as xf_agent_office_broker_user_name
    ,nvl(t2.xf_agent_office_organization_id,0)                             as xf_agent_office_organization_id
    ,nvl(t2.xf_agent_office_organization_short_name,'?')                   as xf_agent_office_organization_short_name
    ,nvl(t1.xf_trade_order_record_count_number,0)                          as xf_trade_order_record_count_number
    ,nvl(t1.xf_trade_order_guide_count_number,0)                           as xf_trade_order_guide_count_number
    ,nvl(t1.xf_trade_order_appointment_net_increase_count_number,0)        as xf_trade_order_appointment_net_increase_count_number
    ,nvl(t1.xf_trade_order_deal_net_increase_count_number,0)               as xf_trade_order_deal_net_increase_count_number
    ,nvl(t1.xf_trade_order_received_total_amount,0)                        as xf_trade_order_received_total_amount
    ,nvl(t1.xf_trade_order_gross_merchandise_volume_total_amount,0)        as xf_trade_order_gross_merchandise_volume_total_amount
    ,nvl(t1.xf_trade_order_payable_commission_total_amount,0)              as xf_trade_order_payable_commission_total_amount
    ,'${wf:id()}'                                                          as load_job_number
    ,'${wf:name()}'                                                        as load_job_name
    ,current_timestamp                                                     as insert_timestamp
    ,2                                                                     as source_system_code
    ,regexp_replace(t1.statistic_date,'-','')                              as dt     -- partition column, must come last ('2018-04-11' becomes '20180411')
from
(
   select
      agent_id xf_agent_id
     ,statistic_date
     ,max(agent_office_id) xf_agent_office_id
     ,max(agent_office_organization_id) xf_agent_office_organization_id
     ,sum(record) xf_trade_order_record_count_number
     ,sum(guide) xf_trade_order_guide_count_number
     ,sum(pre_netorder) xf_trade_order_appointment_net_increase_count_number
     ,sum(bargain_netorder) xf_trade_order_deal_net_increase_count_number
     ,sum(totalincome) xf_trade_order_received_total_amount
     ,sum(GMV) xf_trade_order_gross_merchandise_volume_total_amount
     ,sum(shouldagent) xf_trade_order_payable_commission_total_amount
   from
   (
      -- fact: registrations and guided viewings, one row per agent per business date
      select
          agent_id
         ,agent_business_date as statistic_date
         ,max(agent_office_id) agent_office_id
         ,max(agent_office_organization_id) agent_office_organization_id
         ,count(distinct case when agent_business_code=1 then agent_business_id end) record
         ,count(distinct case when agent_business_code=2 then agent_business_id end) guide
         ,0 pre_netorder
         ,0 bargain_netorder
         ,0 totalincome
         ,0 GMV
         ,0 shouldagent
      from
          edw_agents.dws_xf_edw_agent_business_df
      where
          dt = '${dt}'
        and agent_business_date between '2017-01-01' and '2018-04-11'
      group by agent_id,agent_business_date
      union all
      -- fact: subscription and closed-deal orders
      select
          xf_trade_order_belong_agent_id agent_id
         ,statistics_date as statistic_date
         ,max(xf_trade_order_belong_agent_office_id) agent_office_id
         ,max(xf_trade_order_house_project_organization_id) agent_office_organization_id
         ,0 record
         ,0 guide
         ,sum(xf_trade_order_appointment_net_increase_count_number) pre_netorder
         ,sum(xf_trade_order_deal_net_increase_count_number) bargain_netorder
         ,sum(xf_trade_order_received_amount) totalincome
         ,sum(xf_trade_order_gross_merchandise_volume_amount) GMV
         ,sum(xf_trade_order_payable_commission_amount) shouldagent
      from
          edw_trade.adm_xf_edw_trade_order_report_daac_001_df
      where
          dt = '${dt}'
        and statistics_date between '2017-01-01' and '2018-04-11'
      group by xf_trade_order_belong_agent_id,statistics_date
   ) t11
   group by t11.agent_id,t11.statistic_date
) t1
left join
(-- offices, city branches and broker service staff
  select
    t21.agent_office_id                    xf_agent_office_id,
    t21.agent_office_name                  xf_agent_office_name,
    t21.agent_office_organization_id       xf_agent_office_organization_id,
    t21.agent_office_service_user_id       xf_agent_office_broker_user_id,
    t22.organization_short_name            xf_agent_office_organization_short_name,
    t23.username                           xf_agent_office_broker_user_name
  from
  (-- office dimension
    select
      agent_office_id,
      agent_office_name,
      agent_office_organization_id,
      agent_office_service_user_id
    from
      edw_public.dim_edw_pub_agent_office_base_info
    where
      dt = '${dt}'
      and agent_office_type_code=1
  )t21
  left join
  (-- city branch dimension
    select
      organization_id,
      organization_short_name
    from
      edw_public.dim_xf_edw_pub_organization_base_info
    where
      dt = '${dt}'
  )t22
  on (t21.agent_office_organization_id = t22.organization_id)
  left join
  (-- broker service staff dimension
    select
      user_uc_id as user_id,
      max(user_real_name) as username
    from edw_public.dim_edw_pub_internal_staff_base_info
    where dt = '${dt}'
    group by user_uc_id
  )t23
  on (t21.agent_office_service_user_id = t23.user_id)
) t2
on (t1.xf_agent_office_id=t2.xf_agent_office_id)
left join
(-- agent dimension
  select  agent_id,
          agent_true_name,
          agent_phone_number_text as xf_agent_cellphone_text
  from
    edw_public.dim_edw_pub_agent_base_info
  where dt = '${dt}'
) t3
on (t1.xf_agent_id=t3.agent_id)
;
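To make the data flow of the script concrete, the Python sketch below models it in memory. The two fact branches are combined as with UNION ALL, then grouped by (agent_id, statistic_date) and summed. Each resulting row is routed to a partition named after its statistic date with the dashes removed, and the insert replaces only the partitions that actually receive rows, which is how INSERT OVERWRITE behaves with dynamic partitions. The sample rows, the reduced metric set (record, guide, gmv) and the helper names are hypothetical, and the dimension joins are left out.

```python
from collections import defaultdict

# Hypothetical, simplified rows from the latest snapshot (dt = ${dt}) of each fact table.
# Each source emits zeros for the metrics it does not own, like the two UNION ALL branches.
business_facts = [  # registrations / guided viewings
    {"agent_id": 7, "statistic_date": "2017-01-01", "record": 2, "guide": 1, "gmv": 0},
    {"agent_id": 7, "statistic_date": "2017-01-02", "record": 1, "guide": 0, "gmv": 0},
]
trade_facts = [  # subscription / closed-deal orders
    {"agent_id": 7, "statistic_date": "2017-01-01", "record": 0, "guide": 0, "gmv": 500},
]

METRICS = ("record", "guide", "gmv")


def aggregate(*sources):
    """UNION ALL the sources, then GROUP BY (agent_id, statistic_date) and sum each metric."""
    totals = defaultdict(lambda: dict.fromkeys(METRICS, 0))
    for source in sources:
        for row in source:
            key = (row["agent_id"], row["statistic_date"])
            for m in METRICS:
                totals[key][m] += row[m]
    return [
        {"agent_id": a, "statistic_date": d, **metrics}
        for (a, d), metrics in sorted(totals.items())
    ]


def route_to_partitions(rows):
    """Dynamic partitioning: the value of the last column (dt) decides the target partition."""
    partitions = defaultdict(list)
    for row in rows:
        dt = row["statistic_date"].replace("-", "")  # like regexp_replace(statistic_date, '-', '')
        partitions[dt].append(row)
    return dict(partitions)


def insert_overwrite(table, new_partitions):
    """With dynamic partitions, INSERT OVERWRITE replaces only the partitions that receive rows."""
    result = dict(table)
    result.update(new_partitions)
    return result


existing = {"20161231": [{"agent_id": 1}], "20170101": [{"agent_id": 99}]}
table = insert_overwrite(existing, route_to_partitions(aggregate(business_facts, trade_facts)))
print(sorted(table))      # ['20161231', '20170101', '20170102']
print(table["20170101"])  # the stale row for agent 99 has been replaced by the recomputed one
```

Partition 20161231 lies outside the backfill range, so it is left untouched. This is why the script can be rerun safely over a date window without wiping older partitions.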
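Because every distinct statistic_date becomes its own dt partition, the length of the backfill range determines how many partitions a single INSERT creates. The Python sketch below counts them for the range used in the script (2017-01-01 to 2018-04-11). As one conservative option, it also splits the range into windows that could each be run as a separate INSERT with its own BETWEEN bounds. The limit constants are Hive's default values for hive.exec.max.dynamic.partitions and hive.exec.max.dynamic.partitions.pernode, and your cluster may override them. The batches helper is hypothetical.

```python
from datetime import date, timedelta

# Hive's default dynamic-partition limits (check the values configured on your cluster).
MAX_DYNAMIC_PARTITIONS = 1000          # hive.exec.max.dynamic.partitions
MAX_DYNAMIC_PARTITIONS_PER_NODE = 100  # hive.exec.max.dynamic.partitions.pernode


def partition_values(start, end):
    """dt values (yyyymmdd) for every day in [start, end], matching regexp_replace(date, '-', '')."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(days)]


def batches(start, end, max_days):
    """Hypothetical helper: split [start, end] into inclusive windows of at most max_days days."""
    windows = []
    cur = start
    while cur <= end:
        stop = min(cur + timedelta(days=max_days - 1), end)
        windows.append((cur.isoformat(), stop.isoformat()))
        cur = stop + timedelta(days=1)
    return windows


start, end = date(2017, 1, 1), date(2018, 4, 11)
dts = partition_values(start, end)
print(len(dts), dts[0], dts[-1])           # 466 20170101 20180411
print(len(dts) <= MAX_DYNAMIC_PARTITIONS)  # True
# Windows that each stay within the per-node default, usable as BETWEEN bounds:
print(batches(start, end, MAX_DYNAMIC_PARTITIONS_PER_NODE)[:2])
```

The full range fits under the total limit, but a single task may still write more than 100 partitions. Either raise the per-node limit, as the set statement at the top of the script does, or run one INSERT per window with the window bounds substituted into both BETWEEN clauses.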

 
