Writing Spark results into a Hive partitioned table with insertInto
Posted by ZL小屁孩
To write Spark results into a Hive partitioned table, there is no need for an explicit PARTITION clause in the SQL: compute the result set as usual, append the partition columns to the DataFrame with withColumn, and let insertInto write them as dynamic partitions, as in the example below.
ss.sql("SELECT merchant_id,platform," +
"case when trim(first_channel_id) = '' or first_channel_id is null then '-1' else first_channel_id end as channel_id," +
"is_new," +
"0 as language_id," +
"'all' as country_code," +
"count(1) as pv," +
"sum(case when page_type = 'Productpage' then 1 else 0 end) as ppv," +
"count(distinct cookie) as uv," +
"count(distinct case when is_bounce = 1 then cookie end) as bounce_uv," +
"count(distinct case when page_type = 'Shoppingcart' then cookie end) as shoppingcart_uv," +
"null as application_type_id," +
s"count(case when hour = '$per_hour' then 1 end) as inc_pv," +
s"sum(case when hour = '$per_hour' and page_type = 'Productpage' then 1 else 0 end) as inc_ppv," +
s"count(distinct case when hour = '$per_hour' then cookie end) as inc_uv," +
s"count(distinct case when hour = '$per_hour' and is_bounce = 1 then cookie end) as inc_bounce_uv," +
"count(distinct case when page_type = 'Productpage' or page_type = 'Categorypage' then cookie end) as product_category_uv " +
"FROM tmp_traff " +
"WHERE first_channel_id rlike '^\\\\\\\\d+$' " +
"GROUP BY merchant_id,platform," +
"case when trim(first_channel_id) = '' or first_channel_id is null then '-1' else first_channel_id end," +
"is_new")
// dt, hour and merchant are the partition columns; append them as literals
.withColumn("dt", lit(s"$dt")).withColumn("hour", lit(s"$per_hour")).withColumn("merchant", lit(s"$merchant"))
.repartition(1)
// insertInto matches columns by position, not by name, so the SELECT list plus
// the three appended partition columns must line up exactly with the table
// schema; SaveMode.Overwrite makes the insert replace existing data
.write.mode(SaveMode.Overwrite).format("hive").insertInto("table_name")
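For this to run, the session must have Hive support enabled, dynamic partitioning must be switched on, and the target table must already exist with the partition columns declared last, since insertInto does not create tables. Below is a minimal sketch of those prerequisites; the app name, the column types, and the CREATE TABLE statement itself are illustrative assumptions, not taken from the original post:

import org.apache.spark.sql.SparkSession

// hypothetical session setup; enableHiveSupport() is required for insertInto
// to resolve Hive tables
val ss = SparkSession.builder()
  .appName("traffic-stats")
  .enableHiveSupport()
  .getOrCreate()

// dynamic-partition inserts into Hive-serde tables need these Hive settings
ss.sql("SET hive.exec.dynamic.partition=true")
ss.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

// hypothetical DDL: the partition columns (dt, hour, merchant) are declared in
// PARTITIONED BY and therefore come last in the insert column order
ss.sql(
  """CREATE TABLE IF NOT EXISTS table_name (
    |  merchant_id BIGINT, platform STRING, channel_id STRING, is_new INT,
    |  language_id INT, country_code STRING, pv BIGINT, ppv BIGINT, uv BIGINT,
    |  bounce_uv BIGINT, shoppingcart_uv BIGINT, application_type_id INT,
    |  inc_pv BIGINT, inc_ppv BIGINT, inc_uv BIGINT, inc_bounce_uv BIGINT,
    |  product_category_uv BIGINT
    |) PARTITIONED BY (dt STRING, hour STRING, merchant STRING)
    |STORED AS PARQUET""".stripMargin)

One caveat on SaveMode.Overwrite: which partitions get wiped depends on the table type and Spark version (for datasource tables it is controlled by spark.sql.sources.partitionOverwriteMode), so it is worth testing against a throwaway table before pointing this job at production data.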