Spark SQL Writing to Hive: Overwrite the Same Partition, Insert into Different Partitions
Keywords: Spark SQL, Hive

First, create a new Hive table, defining the column types and the partition column.
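A minimal DDL sketch for such a table. The table name test and the string partition column dt come from the example below; the non-partition columns and the storage format are illustrative assumptions:

// Sketch: create a Hive table partitioned by dt.
// Assumes `spark` is a SparkSession built with enableHiveSupport();
// the columns id/name and STORED AS parquet are illustrative, not from the original post.
spark.sql(
  """CREATE TABLE IF NOT EXISTS test (
    |  id   string,
    |  name string
    |)
    |PARTITIONED BY (dt string)
    |STORED AS parquet""".stripMargin)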
Then register the DataFrame as a temporary view, build a partition string, and run insert overwrite with partition(dt=????) so that only the specified partition is overwritten.
Right after creation, test is an empty table. Three insert overwrite operations are then run, with dt set to "20201203", "20201203", and "20201208" respectively. Because the first two target the same partition, the second overwrites the first, so the final table contains data for only two partitions; a sketch of the whole sequence follows.
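A sketch of the view registration and the partition-specific overwrite, assuming `spark` is the SparkSession from above and that df1/df2/df3 are DataFrames with columns id and name (the DataFrame names, view name, and columns are hypothetical):

import org.apache.spark.sql.DataFrame

// Overwrite exactly one partition of `test` with the contents of `df`.
def overwritePartition(df: DataFrame, dt: String): Unit = {
  df.createOrReplaceTempView("tmp_view")
  // INSERT OVERWRITE with an explicit PARTITION clause replaces only that partition;
  // data already sitting in other partitions of `test` is left untouched.
  spark.sql(
    s"""INSERT OVERWRITE TABLE test PARTITION (dt='$dt')
       |SELECT id, name FROM tmp_view""".stripMargin)
}

// dt=20201203 is written twice (the second run overwrites the first),
// dt=20201208 once, so `test` ends up with exactly two partitions.
overwritePartition(df1, "20201203")
overwritePartition(df2, "20201203")
overwritePartition(df3, "20201208")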
Spark SQL: query a Hive table and write the result to PG (PostgreSQL)
import java.sql.DriverManager
import java.util.Properties

import com.zhaopin.tools.{DateUtils, TextUtils}
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

/**
  * Created by xiaoyan on 2018/5/21.
  */
object IhrDownloadPg {
  def main(args: Array[String]) {
    // Set the Spark log level
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    System.setProperty("HADOOP_USER_NAME", "hive")

    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("hive ->> ihr_oper_download")
      .config("spark.sql.warehouse.dir", "spark-warehouse")
      .config("hive.metastore.uris", "thrift://master:9083")
      .enableHiveSupport()
      .getOrCreate()
    import spark.sql

    val dt = if (!args.isEmpty) args(0) else "20180506"
    val yesterday = DateUtils.dateAdd(dt, -1)

    // Delete yesterday's rows from the PG target table before re-loading them
    val url = "jdbc:postgresql://192.168.9.222:5432/safe_base"
    Class.forName("org.postgresql.Driver")
    val conn = DriverManager.getConnection(url, "secu_man", "secu_man")
    val stmt = conn.createStatement()
    stmt.execute("delete from ihr_oper_download where dt = '" + yesterday + "'")

    // Query the Hive table into a DataFrame
    val re1 = sql("select oper_date, " +
      "       acct_id, " +
      "       acct_name, " +
      "       module_name, " +
      "       oper_desc, " +
      "       ip, " +
      "       dt" +
      "  from safe.fact_ihr_oper_download t " +
      " where t.dt > '20180320' and t.dt < '" + yesterday + "'")

    // JDBC connection properties: user, password, and the PostgreSQL driver
    val connectionProperties = new Properties()
    connectionProperties.put("user", "secu_man")
    connectionProperties.put("password", "secu_man")
    connectionProperties.put("driver", "org.postgresql.Driver")

    // Append the query result to the PG table
    re1.write.mode("append").jdbc(url, "ihr_oper_download", connectionProperties)
    System.err.print("ihr_oper_download insert complete!! ")
  }
}
Note: if the PG table does not exist, Spark creates it automatically by default, and every column is created with type text.
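If all-text columns are a problem, you can either pre-create the table in PG with the desired types or let Spark create it while passing explicit DDL types via the JDBC writer's createTableColumnTypes option. A sketch reusing re1, url, and connectionProperties from the code above; the VARCHAR lengths are illustrative assumptions:

// Sketch: control the column types Spark uses when it creates the PG table.
// createTableColumnTypes only takes effect when Spark itself creates the target table;
// the VARCHAR lengths below are illustrative assumptions.
re1.write
  .mode("append")
  .option("createTableColumnTypes",
    "oper_date VARCHAR(32), acct_id VARCHAR(64), acct_name VARCHAR(128), " +
    "module_name VARCHAR(128), oper_desc VARCHAR(1024), ip VARCHAR(64), dt VARCHAR(8)")
  .jdbc(url, "ihr_oper_download", connectionProperties)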