Querying a Hive table with Spark SQL and writing the result to PostgreSQL (PG)


import java.sql.DriverManager
import java.util.Properties

import com.zhaopin.tools.DateUtils
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

/**
  * Created by xiaoyan on 2018/5/21.
  */
object IhrDownloadPg {
  def main(args: Array[String]): Unit = {
    // Set the Spark log level
    Logger.getLogger("org.apache.spark").setLevel(Level.ERROR)
    System.setProperty("HADOOP_USER_NAME","hive")
    val spark = SparkSession
      .builder()
      .master("local[*]")
      .appName("hive ->> ihr_oper_download")
      .config("spark.sql.warehouse.dir", "spark-warehouse")
      .config("hive.metastore.uris", "thrift://master:9083")
      .enableHiveSupport()
      .getOrCreate()
    import spark.sql

    val dt = if(!args.isEmpty) args(0) else "20180506"
    val yesterday = DateUtils.dateAdd(dt, -1)

    val url = "jdbc:postgresql://192.168.9.222:5432/safe_base"
    Class.forName("org.postgresql.Driver")
    val conn = DriverManager.getConnection(url,"secu_man","secu_man")
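    // Remove the rows for `yesterday` from the target table, binding the value as a parameter
    val stmt = conn.prepareStatement("delete from ihr_oper_download where dt = ?")
    stmt.setString(1, yesterday)
    stmt.executeUpdate()
    // Release the JDBC resources; the Spark write below opens its own connection
    stmt.close()
    conn.close()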

    // Query the Hive table; the result is a DataFrame
    val re1 = sql("select oper_date, " +
      "       acct_id, " +
      "       acct_name, " +
      "       module_name, " +
      "       oper_desc, " +
      "       ip, " +
      "       dt" +
      " from safe.fact_ihr_oper_download t " +
      " where t.dt > '20180320' and t.dt < '" + yesterday + "'")

    val connectionProperties = new Properties()
    // Set the database user name (user) and password (password), and specify the PostgreSQL driver (driver)
    connectionProperties.put("user", "secu_man")
    connectionProperties.put("password", "secu_man")
    connectionProperties.put("driver", "org.postgresql.Driver")
    // Append the query result to the PG table
    re1.write.mode("append").jdbc(url, "ihr_oper_download", connectionProperties)
    System.err.print("ihr_oper_download insert complete!! ")
  }
}
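
The listing relies on `DateUtils.dateAdd` from `com.zhaopin.tools`, whose source is not shown here. As a rough stand-in, assuming it simply shifts a `yyyyMMdd` string by a number of days, the same arithmetic can be sketched with `java.time`. The object name `DateArithmetic` is hypothetical and is not part of the original code.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Hypothetical stand-in for DateUtils.dateAdd; assumes dates are yyyyMMdd strings.
object DateArithmetic {
  // BASIC_ISO_DATE parses and formats dates such as 20180506
  private val fmt = DateTimeFormatter.BASIC_ISO_DATE

  // Shift a yyyyMMdd date string by `days` (negative values go back in time)
  def dateAdd(dt: String, days: Int): String =
    LocalDate.parse(dt, fmt).plusDays(days.toLong).format(fmt)
}
```

With this sketch, `DateArithmetic.dateAdd(dt, -1)` would play the role of `DateUtils.dateAdd(dt, -1)` in `main`.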

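  Note: if the PG table does not exist, Spark creates it automatically by default, and the columns are created with type text (Spark maps string columns to PostgreSQL text).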
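
If text columns are not what you want, one option is Spark's JDBC writer setting `createTableColumnTypes`, which overrides the column types used when Spark creates the table. The following is a minimal sketch, not the original author's code: the object and method names are hypothetical, and the lengths chosen for `acct_id`, `ip`, and `dt` are assumptions. The option has no effect if the table already exists.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper; requires Spark SQL and the PostgreSQL JDBC driver on the classpath.
object PgColumnTypes {
  // Appends df to `table`; the listed types apply only when Spark has to create the table
  def appendWithTypes(df: DataFrame, url: String, table: String, props: Properties): Unit =
    df.write
      .mode(SaveMode.Append)
      // Assumed lengths; columns not listed here keep Spark's default mapping (text for strings)
      .option("createTableColumnTypes", "acct_id VARCHAR(64), ip VARCHAR(64), dt CHAR(8)")
      .jdbc(url, table, props)
}
```

In the job above this would be called as `PgColumnTypes.appendWithTypes(re1, url, "ihr_oper_download", connectionProperties)`. Creating the table by hand in PG before the first run is the other straightforward way to control the schema.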
