Connecting Spark to Hive

Posted by Shall潇

I. Configure hive-site.xml

Edit hive/conf/hive-site.xml as follows, then copy the file into spark/conf so that Spark picks up the same settings:

<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/opt/soft/hive/warehouse</value>
  </property>

  <!--<property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>-->

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop100:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.exec.mode.local.auto</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.thrift.client.user</name>
    <value>root</value>
    <description>Username to use against thrift client</description>
  </property>
  <property>
    <name>hive.server2.thrift.client.password</name>
    <value>root</value>
    <description>Password to use against thrift client</description>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.XXX.100:9083</value>
  </property>
</configuration>
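
Before starting any service, it can help to confirm that the copy under spark/conf really contains the properties Spark needs. The following is a minimal sketch, not part of the original setup: the object name `HiveSiteCheck` and the default path `/opt/soft/spark/conf/hive-site.xml` are assumptions (the path is guessed by analogy with `/opt/soft/hive`), and it relies only on the XML parser shipped with the JDK.

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.StandardCharsets
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object HiveSiteCheck {
  // Parse the <property> entries of a hive-site.xml document into a name -> value map.
  // Properties inside XML comments are ignored by the parser.
  def parseProperties(xml: String): Map[String, String] = {
    val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    val doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
    val nodes = doc.getElementsByTagName("property")
    (0 until nodes.getLength).map { i =>
      val prop = nodes.item(i).asInstanceOf[Element]
      // Text of the first child element with the given tag, or "" if absent.
      def text(tag: String): String = {
        val found = prop.getElementsByTagName(tag)
        if (found.getLength > 0) found.item(0).getTextContent.trim else ""
      }
      text("name") -> text("value")
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumed default location; pass the real path as the first argument.
    val path = args.headOption.getOrElse("/opt/soft/spark/conf/hive-site.xml")
    val source = scala.io.Source.fromFile(path, "UTF-8")
    val props = try parseProperties(source.mkString) finally source.close()
    Seq("hive.metastore.uris", "javax.jdo.option.ConnectionURL", "hive.metastore.warehouse.dir")
      .foreach(key => println(s"$key = ${props.getOrElse(key, "<missing>")}"))
  }
}
```

A `<missing>` entry for `hive.metastore.uris` would suggest that the file was copied before it was edited.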

II. Start the services

  • Start the Hive metastore service
  • Start the HiveServer2 service
nohup /opt/soft/hive/bin/hive --service metastore &
nohup /opt/soft/hive/bin/hive --service hiveserver2 &
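
Both services take a while to come up after the `nohup` commands return. As a rough sketch of how to check that they are listening, the snippet below attempts a TCP connection to each port. The object name `ServiceCheck` is hypothetical; port 9083 comes from `hive.metastore.uris` above, port 10000 is the HiveServer2 default, and the host `hadoop100` is an assumption that the services run on the same machine named in the JDBC URL.

```scala
import java.net.{InetSocketAddress, Socket}
import scala.util.Try

object ServiceCheck {
  // Return true if a TCP connection to host:port succeeds within the timeout.
  def isPortOpen(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
    val socket = new Socket()
    try {
      Try(socket.connect(new InetSocketAddress(host, port), timeoutMs)).isSuccess
    } finally {
      socket.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Assumed host; pass the real one as the first argument.
    val host = args.headOption.getOrElse("hadoop100")
    Seq("metastore" -> 9083, "hiveserver2" -> 10000).foreach { case (name, port) =>
      println(s"$name on $host:$port reachable: ${isPortOpen(host, port)}")
    }
  }
}
```

An open port only shows that something is listening; it does not prove that the metastore can reach its MySQL database.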

Launch the Spark shell

spark-shell
spark.table("database_name.table_name").show  // show the table contents
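
If the table lookup fails, a few more commands in the same spark-shell session can show whether Spark is talking to the Hive metastore at all. This is a sketch under assumptions: it uses Hive's built-in `default` database as the example, and it expects the `spark.sql.catalogImplementation` setting to report `hive` when Hive support is active, which may differ across Spark builds.

```scala
// Paste into spark-shell, where `spark` is the predefined SparkSession.

// "hive" indicates the session uses the Hive catalog rather than the in-memory one.
println(spark.conf.get("spark.sql.catalogImplementation"))

// List the databases and the tables of one database known to the metastore.
spark.catalog.listDatabases().show(false)
spark.catalog.listTables("default").show(false)

// The same lookup through SQL.
spark.sql("SHOW TABLES IN default").show(false)
```

Seeing only `default` with no tables usually means Spark created a local metastore instead of reading hive-site.xml.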

III. Connecting from IntelliJ IDEA

1. Add the dependencies

  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>2.1.1</version>
    </dependency>
  </dependencies>
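
For a project built with sbt rather than Maven, the same three artifacts can be declared as in the sketch below. The `scalaVersion` value is an assumption; any 2.11.x release should match the `_2.11` artifacts listed above.

```scala
// build.sbt: the same Spark 2.1.1 dependencies for an sbt project.
scalaVersion := "2.11.8" // assumed; must be a 2.11.x release

val sparkVersion = "2.1.1"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion,
  "org.apache.spark" %% "spark-hive" % sparkVersion
)
```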

2. Write the program

package Hive

import org.apache.spark.sql.SparkSession

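object SparkToHive {
  def main(args: Array[String]): Unit = {
    // Build a local session with Hive support that points at the remote metastore.
    val spark = SparkSession.builder().appName("sparkToHive").master("local[*]")
      .config("hive.metastore.uris", "thrift://192.168.159.100:9083")
      .enableHiveSupport()
      .getOrCreate()

    // List the databases registered in the metastore.
    spark.sql("show databases").collect.foreach(println)

    // Optionally query a table:
    // val df = spark.sql("select * from emp.emp_basic")
    // df.show()

    // Release the session's resources before exiting.
    spark.stop()
  }
}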
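
Hardcoding the metastore address ties the program to one cluster. The variant below is a sketch under assumptions, not the original author's code: the object name `SparkToHiveArgs`, the helper `metastoreUri`, and the convention that the first program argument is the metastore host are all hypothetical. It reads `emp.emp_basic`, the table named in the commented-out query above.

```scala
import org.apache.spark.sql.SparkSession

object SparkToHiveArgs {
  // Build the thrift URI expected by hive.metastore.uris (9083 is the port used above).
  def metastoreUri(host: String, port: Int = 9083): String = s"thrift://$host:$port"

  def main(args: Array[String]): Unit = {
    // Hypothetical convention: the first program argument is the metastore host.
    val host = args.headOption.getOrElse("hadoop100")

    val spark = SparkSession.builder()
      .appName("sparkToHiveArgs")
      .master("local[*]")
      .config("hive.metastore.uris", metastoreUri(host))
      .enableHiveSupport()
      .getOrCreate()

    try {
      // Read the Hive table as a DataFrame and inspect it.
      val df = spark.table("emp.emp_basic")
      df.printSchema()
      df.show(10)
    } finally {
      // Stop the session even if the query fails.
      spark.stop()
    }
  }
}
```

In IDEA the host can be supplied under "Program arguments" in the run configuration.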
