Integrating Spark with Hive by Configuring hive-site.xml

Posted by 虎鲸不是鱼

Before configuration

[root@node1 ~]# cd /export/server/spark-2.4.5-bin-hadoop2.7/
[root@node1 spark-2.4.5-bin-hadoop2.7]# ll
[root@node1 spark-2.4.5-bin-hadoop2.7]# cd bin/
[root@node1 bin]# ll
total 112
-rwxr-xr-x 1 user1 user1 1089 Feb  3 2020 beeline
-rw-r--r-- 1 user1 user1 1064 Feb  3 2020 beeline.cmd
-rwxr-xr-x 1 user1 user1 5440 Feb  3 2020 docker-image-tool.sh
-rwxr-xr-x 1 user1 user1 1933 Feb  3 2020 find-spark-home
-rw-r--r-- 1 user1 user1 2681 Feb  3 2020 find-spark-home.cmd
-rw-r--r-- 1 user1 user1 1892 Feb  3 2020 load-spark-env.cmd
-rw-r--r-- 1 user1 user1 2025 Feb  3 2020 load-spark-env.sh
-rwxr-xr-x 1 user1 user1 2987 Feb  3 2020 pyspark
-rw-r--r-- 1 user1 user1 1540 Feb  3 2020 pyspark2.cmd
-rw-r--r-- 1 user1 user1 1170 Feb  3 2020 pyspark.cmd
-rwxr-xr-x 1 user1 user1 1030 Feb  3 2020 run-example
-rw-r--r-- 1 user1 user1 1223 Feb  3 2020 run-example.cmd
-rwxr-xr-x 1 user1 user1 3196 Feb  3 2020 spark-class
-rw-r--r-- 1 user1 user1 2817 Feb  3 2020 spark-class2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb  3 2020 spark-class.cmd
-rwxr-xr-x 1 user1 user1 1039 Feb  3 2020 sparkR
-rw-r--r-- 1 user1 user1 1097 Feb  3 2020 sparkR2.cmd
-rw-r--r-- 1 user1 user1 1168 Feb  3 2020 sparkR.cmd
-rwxr-xr-x 1 user1 user1 3122 Feb  3 2020 spark-shell
-rw-r--r-- 1 user1 user1 1818 Feb  3 2020 spark-shell2.cmd
-rw-r--r-- 1 user1 user1 1178 Feb  3 2020 spark-shell.cmd
-rwxr-xr-x 1 user1 user1 1065 Feb  3 2020 spark-sql
-rw-r--r-- 1 user1 user1 1118 Feb  3 2020 spark-sql2.cmd
-rw-r--r-- 1 user1 user1 1173 Feb  3 2020 spark-sql.cmd
-rwxr-xr-x 1 user1 user1 1040 Feb  3 2020 spark-submit
-rw-r--r-- 1 user1 user1 1155 Feb  3 2020 spark-submit2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb  3 2020 spark-submit.cmd
[root@node1 bin]# spark-sql 
21/09/02 16:32:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/09/02 16:32:26 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
21/09/02 16:32:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Spark master: local[*], Application Id: local-1630571548369
spark-sql> show databases;
default
Time taken: 2.785 seconds, Fetched 1 row(s)
spark-sql> use d
data           date           date(          date_add(      date_format(   date_sub(      datediff(      datetime       
day(           dayofmonth(    decimal(       decode(        defined        degrees(       delimited      dense_rank(    
desc           describe       directory      distinct       distribute     div(           double         double(        
drop           
spark-sql> use default
         > ;
21/09/02 16:33:01 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 0.034 seconds
spark-sql> exit;
[root@node1 bin]# 
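The ObjectStore warnings above are the tell: with no hive-site.xml in Spark's conf directory, spark-sql falls back to a local embedded Derby metastore, which is why only an empty default database shows up instead of Hive's databases. The Derby files are created in whatever directory spark-sql was launched from, as the bin/ listing further down confirms. A quick check, with the install path taken from the session above:

# If these exist, the earlier spark-sql run created its own embedded
# Derby metastore rather than talking to Hive's.
ls -ld /export/server/spark-2.4.5-bin-hadoop2.7/bin/derby.log \
       /export/server/spark-2.4.5-bin-hadoop2.7/bin/metastore_db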

Integration method

The simplest approach is used here: drop the hive-site.xml configuration file into Spark's conf directory. Spark then reads hive.metastore.uris from it and connects to the Hive Metastore's Thrift service (which means the Hive Metastore service must already be running, or Spark's connection will fail).
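If the metastore service is not yet running on node3, start it from the Hive installation there first. A minimal sketch, assuming Hive's bin directory is on the PATH and the default metastore port 9083 (the log path is arbitrary):

# On node3: start the Hive Metastore service in the background.
nohup hive --service metastore > /tmp/metastore.log 2>&1 &

# Confirm it is listening on the default metastore port.
ss -tlnp | grep 9083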

hive-site.xml configuration

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
-->
<configuration>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node3:3306/hivemetadata?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node3</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node3:9083</value>
    </property>

</configuration>
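Of these properties, hive.metastore.uris is the one Spark itself reads; the JDBC and schema settings are used by the Hive Metastore service. The walkthrough below uploads the file with rz, but if lrzsz is not installed, a plain scp does the same job. A sketch, assuming hive-site.xml lives under Hive's conf directory on node3 (that exact path is an assumption):

# Copy hive-site.xml from node3 into Spark's conf directory on node1.
scp root@node3:/export/server/hive/conf/hive-site.xml \
    /export/server/spark-2.4.5-bin-hadoop2.7/conf/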

Integration

[root@node1 spark-2.4.5-bin-hadoop2.7]# cd /export/server/spark-2.4.5-bin-hadoop2.7/conf
[root@node1 conf]# ll
total 36
-rw-r--r-- 1 user1 user1  996 Feb  3 2020 docker.properties.template
-rw-r--r-- 1 user1 user1 1105 Feb  3 2020 fairscheduler.xml.template
-rw-r--r-- 1 user1 user1 2059 Aug 19 16:49 log4j.properties
-rw-r--r-- 1 user1 user1 7801 Feb  3 2020 metrics.properties.template
-rw-r--r-- 1 user1 user1  885 Aug 19 16:36 slaves
-rw-r--r-- 1 user1 user1 1502 Aug 19 17:41 spark-defaults.conf
-rwxr-xr-x 1 user1 user1 4705 Aug 19 16:42 spark-env.sh
[root@node1 conf]# rz
rz waiting to receive.
Starting zmodem transfer.  Press Ctrl+C to cancel.
Transferring hive-site.xml...
  100%       1 KB       1 KB/sec    00:00:01       0 Errors  

[root@node1 conf]# ll
total 40
-rw-r--r-- 1 user1 user1  996 Feb  3 2020 docker.properties.template
-rw-r--r-- 1 user1 user1 1105 Feb  3 2020 fairscheduler.xml.template
-rw-r--r-- 1 root  root  1844 May  2 19:52 hive-site.xml
-rw-r--r-- 1 user1 user1 2059 Aug 19 16:49 log4j.properties
-rw-r--r-- 1 user1 user1 7801 Feb  3 2020 metrics.properties.template
-rw-r--r-- 1 user1 user1  885 Aug 19 16:36 slaves
-rw-r--r-- 1 user1 user1 1502 Aug 19 17:41 spark-defaults.conf
-rwxr-xr-x 1 user1 user1 4705 Aug 19 16:42 spark-env.sh
[root@node1 conf]# cd ..
[root@node1 spark-2.4.5-bin-hadoop2.7]# ll
total 104
drwxr-xr-x 3 user1 user1  4096 Sep  2 16:32 bin
drwxr-xr-x 2 user1 user1   215 Sep  2 16:37 conf
drwxr-xr-x 5 user1 user1    50 Feb  3 2020 data
drwxr-xr-x 4 user1 user1    29 Feb  3 2020 examples
drwxr-xr-x 2 user1 user1 12288 Feb  3 2020 jars
drwxr-xr-x 4 user1 user1    38 Feb  3 2020 kubernetes
-rw-r--r-- 1 user1 user1 21371 Feb  3 2020 LICENSE
drwxr-xr-x 2 user1 user1  4096 Feb  3 2020 licenses
-rw-r--r-- 1 user1 user1 42919 Feb  3 2020 NOTICE
drwxr-xr-x 9 user1 user1   311 Feb  3 2020 python
drwxr-xr-x 3 user1 user1    17 Feb  3 2020 R
-rw-r--r-- 1 user1 user1  3756 Feb  3 2020 README.md
-rw-r--r-- 1 user1 user1   187 Feb  3 2020 RELEASE
drwxr-xr-x 2 user1 user1  4096 Feb  3 2020 sbin
drwxr-xr-x 2 user1 user1    42 Feb  3 2020 yarn
[root@node1 spark-2.4.5-bin-hadoop2.7]# cd bin/
[root@node1 bin]# ll
total 116
-rwxr-xr-x 1 user1 user1 1089 Feb  3 2020 beeline
-rw-r--r-- 1 user1 user1 1064 Feb  3 2020 beeline.cmd
-rw-r--r-- 1 root  root   724 Sep  2 16:32 derby.log
-rwxr-xr-x 1 user1 user1 5440 Feb  3 2020 docker-image-tool.sh
-rwxr-xr-x 1 user1 user1 1933 Feb  3 2020 find-spark-home
-rw-r--r-- 1 user1 user1 2681 Feb  3 2020 find-spark-home.cmd
-rw-r--r-- 1 user1 user1 1892 Feb  3 2020 load-spark-env.cmd
-rw-r--r-- 1 user1 user1 2025 Feb  3 2020 load-spark-env.sh
drwxr-xr-x 5 root  root   133 Sep  2 16:32 metastore_db
-rwxr-xr-x 1 user1 user1 2987 Feb  3 2020 pyspark
-rw-r--r-- 1 user1 user1 1540 Feb  3 2020 pyspark2.cmd
-rw-r--r-- 1 user1 user1 1170 Feb  3 2020 pyspark.cmd
-rwxr-xr-x 1 user1 user1 1030 Feb  3 2020 run-example
-rw-r--r-- 1 user1 user1 1223 Feb  3 2020 run-example.cmd
-rwxr-xr-x 1 user1 user1 3196 Feb  3 2020 spark-class
-rw-r--r-- 1 user1 user1 2817 Feb  3 2020 spark-class2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb  3 2020 spark-class.cmd
-rwxr-xr-x 1 user1 user1 1039 Feb  3 2020 sparkR
-rw-r--r-- 1 user1 user1 1097 Feb  3 2020 sparkR2.cmd
-rw-r--r-- 1 user1 user1 1168 Feb  3 2020 sparkR.cmd
-rwxr-xr-x 1 user1 user1 3122 Feb  3 2020 spark-shell
-rw-r--r-- 1 user1 user1 1818 Feb  3 2020 spark-shell2.cmd
-rw-r--r-- 1 user1 user1 1178 Feb  3 2020 spark-shell.cmd
-rwxr-xr-x 1 user1 user1 1065 Feb  3 2020 spark-sql
-rw-r--r-- 1 user1 user1 1118 Feb  3 2020 spark-sql2.cmd
-rw-r--r-- 1 user1 user1 1173 Feb  3 2020 spark-sql.cmd
-rwxr-xr-x 1 user1 user1 1040 Feb  3 2020 spark-submit
-rw-r--r-- 1 user1 user1 1155 Feb  3 2020 spark-submit2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb  3 2020 spark-submit.cmd
[root@node1 bin]# sp
spark-class   sparkR        spark-shell   spark-sql     spark-submit  splain        split         sprof         
[root@node1 bin]# spark-sql 
21/09/02 16:37:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark master: local[*], Application Id: local-1630571880126
spark-sql> show databases;
aaa
default
Time taken: 1.886 seconds, Fetched 2 row(s)
spark-sql> select * from aaa.test1;
1       A1
2       A2
3       A3
4       A4
5       A5
6       A6
Time taken: 1.253 seconds, Fetched 6 row(s)
spark-sql> exit;
[root@node1 bin]# 

After integration, the spark-sql shell can browse and query Hive's databases and tables, much like Hive's own Beeline client.
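For scripting, spark-sql also runs queries non-interactively with the same -e and -f options as the Hive CLI. A small example, reusing the table from the session above (the .sql file name is hypothetical):

# Run one statement and exit.
spark-sql -e "SELECT * FROM aaa.test1;"

# Or execute a file of SQL statements.
spark-sql -f /tmp/queries.sql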
