Integrating Hive with Spark by configuring hive-site.xml
Posted by 杀智勇双全杀
Before the configuration
[root@node1 ~]# cd /export/server/spark-2.4.5-bin-hadoop2.7/
[root@node1 spark-2.4.5-bin-hadoop2.7]# ll
[root@node1 spark-2.4.5-bin-hadoop2.7]# cd bin/
[root@node1 bin]# ll
total 112
-rwxr-xr-x 1 user1 user1 1089 Feb 3 2020 beeline
-rw-r--r-- 1 user1 user1 1064 Feb 3 2020 beeline.cmd
-rwxr-xr-x 1 user1 user1 5440 Feb 3 2020 docker-image-tool.sh
-rwxr-xr-x 1 user1 user1 1933 Feb 3 2020 find-spark-home
-rw-r--r-- 1 user1 user1 2681 Feb 3 2020 find-spark-home.cmd
-rw-r--r-- 1 user1 user1 1892 Feb 3 2020 load-spark-env.cmd
-rw-r--r-- 1 user1 user1 2025 Feb 3 2020 load-spark-env.sh
-rwxr-xr-x 1 user1 user1 2987 Feb 3 2020 pyspark
-rw-r--r-- 1 user1 user1 1540 Feb 3 2020 pyspark2.cmd
-rw-r--r-- 1 user1 user1 1170 Feb 3 2020 pyspark.cmd
-rwxr-xr-x 1 user1 user1 1030 Feb 3 2020 run-example
-rw-r--r-- 1 user1 user1 1223 Feb 3 2020 run-example.cmd
-rwxr-xr-x 1 user1 user1 3196 Feb 3 2020 spark-class
-rw-r--r-- 1 user1 user1 2817 Feb 3 2020 spark-class2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb 3 2020 spark-class.cmd
-rwxr-xr-x 1 user1 user1 1039 Feb 3 2020 sparkR
-rw-r--r-- 1 user1 user1 1097 Feb 3 2020 sparkR2.cmd
-rw-r--r-- 1 user1 user1 1168 Feb 3 2020 sparkR.cmd
-rwxr-xr-x 1 user1 user1 3122 Feb 3 2020 spark-shell
-rw-r--r-- 1 user1 user1 1818 Feb 3 2020 spark-shell2.cmd
-rw-r--r-- 1 user1 user1 1178 Feb 3 2020 spark-shell.cmd
-rwxr-xr-x 1 user1 user1 1065 Feb 3 2020 spark-sql
-rw-r--r-- 1 user1 user1 1118 Feb 3 2020 spark-sql2.cmd
-rw-r--r-- 1 user1 user1 1173 Feb 3 2020 spark-sql.cmd
-rwxr-xr-x 1 user1 user1 1040 Feb 3 2020 spark-submit
-rw-r--r-- 1 user1 user1 1155 Feb 3 2020 spark-submit2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb 3 2020 spark-submit.cmd
[root@node1 bin]# spark-sql
21/09/02 16:32:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/09/02 16:32:26 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
21/09/02 16:32:26 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
Spark master: local[*], Application Id: local-1630571548369
spark-sql> show databases;
default
Time taken: 2.785 seconds, Fetched 1 row(s)
spark-sql> use d
data date date( date_add( date_format( date_sub( datediff( datetime
day( dayofmonth( decimal( decode( defined degrees( delimited dense_rank(
desc describe directory distinct distribute div( double double(
drop
spark-sql> use default
> ;
21/09/02 16:33:01 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Time taken: 0.034 seconds
spark-sql> exit;
[root@node1 bin]#
Integration method
The simplest approach is used here: just drop the hive-site.xml configuration file into Spark's conf directory. Spark picks it up automatically and connects to the metastore Thrift service that hive.metastore.uris points at (the Hive metastore service must of course be started beforehand, or Spark's connection will fail). In the "before" run above there was no such file, so spark-sql fell back to an embedded Derby metastore, which explains the schema-version warning, the lone default database, and the derby.log file and metastore_db directory that show up under bin/ later.
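For instance, instead of uploading the file with rz as shown below, it can be copied straight from the Hive node over SSH. A minimal sketch, assuming Hive's conf directory on node3 is /export/server/hive/conf (that path is an assumption, adjust to the actual Hive install):
[root@node1 ~]# scp node3:/export/server/hive/conf/hive-site.xml /export/server/spark-2.4.5-bin-hadoop2.7/conf/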
hive-site.xml configuration
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?><!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://node3:3306/hivemetadata?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>node3</value>
    </property>
    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://node3:9083</value>
    </property>
</configuration>
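Note that the literal & in the JDBC URL has to be escaped as &amp; inside XML (as above), so it is worth checking that the file is well formed before deploying it, and the metastore service that hive.metastore.uris points at must be running on node3 before Spark connects. A minimal sketch, assuming libxml2's xmllint is available and Hive's bin directory is on node3's PATH:
[root@node3 ~]# xmllint --noout hive-site.xml
[root@node3 ~]# nohup hive --service metastore > /tmp/metastore.log 2>&1 &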
Integration
[root@node1 spark-2.4.5-bin-hadoop2.7]# cd /export/server/spark-2.4.5-bin-hadoop2.7/conf
[root@node1 conf]# ll
total 36
-rw-r--r-- 1 user1 user1 996 Feb 3 2020 docker.properties.template
-rw-r--r-- 1 user1 user1 1105 Feb 3 2020 fairscheduler.xml.template
-rw-r--r-- 1 user1 user1 2059 Aug 19 16:49 log4j.properties
-rw-r--r-- 1 user1 user1 7801 Feb 3 2020 metrics.properties.template
-rw-r--r-- 1 user1 user1 885 Aug 19 16:36 slaves
-rw-r--r-- 1 user1 user1 1502 Aug 19 17:41 spark-defaults.conf
-rwxr-xr-x 1 user1 user1 4705 Aug 19 16:42 spark-env.sh
[root@node1 conf]# rz
rz waiting to receive.
Starting zmodem transfer. Press Ctrl+C to cancel.
Transferring hive-site.xml...
100% 1 KB 1 KB/sec 00:00:01 0 Errors
[root@node1 conf]# ll
total 40
-rw-r--r-- 1 user1 user1 996 Feb 3 2020 docker.properties.template
-rw-r--r-- 1 user1 user1 1105 Feb 3 2020 fairscheduler.xml.template
-rw-r--r-- 1 root root 1844 May 2 19:52 hive-site.xml
-rw-r--r-- 1 user1 user1 2059 Aug 19 16:49 log4j.properties
-rw-r--r-- 1 user1 user1 7801 Feb 3 2020 metrics.properties.template
-rw-r--r-- 1 user1 user1 885 Aug 19 16:36 slaves
-rw-r--r-- 1 user1 user1 1502 Aug 19 17:41 spark-defaults.conf
-rwxr-xr-x 1 user1 user1 4705 Aug 19 16:42 spark-env.sh
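(The rz upload above relies on the ZMODEM receive tool from the lrzsz package plus a terminal client that speaks ZMODEM, such as Xshell or SecureCRT; if the command is missing, yum install -y lrzsz should provide it on a CentOS-style node.)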
[root@node1 conf]# cd ..
[root@node1 spark-2.4.5-bin-hadoop2.7]# ll
total 104
drwxr-xr-x 3 user1 user1 4096 Sep 2 16:32 bin
drwxr-xr-x 2 user1 user1 215 Sep 2 16:37 conf
drwxr-xr-x 5 user1 user1 50 Feb 3 2020 data
drwxr-xr-x 4 user1 user1 29 Feb 3 2020 examples
drwxr-xr-x 2 user1 user1 12288 Feb 3 2020 jars
drwxr-xr-x 4 user1 user1 38 Feb 3 2020 kubernetes
-rw-r--r-- 1 user1 user1 21371 Feb 3 2020 LICENSE
drwxr-xr-x 2 user1 user1 4096 Feb 3 2020 licenses
-rw-r--r-- 1 user1 user1 42919 Feb 3 2020 NOTICE
drwxr-xr-x 9 user1 user1 311 Feb 3 2020 python
drwxr-xr-x 3 user1 user1 17 Feb 3 2020 R
-rw-r--r-- 1 user1 user1 3756 Feb 3 2020 README.md
-rw-r--r-- 1 user1 user1 187 Feb 3 2020 RELEASE
drwxr-xr-x 2 user1 user1 4096 Feb 3 2020 sbin
drwxr-xr-x 2 user1 user1 42 Feb 3 2020 yarn
[root@node1 spark-2.4.5-bin-hadoop2.7]# cd bin/
[root@node1 bin]# ll
total 116
-rwxr-xr-x 1 user1 user1 1089 Feb 3 2020 beeline
-rw-r--r-- 1 user1 user1 1064 Feb 3 2020 beeline.cmd
-rw-r--r-- 1 root root 724 Sep 2 16:32 derby.log
-rwxr-xr-x 1 user1 user1 5440 Feb 3 2020 docker-image-tool.sh
-rwxr-xr-x 1 user1 user1 1933 Feb 3 2020 find-spark-home
-rw-r--r-- 1 user1 user1 2681 Feb 3 2020 find-spark-home.cmd
-rw-r--r-- 1 user1 user1 1892 Feb 3 2020 load-spark-env.cmd
-rw-r--r-- 1 user1 user1 2025 Feb 3 2020 load-spark-env.sh
drwxr-xr-x 5 root root 133 Sep 2 16:32 metastore_db
-rwxr-xr-x 1 user1 user1 2987 Feb 3 2020 pyspark
-rw-r--r-- 1 user1 user1 1540 Feb 3 2020 pyspark2.cmd
-rw-r--r-- 1 user1 user1 1170 Feb 3 2020 pyspark.cmd
-rwxr-xr-x 1 user1 user1 1030 Feb 3 2020 run-example
-rw-r--r-- 1 user1 user1 1223 Feb 3 2020 run-example.cmd
-rwxr-xr-x 1 user1 user1 3196 Feb 3 2020 spark-class
-rw-r--r-- 1 user1 user1 2817 Feb 3 2020 spark-class2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb 3 2020 spark-class.cmd
-rwxr-xr-x 1 user1 user1 1039 Feb 3 2020 sparkR
-rw-r--r-- 1 user1 user1 1097 Feb 3 2020 sparkR2.cmd
-rw-r--r-- 1 user1 user1 1168 Feb 3 2020 sparkR.cmd
-rwxr-xr-x 1 user1 user1 3122 Feb 3 2020 spark-shell
-rw-r--r-- 1 user1 user1 1818 Feb 3 2020 spark-shell2.cmd
-rw-r--r-- 1 user1 user1 1178 Feb 3 2020 spark-shell.cmd
-rwxr-xr-x 1 user1 user1 1065 Feb 3 2020 spark-sql
-rw-r--r-- 1 user1 user1 1118 Feb 3 2020 spark-sql2.cmd
-rw-r--r-- 1 user1 user1 1173 Feb 3 2020 spark-sql.cmd
-rwxr-xr-x 1 user1 user1 1040 Feb 3 2020 spark-submit
-rw-r--r-- 1 user1 user1 1155 Feb 3 2020 spark-submit2.cmd
-rw-r--r-- 1 user1 user1 1180 Feb 3 2020 spark-submit.cmd
[root@node1 bin]# sp
spark-class sparkR spark-shell spark-sql spark-submit splain split sprof
[root@node1 bin]# spark-sql
21/09/02 16:37:57 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark master: local[*], Application Id: local-1630571880126
spark-sql> show databases;
aaa
default
Time taken: 1.886 seconds, Fetched 2 row(s)
spark-sql> select * from aaa.test1;
1 A1
2 A2
3 A3
4 A4
5 A5
6 A6
Time taken: 1.253 seconds, Fetched 6 row(s)
spark-sql> exit;
[root@node1 bin]#
Once integrated, spark-sql behaves much like Hive's own Beeline client.
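Going one step further, Spark ships its own Thrift server, so the actual Beeline workflow carries over as well. A minimal sketch, assuming the default HiveServer2 port 10000 is free on node1:
[root@node1 spark-2.4.5-bin-hadoop2.7]# sbin/start-thriftserver.sh
[root@node1 spark-2.4.5-bin-hadoop2.7]# bin/beeline -u jdbc:hive2://node1:10000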