Spark-SQL连接Hive

Posted WOTGL

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了Spark-SQL连接Hive相关的知识,希望对你有一定的参考价值。

第一步修个Hive的配置文件hive-site.xml

  添加如下属性,取消本地元数据服务:

<property>  
  <name>hive.metastore.local</name>  
  <value>false</value>  
</property>

  修改Hive元数据服务地址和端口:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.10.10:9083</value>
  <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
</property>

  然后把配置文件hive-site.xml拷贝到Spark的conf目录下

 

第二步:对于Hive元数据库使用mysql的把mysql-connector-java-5.1.41-bin.jar拷贝到Spark的jar目录下

  到这里已经能够在Scala终端下查询Hive数据库了

  但是某人一开始的要求是用Spark-SQL查询Hive呀

  于是启动Spark-SQL,启了一天了都是报下面的错误

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:114)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1523)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
    ... 11 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    ... 17 more
Caused by: MetaException(message:Version information not found in metastore. )
    at org.apache.hadoop.hive.metastore.ObjectStore.checkSchema(ObjectStore.java:6664)
    at org.apache.hadoop.hive.metastore.ObjectStore.verifySchema(ObjectStore.java:6645)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.invoke(RawStoreProxy.java:114)
    at com.sun.proxy.$Proxy6.verifySchema(Unknown Source)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:572)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    ... 22 more

 

一开始我查这个bug都是用第一行的报错信息查,都没成功,后面搜了下最后一个报错信息

message:Version information not found in metastore

终于找到问题解决方法了,把hive-site.xml中的hive.metastore.schema.verification的值改为false

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
  <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in is compatible with one from Hive jars.  Also disable automatic
            schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
            proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn\'t match with one from in Hive jars.
  </description>
</property>

原因应该是Hive的jar包和存储元数据信息版本不一致,这里设置不验证就可以了。

 


 参考博客:http://www.cnblogs.com/rocky-AGE-24/p/7345417.html

     http://blog.csdn.net/jyl1798/article/details/41087533

       http://dblab.xmu.edu.cn/blog/1086-2/

       http://blog.csdn.net/youngqj/article/details/19987727

以上是关于Spark-SQL连接Hive的主要内容,如果未能解决你的问题,请参考以下文章

在 HIVE 上插入 Spark-SQL 插件

sparkf:spark-sql替换hive查询引擎

Spark-Sql整合hive,在spark-sql命令和spark-shell命令下执行sql命令和整合调用hive

通过spark-sql快速读取hive中的数据

hadoop-spark集群安装---5.hive和spark-sql

Spark: Spark-sql 读hbase