Unable to connect to the Hive metastore from Spark SQL [duplicate]
Posted: 2016-11-19 06:18:09

Question:

Hive 0.14, Spark 1.6. I am trying to access Hive tables from Spark programmatically. I have placed my hive-site.xml in the Spark conf folder, but every time I run this code it connects to the underlying default Hive metastore, i.e. Derby. I have googled a lot, and everywhere the advice is to put hive-site.xml in the Spark configuration folder, which I have already done. Can someone please suggest a solution? My code is below.

FYI: my existing Hive installation uses MySQL as the metastore.

I am running this code directly from Eclipse, not through the spark-submit utility.
package org.scala.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext

object HiveToHdfs {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("HDFS to Local").setMaster("local")
    val sc = new SparkContext(conf)
    // Expects hive-site.xml on the classpath; otherwise Spark falls back to a local Derby metastore
    val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
    import hiveContext.implicits._

    hiveContext.sql("load data local inpath '/home/cloudera/Documents/emp_table.txt' into table employee")
    sc.stop()
  }
}
Below is my Eclipse error log:
16/11/18 22:09:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:03 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MFieldSchema" is tagged as "embedded-only" so does not have its own datastore table.
16/11/18 22:09:06 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MOrder" is tagged as "embedded-only" so does not have its own datastore table.
**16/11/18 22:09:06 INFO MetaStoreDirectSql: Using direct SQL, underlying DB is DERBY**
16/11/18 22:09:06 INFO ObjectStore: Initialized ObjectStore
16/11/18 22:09:06 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
16/11/18 22:09:06 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
16/11/18 22:09:07 INFO HiveMetaStore: Added admin role in metastore
16/11/18 22:09:07 INFO HiveMetaStore: Added public role in metastore
16/11/18 22:09:07 INFO HiveMetaStore: No user is added in admin role, since config is empty
16/11/18 22:09:07 INFO HiveMetaStore: 0: get_all_databases
16/11/18 22:09:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_all_databases
16/11/18 22:09:07 INFO HiveMetaStore: 0: get_functions: db=default pat=*
16/11/18 22:09:07 INFO audit: ugi=cloudera ip=unknown-ip-addr cmd=get_functions: db=default pat=*
16/11/18 22:09:07 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
at org.apache.spark.sql.hive.client.ClientWrapper.<init>(ClientWrapper.scala:194)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:238)
at org.apache.spark.sql.hive.HiveContext.executionHive$lzycompute(HiveContext.scala:218)
at org.apache.spark.sql.hive.HiveContext.executionHive(HiveContext.scala:208)
at org.apache.spark.sql.hive.HiveContext.functionRegistry$lzycompute(HiveContext.scala:462)
at org.apache.spark.sql.hive.HiveContext.functionRegistry(HiveContext.scala:461)
at org.apache.spark.sql.UDFRegistration.<init>(UDFRegistration.scala:40)
at org.apache.spark.sql.SQLContext.<init>(SQLContext.scala:330)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:90)
at org.apache.spark.sql.hive.HiveContext.<init>(HiveContext.scala:101)
at org.scala.spark.HiveToHdfs$.main(HiveToHdfs.scala:15)
at org.scala.spark.HiveToHdfs.main(HiveToHdfs.scala)
Caused by: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------
at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:612)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
... 12 more
16/11/18 22:09:07 INFO SparkContext: Invoking stop() from shutdown hook
Please let me know if any other information is needed to fix this.
Comments:

Did you copy hive-site.xml to Spark's conf directory?
Can you share your hive-site.xml?
Yes Ishan, I did copy hive-site.xml into the Spark conf folder.
Hi Nirmal, this is my hive-site.xml; I changed it to txt format: hive-site.xml
Check the permissions of the directory. I see the following cause in the error: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rwx------

Answer 1:

Check this link -> https://issues.apache.org/jira/browse/SPARK-15118 The metastore may be using the MySQL db.
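When the job is launched from Eclipse rather than with spark-submit, the hive-site.xml sitting in Spark's conf directory is usually not on the application classpath, so the HiveContext silently falls back to an embedded Derby metastore. Below is a minimal sketch of pointing it at the external (MySQL-backed) metastore service explicitly; the object name HiveMetastoreCheck and the thrift URI are placeholders, so take the real value from hive.metastore.uris in your own hive-site.xml.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object HiveMetastoreCheck {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Hive metastore check").setMaster("local")
    val sc = new SparkContext(conf)
    val hiveContext = new HiveContext(sc)

    // Placeholder thrift URI: copy the real value of hive.metastore.uris
    // from the hive-site.xml used by the MySQL-backed metastore.
    hiveContext.setConf("hive.metastore.uris", "thrift://localhost:9083")

    // If the external metastore is reached, this lists the real Hive databases
    // instead of only "default" from a freshly created Derby store.
    hiveContext.sql("show databases").show()

    sc.stop()
  }
}

Whether a setConf call is applied before the metastore client initializes can vary by Spark version, so adding the directory containing hive-site.xml to the Eclipse project classpath remains the more reliable fix.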
The "root scratch dir" error in the log comes from this Hive property:
<property>
<name>hive.exec.scratchdir</name>
<value>/tmp/hive</value>
<description>HDFS root scratch dir for Hive jobs which gets created with write all (733) permission. For each connecting user, an HDFS scratch dir: $hive.exec.scratchdir/<username> is created, with $hive.scratch.dir.permission.</description>
</property>
Grant the required write permissions on /tmp/hive (see the sketch below).
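A minimal sketch of applying that permission change from code, assuming core-site.xml (with fs.defaultFS pointing at HDFS) is on the classpath and the process runs as a user allowed to change the directory's mode (the directory owner or the HDFS superuser). The helper name FixScratchDirPermissions is made up for illustration; the shell equivalent is simply hdfs dfs -chmod 733 /tmp/hive.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.fs.permission.FsPermission

object FixScratchDirPermissions {
  def main(args: Array[String]): Unit = {
    // Resolves to HDFS when fs.defaultFS is configured; otherwise the local filesystem
    val fs = FileSystem.get(new Configuration())
    val scratchDir = new Path("/tmp/hive")
    if (fs.exists(scratchDir)) {
      // 733: owner rwx, group and others wx, matching the default described by the property above
      fs.setPermission(scratchDir, new FsPermission(Integer.parseInt("733", 8).toShort))
    }
  }
}

Run this once (or the equivalent chmod) before starting the Spark job, so SessionState can create its session directories under the scratch dir.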