SparkSQL + Hive + HBase + HBaseIntegration doesn't work
Posted: 2016-09-02 05:59:14

I am getting an error when I try to connect, in Spark, to a Hive table that was created through HBaseIntegration.
Steps I followed:

Hive table creation code:
CREATE TABLE test.sample(id string,name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH
SERDEPROPERTIES ("hbase.columns.mapping" = ":key,details:name")
TBLPROPERTIES ("hbase.table.name" = "sample");
describe test.sample;
col_name data_type comment
id string from deserializer
name string from deserializer
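For reference, the backing HBase table can be populated from the HBase shell; with the mapping above, the row key surfaces as the id column and details:name as the name column. The rows below are hypothetical sample data, only to illustrate the mapping:

# Hypothetical sample rows for the 'sample' HBase table backing the Hive table
# (since this is a managed table, the storage handler creates 'sample' in HBase if it does not exist yet)
hbase shell <<'EOF'
put 'sample', '1', 'details:name', 'alice'
put 'sample', '2', 'details:name', 'bob'
scan 'sample'
EOF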
I start the Spark shell with the following command:
spark-shell --master local[2] --driver-class-path \
/usr/local/hive/lib/hive-hbase-handler-1.2.1.jar:\
/usr/local/hbase/lib/hbase-server-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/hbase-protocol-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/hbase-hadoop2-compat-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/hbase-hadoop-compat-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/hbase-client-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/hbase-common-0.98.9-hadoop2.jar:\
/usr/local/hbase/lib/htrace-core-2.04.jar:\
/usr/local/hbase/lib/hbase-common-0.98.9-hadoop2-tests.jar:\
/usr/local/hbase/lib/hbase-server-0.98.9-hadoop2-tests.jar:\
/usr/local/hive/lib/zookeeper-3.4.6.jar:\
/usr/local/hive/lib/guava-14.0.1.jar
In the Spark shell:
val sqlContext=new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("select count(*) from test.sample").collect()
Stack trace:

SQL context available as sqlContext.
scala> sqlContext.sql("select count(*) from test.sample").collect()
16/09/02 04:49:28 INFO parse.ParseDriver: Parsing command: select count(*) from test.sample
16/09/02 04:49:35 INFO parse.ParseDriver: Parse Completed
16/09/02 04:49:40 INFO metastore.HiveMetaStore: 0: get_table : db=test tbl=sample
16/09/02 04:49:40 INFO HiveMetaStore.audit: ugi=hdfs ip=unknown-ip-addr cmd=get_table : db=test tbl=sample
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes
at org.apache.hadoop.hive.hbase.HBaseSerDe.parseColumnsMapping(HBaseSerDe.java:184)
at org.apache.hadoop.hive.hbase.HBaseSerDeParameters.<init>(HBaseSerDeParameters.java:73)
at org.apache.hadoop.hive.hbase.HBaseSerDe.initialize(HBaseSerDe.java:117)
at org.apache.hadoop.hive.serde2.AbstractSerDe.initialize(AbstractSerDe.java:53)
at org.apache.hadoop.hive.serde2.SerDeUtils.initializeSerDe(SerDeUtils.java:521)
at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:391)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:276)
at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:258)
at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:605)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:331)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1$$anonfun$3.apply(ClientWrapper.scala:326)
at scala.Option.map(Option.scala:145)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:326)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getTableOption$1.apply(ClientWrapper.scala:321)
at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:279)
at org.apache.spark.sql.hive.client.ClientWrapper.liftedTree1$1(ClientWrapper.scala:226)
at org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:225)
at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:268)
at org.apache.spark.sql.hive.client.ClientWrapper.getTableOption(ClientWrapper.scala:321)
at org.apache.spark.sql.hive.client.ClientInterface$class.getTable(ClientInterface.scala:122)
at org.apache.spark.sql.hive.client.ClientWrapper.getTable(ClientWrapper.scala:60)
at org.apache.spark.sql.hive.HiveMetastoreCatalog.lookupRelation(HiveMetastoreCatalog.scala:384)
at org.apache.spark.sql.hive.HiveContext$$anon$2.org$apache$spark$sql$catalyst$analysis$OverrideCatalog$$super$lookupRelation(HiveContext.scala:457)
at org.apache.spark.sql.catalyst.analysis.OverrideCatalog$class.lookupRelation(Catalog.scala:161)
at org.apache.spark.sql.hive.HiveContext$$anon$2.lookupRelation(HiveContext.scala:457)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:303)
I am using Hadoop 2.6.0, Spark 1.6.0, Hive 1.2.1, and HBase 0.98.9.

I added this setting in hadoop-env.sh:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HBASE_HOME/lib/*
Could somebody please suggest a solution?
Comments:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes, check your classpath.
Thanks for the reply, Alexander. I added the classpath: export SPARK_HOME=/usr/local/spark; export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin; export SPARK_CLASSPATH=$SPARK_HOME/lib:$HBASE_HOME/lib:$HIVE_HOME/lib. Can you tell me what I am doing wrong?
I am new to Spark. I can now query Hive-managed tables through SparkSQL, but I don't know how to query Hive tables that use the HBaseStorageHandler through SparkSQL. Please guide me. Thanks, Alexander.
Sorry, I don't know HBase. If you have a question about HBase, try searching on Google or ask a new question to get help!
Thank you for your reply, Alexander.
Answer 1:
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/util/Bytes

occurs because no HBase-related jars are on the classpath.

export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`hbase classpath`

should include all the HBase-related jar files; alternatively, use --jars.

See my answer here.
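For example, a minimal local launch might look like the sketch below. This is only a sketch, assuming the layout from the question (Hive under /usr/local/hive); the exact paths depend on your installation.

# Sketch: put every HBase jar on the driver classpath via `hbase classpath`,
# plus the Hive HBase handler jar (path assumed from the question above)
spark-shell --master local[2] \
  --driver-class-path "$(hbase classpath):/usr/local/hive/lib/hive-hbase-handler-1.2.1.jar"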
Note: to verify the classpath, you can add the following code in the driver to print all the classpath resources.

Scala version:
val cl = ClassLoader.getSystemClassLoader
cl.asInstanceOf[java.net.URLClassLoader].getURLs.foreach(println)
Java version:
import java.net.URL;
import java.net.URLClassLoader;
...
ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader)cl).getURLs();
for (URL url : urls) {
    System.out.println(url.getFile());
}
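In the printed list, look for the jar that provides org.apache.hadoop.hbase.util.Bytes (typically hbase-common); if it is missing, the NoClassDefFoundError above is expected.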
Comments:
Hi, even I am facing the same issue, and the above solution doesn't work.
@RohanNayak: Ask a new question describing your environment and your issue; this question is already more than a year old.
@RohanNayak: What is the output of this command: hbase classpath (with a backtick appended as prefix and suffix)?
@RohanNayak: Classpath errors are tricky and can be environment-specific... I updated the answer with how to verify the classpath.
Hi @Ram. I created a new thread but there is no answer yet. ***.com/questions/46793327/…

Answer 2:
I got it working. You have to use the jars below.
spark-shell --master yarn-client --executor-cores 10 --executor-memory 20G --num-executors 15 --driver-memory 2G \
  --driver-class-path /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar \
  --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar,/usr/hdp/current/hive-client/lib/hive-hbase-handler.jar \
  --files /etc/spark/conf/hbase-site.xml
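After the shell starts, the original query (sqlContext.sql("select count(*) from test.sample").collect()) should be able to load the HBase SerDe, since the HBase handler jars are now on both the driver and executor classpaths and hbase-site.xml is shipped via --files.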
Comments: