Version Compatibility in Spark and Hbase-client
Posted: 2016-11-30 14:59:39

I am trying to write a Spark batch job. I want to package it into a jar and run it with spark-submit. My program works fine in spark-shell, but when I try to run it with spark-submit, I get the following error:
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
at HBaseBulkload$.saveAsHFile(ThereInLocationGivenTimeInterval.scala:103)
at HBaseBulkload$.toHBaseBulk(ThereInLocationGivenTimeInterval.scala:178)
at ThereInLocationGivenTimeInterval$.main(ThereInLocationGivenTimeInterval.scala:241)
at ThereInLocationGivenTimeInterval.main(ThereInLocationGivenTimeInterval.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
According to this answer, the problem stems from a version incompatibility. I also found this, but my Spark version is 1.6.0. Here is my project's .sbt file:
name := "HbaseBulkLoad"
version := "1.0"
scalaVersion := "2.10.5"
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"
//libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.9.0"
//libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.0-cdh5.9.0"
//libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.9.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.1.2"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.1.2"
My imports and the code section that causes the error are as follows:

/* SimpleApp.scala */
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
// HBaseBulkLoad imports
import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.permission.FsPermission
import org.apache.hadoop.fs.{Path, FileSystem}
import org.apache.hadoop.hbase.{KeyValue, TableName}
import org.apache.hadoop.hbase.client._
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.{HFileOutputFormat2, LoadIncrementalHFiles}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner
import org.apache.spark.rdd.RDD
import org.apache.spark.Partitioner
import org.apache.spark.storage.StorageLevel
import scala.collection.JavaConversions._
import scala.reflect.ClassTag
// Hbase admin imports
import org.apache.hadoop.hbase.{HBaseConfiguration, HTableDescriptor}
import org.apache.hadoop.hbase.client.HBaseAdmin
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HColumnDescriptor
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.HTable;
import java.util.Calendar
val now = Calendar.getInstance.getTimeInMillis
//val filteredRdd = myRdd.filter(...
val resultRdd = filteredRdd.map { row =>
  ( row(0).asInstanceOf[String].getBytes(),
    scala.collection.immutable.Map("batchResults" ->
      Array( ( "batchResult1", ("true", now) ) )
    )
  )
}
println( resultRdd.count )
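The saveAsHFile and toHBaseBulk helpers named in the stack trace are not shown in the question. For context, here is a minimal sketch of the standard HBase 1.x HFile bulk-load flow that the imports above suggest they wrap; the table name and output path are placeholders, not taken from the question:

// Sketch only: configure an HFile job for the target table, write the HFiles,
// then hand them to the region servers with LoadIncrementalHFiles.
val hbaseConf = HBaseConfiguration.create()
val hTable = new HTable(hbaseConf, "batchResults")        // placeholder table name
val job = Job.getInstance(hbaseConf)
job.setMapOutputKeyClass(classOf[ImmutableBytesWritable])
job.setMapOutputValueClass(classOf[KeyValue])
HFileOutputFormat2.configureIncrementalLoad(job, hTable)  // sorts/partitions output by region

// The RDD must first be mapped to (ImmutableBytesWritable, KeyValue) pairs sorted by row key:
// kvRdd.saveAsNewAPIHadoopFile("/tmp/hfiles", classOf[ImmutableBytesWritable],
//   classOf[KeyValue], classOf[HFileOutputFormat2], job.getConfiguration)

new LoadIncrementalHFiles(hbaseConf).doBulkLoad(new Path("/tmp/hfiles"), hTable)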
Answer 1:

The working .sbt file is as follows:
name := "HbaseBulkLoad"
version := "1.0"
scalaVersion := "2.10.5"
resolvers += "Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/"
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0-cdh5.9.0"
libraryDependencies += "org.apache.hbase" % "hbase-common" % "1.2.0-cdh5.9.0"
libraryDependencies += "org.apache.hbase" % "hbase-client" % "1.2.0-cdh5.9.0"
libraryDependencies += "org.apache.hbase" % "hbase-server" % "1.2.0-cdh5.9.0"
If you are using Cloudera, you can find the jars and their corresponding versions in the following directory:
/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/jars
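A further note (my assumption, not part of the original answer): if the job runs on a cluster that already ships these jars through the parcels directory above, marking the dependencies as provided in sbt keeps duplicate copies out of the assembly jar and avoids exactly this kind of runtime version clash:

// Compile against the cluster's versions, but let the cluster supply them at runtime.
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.0-cdh5.9.0" % "provided"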