SPARK 2.0: spark-infotheoretic-feature-selection java.lang.NoSuchMethodError: breeze.linalg.DenseMatrix

【Posted】: 2018-09-07 19:20:59 【Question】:

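I am trying to use the MRMR feature of the third-party InfoGain package for Spark (https://github.com/sramirez/spark-infotheoretic-feature-selection). My cluster runs Spark 2.0, and I get the exception below even though I added all the required JAR files to the Spark classpath. The same code works on my local machine, but not on the cluster.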

Exception:

18/03/29 01:16:43 WARN TaskSetManager: Lost task 3.0 in stage 14.0 (TID 47, EUREDWORKER3): java.lang.NoSuchMethodError: breeze.linalg.DenseMatrix$.canMapValues(Lscala/reflect/ClassTag;)Lbreeze/generic/UFunc$UImpl2;
at org.apache.spark.mllib.feature.InfoTheorySparse$$anonfun$15.apply(InfoTheory.scala:172)
at org.apache.spark.mllib.feature.InfoTheorySparse$$anonfun$15.apply(InfoTheory.scala:172)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$43$$anonfun$apply$44.apply(PairRDDFunctions.scala:759)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$mapValues$1$$anonfun$apply$43$$anonfun$apply$44.apply(PairRDDFunctions.scala:759)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:214)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:332)
at org.apache.spark.rdd.RDD$$anonfun$8.apply(RDD.scala:330)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:935)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:866)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:926)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:670)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:330)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:281)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:283)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Reference for Spark class path

【Comments】:

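Is it possible that you are using an outdated or incompatible version of one of the JAR files?

Yes, it was a Breeze version problem. I solved it by replacing the old breeze_2.11_0.11 with 0.13.2. Thank you for pointing me in the right direction.

【Answer 1】: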

It was a Breeze version problem. I had been adding an old version, breeze_2.11_0.11, to the classpath; replacing it with breeze_2.11-0.13.2.jar solved the problem.
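
A NoSuchMethodError like the one above usually means the class found at run time comes from a different JAR version than the one the calling code was compiled against. The following is a minimal sketch, not part of the original answer, of how one might check which JAR a class is actually loaded from and which overloads of a method it exposes. The class name ClasspathProbe and its helper methods are hypothetical names chosen for illustration; only standard JDK reflection calls are used, and the sketch assumes it is run with the same classpath as the failing process (on a cluster, that means the executor classpath, not only the driver's).

```java
import java.lang.reflect.Method;
import java.security.CodeSource;
import java.util.ArrayList;
import java.util.List;

// Hypothetical diagnostic helper (not from the original post).
public class ClasspathProbe {

    // Loads the class without running its static initializer.
    private static Class<?> load(String className) throws ClassNotFoundException {
        return Class.forName(className, false, ClasspathProbe.class.getClassLoader());
    }

    /** Returns the location (usually a JAR URL) the class was loaded from. */
    static String locate(String className) {
        try {
            CodeSource src = load(className).getProtectionDomain().getCodeSource();
            // JDK core classes typically report no code source.
            return src == null ? "<bootstrap or unknown>" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "<not on classpath>";
        }
    }

    /** Lists public methods with the given name as "returnType name(paramTypes)". */
    static List<String> findMethods(String className, String methodName) {
        List<String> found = new ArrayList<>();
        try {
            for (Method m : load(className).getMethods()) {
                if (!m.getName().equals(methodName)) {
                    continue;
                }
                List<String> params = new ArrayList<>();
                for (Class<?> p : m.getParameterTypes()) {
                    params.add(p.getName());
                }
                found.add(m.getReturnType().getName() + " " + m.getName()
                        + "(" + String.join(",", params) + ")");
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or not linkable: report no methods.
        }
        return found;
    }

    public static void main(String[] args) {
        // Class and method names taken from the stack trace in the question.
        String cls = "breeze.linalg.DenseMatrix$";
        System.out.println(cls + " loaded from: " + locate(cls));
        System.out.println("canMapValues overloads: " + findMethods(cls, "canMapValues"));
    }
}
```

If the reported location points at the old Breeze JAR, or no listed overload takes a scala.reflect.ClassTag and returns breeze.generic.UFunc$UImpl2 as the stack trace expects, the classpath is picking up an incompatible Breeze version.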

