How to get groovysh to work with Apache Spark

I use Apache Spark from Groovy successfully, but I have had no luck getting groovysh to work as an interactive Spark shell:

Groovy Shell (2.5.0-beta-3, JVM: 1.8.0_161)
Type ':help' or ':h' for help.
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groovy:000> :grab org.apache.spark:spark-sql_2.11:2.2.1
groovy:000> import org.apache.spark.sql.*
===> org.apache.spark.sql.*
groovy:000> spark = SparkSession.builder().master("local[*]").getOrCreate()
===> org.apache.spark.sql.SparkSession@14201a90
groovy:000> test = spark.read().csv('test.csv')
ERROR java.lang.LinkageError:
loader constraint violation: when resolving method "java.lang.management.ManagementFactory.newPlatformMXBeanProxy(Ljavax/management/MBeanServerConnection;Ljava/lang/String;Ljava/lang/Class;)Ljava/lang/Object;" the class loader (instance of org/codehaus/groovy/tools/RootLoader) of the current class, org/apache/spark/util/SizeEstimator$, and the class loader (instance of <bootloader>) for the method's defining class, java/lang/management/ManagementFactory, have different Class objects for the type javax/management/MBeanServerConnection used in the signature
        at org.apache.spark.util.SizeEstimator$.getIsCompressedOops (SizeEstimator.scala:149)
        at org.apache.spark.util.SizeEstimator$.initialize (SizeEstimator.scala:112)
        at org.apache.spark.util.SizeEstimator$.<init> (SizeEstimator.scala:105)
        at org.apache.spark.util.SizeEstimator$.<clinit> (SizeEstimator.scala)
        at org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$1.weigh (FileStatusCache.scala:109)
        at org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$1.weigh (FileStatusCache.scala:107)
        at org.spark_project.guava.cache.LocalCache$Segment.setValue (LocalCache.java:2222)
        at org.spark_project.guava.cache.LocalCache$Segment.put (LocalCache.java:2944)
        at org.spark_project.guava.cache.LocalCache.put (LocalCache.java:4212)
        at org.spark_project.guava.cache.LocalCache$LocalManualCache.put (LocalCache.java:4804)
        at org.apache.spark.sql.execution.datasources.SharedInMemoryCache$$anon$3.putLeafFiles (FileStatusCache.scala:152)
        at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$listLeafFiles$2.apply (InMemoryFileIndex.scala:128)
        at org.apache.spark.sql.execution.datasources.InMemoryFileIndex$$anonfun$listLeafFiles$2.apply (InMemoryFileIndex.scala:126)
        at scala.collection.mutable.ResizableArray$class.foreach (ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach (ArrayBuffer.scala:48)
        at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.listLeafFiles (InMemoryFileIndex.scala:126)
        at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.refresh0 (InMemoryFileIndex.scala:90)
        at org.apache.spark.sql.execution.datasources.InMemoryFileIndex.<init> (InMemoryFileIndex.scala:66)
        at org.apache.spark.sql.execution.datasources.DataSource.tempFileIndex$lzycompute$1 (DataSource.scala:129)
        at org.apache.spark.sql.execution.datasources.DataSource.org$apache$spark$sql$execution$datasources$DataSource$$tempFileIndex$1 (DataSource.scala:120)
        at org.apache.spark.sql.execution.datasources.DataSource.getOrInferFileFormatSchema (DataSource.scala:134)
        at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation (DataSource.scala:353)
        at org.apache.spark.sql.DataFrameReader.load (DataFrameReader.scala:178)
        at org.apache.spark.sql.DataFrameReader.csv (DataFrameReader.scala:533)
        at org.apache.spark.sql.DataFrameReader.csv (DataFrameReader.scala:412)
        at org.apache.spark.sql.DataFrameReader$csv.call (Unknown Source)

On the other hand, a seemingly equivalent Groovy script works just fine:

@Grab('org.apache.spark:spark-sql_2.11:2.2.1')
import org.apache.spark.sql.*

def spark = SparkSession.builder().master("local[*]").getOrCreate()

def test = spark.read().csv("test.csv")
test.show()

I am looking for a way to work around the error above, and to understand how the groovysh environment differs from regular Groovy script execution.
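
As a rough diagnostic sketch (an assumption on my part, not a fix, and the exact loaders seen will depend on the Groovy version), printing which class loader defines the Spark classes and the JDK's JMX types in each environment should make the difference visible. Run this as a plain groovy script, then type the same println lines into groovysh after :grab-bing the dependency, and compare:

@Grab('org.apache.spark:spark-sql_2.11:2.2.1')
import org.apache.spark.sql.SparkSession
import javax.management.MBeanServerConnection

// Which loader defined the Spark classes?
println SparkSession.class.classLoader

// javax.management is a JDK type: null means the bootstrap loader defined it;
// a non-null loader (e.g. RootLoader) here would match the constraint violation above.
println MBeanServerConnection.class.classLoader

// Which loader is running this code itself?
println this.class.classLoader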

Answer

Fortunately, this problem no longer exists with the new Spark 2.3.0 release:

Groovy Shell (2.5.0-beta-3, JVM: 1.8.0_161)
Type ':help' or ':h' for help.
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
groovy:000> :grab org.apache.spark:spark-sql_2.11:2.3.0
groovy:000> import org.apache.spark.sql.*
===> org.apache.spark.sql.*
groovy:000> spark = SparkSession.builder().master("local[*]").getOrCreate()
===> org.apache.spark.sql.SparkSession@1de85972
groovy:000> test = spark.read().csv('test.csv')
===> [_c0: string, _c1: string ... 1 more field]
groovy:000> test.show()
+---+---+---+
|_c0|_c1|_c2|
+---+---+---+
|  1|  2|  3|
+---+---+---+

===> null
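
For comparison, here is a minimal sketch of the standalone script from the question with the dependency coordinate bumped to 2.3.0; it reads the same assumed test.csv and should behave like the groovysh session above:

@Grab('org.apache.spark:spark-sql_2.11:2.3.0')
import org.apache.spark.sql.*

def spark = SparkSession.builder().master("local[*]").getOrCreate()

def test = spark.read().csv("test.csv")
test.show()

// stop the local session when done
spark.stop()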
