Spark2 data load issue in instantiating HiveSessionState

Posted: 2017-12-26 14:21:51

Question:

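I get the following error when reading data with Spark2 in cluster mode: "java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':". After a lot of searching on Google I still have no clue what causes it. Please help.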

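The code I ran:

import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder.getOrCreate()

val lines: Dataset[String] = spark.read.textFile("/data/sample/abc.csv")

The exception is thrown by the line above.

Full exception stack trace: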

ERROR yarn.ApplicationMaster: User class threw exception: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
    at org.apache.spark.sql.DataFrameReader.<init>(DataFrameReader.scala:549)
    at org.apache.spark.sql.SparkSession.read(SparkSession.scala:605)
    at com.abcd.Learning$.main(Learning.scala:26)
    at com.abcd.Learning.main(Learning.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:646)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
    ... 11 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
    at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
    ... 16 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
    ... 24 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:353)
    at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:257)
    at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
    ... 29 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:548)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
    ... 37 more
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2199)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2685)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2705)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:97)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2748)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2730)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:385)
    at org.apache.hadoop.fs.FileSystem.getLocal(FileSystem.java:356)
    at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:666)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:593)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:526)
    ... 38 more
Caused by: java.lang.ClassNotFoundException: Class com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2105)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2197)
    ... 48 more 
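
Read from the bottom up, the innermost cause in the trace is a ClassNotFoundException for com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper, raised while Hadoop resolves the class for the local file system (FileSystem.getLocal calling getFileSystemClass). That suggests the Hadoop configuration names a file system implementation that is not on the application's classpath. The Scala sketch below is one hedged way to check this from inside the job. `ClasspathCheck` and `isLoadable` are hypothetical names introduced here, not part of Spark or Hadoop, and a result from the driver's class loader does not guarantee the same result inside the class loader Spark uses for its Hive client.

```scala
object ClasspathCheck {
  // Returns true if the named class can be found by the current thread's
  // context class loader (falling back to the loader of this object).
  // The class is located but not initialized.
  def isLoadable(className: String): Boolean = {
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    try {
      Class.forName(className, false, loader)
      true
    } catch {
      case _: ClassNotFoundException => false
    }
  }

  def main(args: Array[String]): Unit = {
    // Class name copied from the last "Caused by" line of the stack trace.
    val name = "com.pepperdata.supervisor.agent.resource.LocalFileSystemWrapper"
    println(s"$name loadable: ${isLoadable(name)}")
  }
}
```

If the check reports the class as missing, the likely fixes are either to ship the jar that contains it with the job or to remove the configuration entry that refers to it; which one applies depends on the cluster setup.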

Comments:

Did you find a solution to the above problem?

Answer 1:

A solution similar to the one given here worked for me.

I did the following:

1. Zipped the Spark jars directory at /usr/local/Cellar/apache-spark/2.1.0/libexec/jars and named the archive spark-jars.zip.

2. Copied spark-jars.zip to HDFS:

$ hdfs dfs -copyFromLocal /usr/local/Cellar/apache-spark/2.1.0/libexec/spark-jars.zip hdfs:/user/<username>/

3. Passed the location of spark-jars.zip in the configuration when submitting the Spark job:

$ HADOOP_CONF_DIR=/Users/<username>/hadoop_conf spark-submit --conf spark.yarn.archive=hdfs:/user/<username>/spark-jars.zip --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --class "com.<whatever>.<package>" --master yarn --deploy-mode cluster --queue online1 --driver-memory 3G --executor-memory 3G ./build/libs/<main class>.jar
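
The answer gives no command for the first step, compressing the jars directory. A minimal Scala sketch of it follows, using only the JDK. `ZipJars` and `zipJars` are hypothetical names, and placing the jar files flat at the root of the archive is an assumption about the layout that spark.yarn.archive expects; the answer itself does not state the layout.

```scala
import java.io.{File, FileOutputStream}
import java.nio.file.Files
import java.util.zip.{ZipEntry, ZipOutputStream}

object ZipJars {
  // Writes every *.jar file found directly inside jarsDir into zipFile,
  // one entry per jar at the archive root (assumed layout).
  // Returns the entry names in the order they were written.
  def zipJars(jarsDir: File, zipFile: File): Seq[String] = {
    val jars = Option(jarsDir.listFiles())
      .getOrElse(Array.empty[File])
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .sortBy(_.getName)
    val out = new ZipOutputStream(new FileOutputStream(zipFile))
    try {
      jars.foreach { jar =>
        out.putNextEntry(new ZipEntry(jar.getName))
        Files.copy(jar.toPath, out) // stream the jar bytes into the entry
        out.closeEntry()
      }
    } finally {
      out.close()
    }
    jars.map(_.getName).toSeq
  }

  def main(args: Array[String]): Unit = {
    // Usage: ZipJars <jars directory> <output zip>
    val written = zipJars(new File(args(0)), new File(args(1)))
    println(s"Wrote ${written.size} jars to ${args(1)}")
  }
}
```

With the paths from the answer, the first argument would be /usr/local/Cellar/apache-spark/2.1.0/libexec/jars and the second the spark-jars.zip file that step 2 uploads.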

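Comments:

Thanks for the reply, but in my case the problem was different. Some configuration had been badly tampered with, which broke the whole environment. It has been resolved now.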
