Pyspark error instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder'

Posted: 2018-02-04 08:06:42

【Question】:

I am trying out this piece of code that I found on ***:

from pyspark.mllib.linalg.distributed import RowMatrix
rows = sc.parallelize([(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)])

# Convert to RowMatrix
mat = RowMatrix(rows)

# Calculate exact and approximate similarities
exact = mat.columnSimilarities()
approx = mat.columnSimilarities(0.05)

# Output
exact.entries.collect()
[MatrixEntry(0, 2, 0.991935352214),
 MatrixEntry(1, 2, 0.998441152599),
 MatrixEntry(0, 1, 0.997463284056)]

Then, when I run exact.entries, I get this error:

---------------------------------------------------------------------------
IllegalArgumentException                  Traceback (most recent call last)
<ipython-input-3-912f64c8ec62> in <module>()
----> 1 print exact.entries

D:\opt\spark\spark-2.2.0-bin-hadoop2.7\python\pyspark\mllib\linalg\distributed.pyc in entries(self)
    824         # DataFrame on the Scala/Java side. Then we map each Row in
    825         # the DataFrame back to a MatrixEntry on this side.
--> 826         entries_df = callMLlibFunc("getMatrixEntries", self._java_matrix_wrapper._java_model)
    827         entries = entries_df.rdd.map(lambda row: MatrixEntry(row[0], row[1], row[2]))
    828         return entries

D:\opt\spark\spark-2.2.0-bin-hadoop2.7\python\pyspark\mllib\common.pyc in callMLlibFunc(name, *args)
    128     sc = SparkContext.getOrCreate()
    129     api = getattr(sc._jvm.PythonMLLibAPI(), name)
--> 130     return callJavaFunc(sc, api, *args)
    131 
    132 

D:\opt\spark\spark-2.2.0-bin-hadoop2.7\python\pyspark\mllib\common.pyc in callJavaFunc(sc, func, *args)
    121     """ Call Java Function """
    122     args = [_py2java(sc, a) for a in args]
--> 123     return _java2py(sc, func(*args))
    124 
    125 

D:\opt\spark\spark-2.2.0-bin-hadoop2.7\python\lib\py4j-0.10.4-src.zip\py4j\java_gateway.py in __call__(self, *args)
   1131         answer = self.gateway_client.send_command(command)
   1132         return_value = get_return_value(
-> 1133             answer, self.gateway_client, self.target_id, self.name)
   1134 
   1135         for temp_arg in temp_args:

D:\opt\spark\spark-2.2.0-bin-hadoop2.7\python\pyspark\sql\utils.pyc in deco(*a, **kw)
     77                 raise QueryExecutionException(s.split(': ', 1)[1], stackTrace)
     78             if s.startswith('java.lang.IllegalArgumentException: '):
---> 79                 raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
     80             raise
     81     return deco

IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"

I installed Spark on Windows 10 using this guide: https://medium.com/@GalarnykMichael/install-spark-on-windows-pyspark-4498a5d8d66c

Is there any way to fix this? Thanks.

*Update: additional error from cmd. I don't understand this error, because a metastore_db does in fact exist in my folder.

18/02/04 16:48:22 ERROR Schema: Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@700d488a, see the next exception for details.
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
        at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
        at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
        at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
        at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
        at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
        at org.apache.derby.jdbc.AutoloadedDriver.connect(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at java.sql.DriverManager.getConnection(Unknown Source)
        at com.jolbox.bonecp.BoneCP.obtainRawInternalConnection(BoneCP.java:361)
        at com.jolbox.bonecp.BoneCP.<init>(BoneCP.java:416)
        at com.jolbox.bonecp.BoneCPDataSource.getConnection(BoneCPDataSource.java:120)
        at org.datanucleus.store.rdbms.ConnectionFactoryImpl$ManagedConnectionImpl.getConnection(ConnectionFactoryImpl.java:501)
        at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:298)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631)
        at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301)
        at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187)
        at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333)
        at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960)
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
        at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
        at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
        at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:76)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:136)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
        at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
        at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
        at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
        at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
        at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
        at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
        at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
        at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
        at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:503)
        at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
        at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
        at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:194)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
        at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
        at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
        at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:193)
        at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
        at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
        at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
        at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
        at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1050)
        at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
        at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:129)
        at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:126)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:65)
        at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:283)
        at org.apache.spark.mllib.api.python.PythonMLLibAPI.getMatrixEntries(PythonMLLibAPI.scala:1207)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:280)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:214)
        at java.lang.Thread.run(Unknown Source)
Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@700d488a, see the next exception for details.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown Source)
        ... 113 more
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database C:\Users\USER\Desktop\ColumnSimilarities\metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager$5.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.access.RAMAccessManager.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase$5.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.db.BasicDatabase.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.jdbc.EmbedConnection.startPersistentService(Unknown Source)
        ... 110 more

*Update

I used this piece of code, which I found in the documentation, and it is working:

from pyspark.mllib.linalg.distributed import RowMatrix, IndexedRowMatrix, CoordinateMatrix, MatrixEntry

#Create an RDD of coordinate entries.
#   - This can be done explicitly with the MatrixEntry class:
# entries = sc.parallelize([MatrixEntry(0, 0, 1.2), MatrixEntry(1, 0, 2.1), MatrixEntry(6, 1, 3.7)])
#   - or using (long, long, float) tuples:
entries = sc.parallelize([(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)])
#([(0, 0, 1.2), (1, 0, 2.1), (2, 1, 3.7)])
mat1 = RowMatrix(entries)

exact = mat1.columnSimilarities()
exact.entries.collect()

【Comments】:

You can check ***.com/questions/45186614/…

I still haven't got a fix from the link above.

@KRKirov The error is completely different, and the blog linked there already explains where to get winutils.

Another instance of Derby may have already booted the database C:\Users\USER\Desktop\ColumnSimilarities\metastore_db... If you are just playing around with Spark, this folder is not important and can be deleted.

@KRKirov The winutils link is exactly the same.

【Answer 1】:

tl;dr Shut down the other Spark application instance using C:\Users\USER\Desktop\ColumnSimilarities\metastore_db and start over.

The exception:

ERROR XSDB6: Another instance of Derby may have already booted the database C:\Users\USER\Desktop\ColumnSimilarities\metastore_db

says it all. You have two Spark application instances up and running in C:\Users\USER\Desktop\ColumnSimilarities, which is not possible with the default Hive external catalog (a.k.a. the metastore).

Spark SQL uses an external catalog (also known as the metastore) to manage the metadata of persistent tables, and it uses Derby, a database that allows only a single client to access it, as the underlying store.

There can be only one Spark application per directory (or you have to change the directory where HiveExternalCatalog stores its metadata); a sketch of the latter follows.
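As a rough sketch of that second option (my own illustration, not code from this answer; the C:/tmp/app1 paths are placeholders): give each application its own warehouse and metastore location when building the SparkSession. spark.sql.warehouse.dir is a standard Spark SQL setting; passing the Hive javax.jdo.option.ConnectionURL through the builder is a commonly reported workaround rather than a documented API, so verify it takes effect on your Spark version:

from pyspark.sql import SparkSession

# Minimal sketch, assuming Spark 2.2 on Windows: point this app at its own
# warehouse and Derby metastore so that two apps launched from the same
# folder do not collide on metastore_db. All paths are placeholders.
spark = (
    SparkSession.builder
    .appName("ColumnSimilarities")
    .config("spark.sql.warehouse.dir", "C:/tmp/app1/spark-warehouse")
    # Hive metastore JDBC URL; commonly passed this way, but confirm it
    # is picked up on your version.
    .config("javax.jdo.option.ConnectionURL",
            "jdbc:derby:;databaseName=C:/tmp/app1/metastore_db;create=true")
    .getOrCreate()
)
sc = spark.sparkContext  # reuse this sc for the RowMatrix code above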

【Discussion】:

Is it right that there should be only one running .ipynb in my Jupyter notebook?

No idea. You should just check the number of JVM processes on your machine. There should be only one (or any number, but each with its own working directory).

I tried executing the code in a separate directory where only one notebook is running and only one notebook file exists. Got it working. Will update as soon as possible if the error comes back.

【Answer 2】:
from pyspark.mllib.linalg.distributed import RowMatrix, IndexedRowMatrix, CoordinateMatrix, MatrixEntry

#Create an RDD of coordinate entries.
#   - This can be done explicitly with the MatrixEntry class:
# entries = sc.parallelize([MatrixEntry(0, 0, 1.2), MatrixEntry(1, 0, 2.1), MatrixEntry(6, 1, 3.7)])
#   - or using (long, long, float) tuples:
entries = sc.parallelize([(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)])
#([(0, 0, 1.2), (1, 0, 2.1), (2, 1, 3.7)])
mat1 = RowMatrix(entries)

exact = mat1.columnSimilarities()
exact.entries.collect()

Try again using the code from the documentation.

【Discussion】:

I doubt this is the solution, since ERROR XSDB6: Another instance of Derby may have already booted the database C:\Users\USER\Desktop\ColumnSimilarities\metastore_db says exactly what the issue is. It appears that you stopped the other instance of Spark at the same time you added the lines that made the application work. I think it is just a coincidence. Want proof? Start pyspark in C:\Users\USER\Desktop\ColumnSimilarities and run the application.

@JacekLaskowski You are right. I got the error again. I still don't understand how to fix it.

【Answer 3】:

I got the same error on Spark 2.2.0, so I checked the ports and found that some other Spark process was holding the port. All you have to do is go to your shell and type the jps command to see whether any other Spark processes exist:

user@server:~$ jps

You will see a list of the Java processes already running on your system, like:

78849 Jps
53409 RemoteInterpreterServer
78627 RemoteInterpreterServer
78515 RemoteInterpreterServer
76244 RemoteInterpreterServer
58566 SparkSubmit
77510 NameNode
74601 ZeppelinServer
17912 Master
77755 SecondaryNameNode
74684 LivyServer
77854 SparkSubmit


Then you have to kill those Spark sessions by their process IDs, using the following command:

sudo kill -9 #process id#
e.g.: sudo kill -9 58566

Or you can use this:

 sudo killall -9 SparkSubmit

If you have no Spark sessions left open in memory, you will get: "no process found".
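If you are on Windows 10, as in the question, sudo kill will not work; jps still ships with the JDK there, and the usual cmd equivalent for killing a process by ID is taskkill:

C:\> jps
C:\> taskkill /F /PID 58566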

Now try running your code again!

Godspeed!

【Discussion】:
