Metastore error while launching multiple Jupyter Spark notebooks from same directory


Posted: 2017-07-25 18:37:36

Question:

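I am running Jupyter Notebook (Jupyter 1.0.0) with Spark (Spark 2.1.0) and can run PySpark code successfully. However, I have two notebooks in the same directory:

    notebooks
    |__ Notebook1
    |__ Notebook2

When I start Notebook1 and then Notebook2, Notebook1 starts and runs fine, but Notebook2 fails because its SparkContext cannot be started. This seems to be related to the Spark metastore.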

Here is the stack trace from Spark:

    Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database XXXXXXXX/notebooks/metastore_db.
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
        at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager$5.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.store.access.RAMAccessManager.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase$5.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.db.BasicDatabase.bootServiceModule(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
        at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
        at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
        at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
        at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
        at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at org.apache.derby.impl.jdbc.EmbedConnection.startPersistentService(Unknown Source)


Answer 1:

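It looks like your problem is related to Derby, the embedded database Spark uses by default to store its metastore (catalog metadata). Embedded Derby can only be opened by one process at a time, so when a second notebook tries to boot the same `metastore_db` directory, it fails with the "Another instance of Derby may have already booted the database" error above. I ran into this myself and solved it by connecting Spark to Postgres instead.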
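A minimal sketch of the Postgres approach, assuming Hive support is enabled, a Postgres database (hypothetically named `metastore` here) already exists, and the PostgreSQL JDBC driver is on the driver classpath. It builds the standard Hive `javax.jdo.option.*` connection properties and, as an assumption to verify for your Spark version, forwards them through Spark's `spark.hadoop.` prefix; putting the same keys in `hive-site.xml` is the more traditional route. Host, credentials, and the helper names `postgres_metastore_conf` and `build_session` are placeholders, and creating the metastore schema in Postgres is not covered.

```python
"""Sketch: point the Hive metastore at a shared Postgres database
instead of the embedded Derby database in ./metastore_db."""


def postgres_metastore_conf(host, port, database, user, password):
    """Return Spark conf entries for a Postgres-backed Hive metastore.

    Assumption: the ``spark.hadoop.`` prefix forwards these keys to the
    Hive metastore client; the same javax.jdo.option.* keys could go into
    hive-site.xml instead. The PostgreSQL JDBC driver jar must be on the
    driver classpath (not handled here).
    """
    url = f"jdbc:postgresql://{host}:{port}/{database}"
    return {
        "spark.hadoop.javax.jdo.option.ConnectionURL": url,
        "spark.hadoop.javax.jdo.option.ConnectionDriverName": "org.postgresql.Driver",
        "spark.hadoop.javax.jdo.option.ConnectionUserName": user,
        "spark.hadoop.javax.jdo.option.ConnectionPassword": password,
    }


def build_session(conf, app_name="notebook"):
    """Create (or reuse) a Hive-enabled SparkSession with the given conf."""
    # Imported lazily so the conf helper above works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).enableHiveSupport()
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Because every notebook then talks to the same Postgres server instead of locking a local Derby directory, several sessions can use the metastore at the same time.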

See whether this answer works for you: How to run multiple instances of spark 2.0
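If a shared metastore is not needed, two lighter-weight alternatives can be sketched as follows; both are assumptions to check against your Spark version, not the method from the linked answer. The first gives each notebook's driver its own Derby home through the `derby.system.home` JVM property, relying on the default metastore URL using the relative database name `metastore_db`. The second sets `spark.sql.catalogImplementation` to `in-memory`, so no Derby metastore is created at all (tables then do not survive the session). The helper names are hypothetical.

```python
import tempfile


def isolated_derby_conf(base_dir=None):
    """Conf that makes this notebook's driver boot its own metastore_db.

    Assumption: the default metastore URL uses the relative name
    ``metastore_db``, which Derby resolves against ``derby.system.home``
    when that JVM property is set. Each call creates a fresh directory,
    so two notebooks never open the same Derby database.
    Note: this value replaces any existing spark.driver.extraJavaOptions.
    """
    derby_home = tempfile.mkdtemp(prefix="derby-", dir=base_dir)
    return {"spark.driver.extraJavaOptions": f"-Dderby.system.home={derby_home}"}


def in_memory_catalog_conf():
    """Conf that skips the Hive metastore, so no metastore_db is created.

    Tables and views registered under this catalog disappear when the
    session ends.
    """
    return {"spark.sql.catalogImplementation": "in-memory"}
```

Each entry would be passed to `SparkSession.builder.config(key, value)` before `getOrCreate()`. Both settings only take effect if applied before the driver JVM and session exist; if the notebook kernel was started through the `pyspark` launcher, the session is already running, so they would have to go on the launcher command line or into `spark-defaults.conf` instead.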
