Metastore error when launching multiple Jupyter Spark notebooks from the same directory
【Title】:Metastore error while launching multiple Jupyter Spark notebooks from same directory 【Posted】:2017-07-25 18:37:36 【Question】:I am running Jupyter Notebook (Jupyter 1.0.0) with Spark (Spark 2.1.0) and can run PySpark code successfully. However, when I have two notebooks in the same directory, laid out as follows:
notebooks
|__ Notebook1
|__ Notebook2
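and I start Notebook1 and then Notebook2, Notebook1 starts and runs successfully, but Notebook2 fails to start because of an error while initializing the SparkContext. This appears to be related to the Spark metastore.
Here is the stack trace from Spark: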
Caused by: ERROR XSDB6: Another instance of Derby may have already booted the database XXXXXXXX/notebooks/metastore_db.
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.iapi.error.StandardException.newException(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown Source)
at org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.raw.RawStore$6.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.raw.RawStore.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.raw.RawStore.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.access.RAMAccessManager$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.store.access.RAMAccessManager.bootServiceModule(Unknown Source)
at org.apache.derby.impl.store.access.RAMAccessManager.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startModule(Unknown Source)
at org.apache.derby.impl.services.monitor.FileMonitor.startModule(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.bootServiceModule(Unknown Source)
at org.apache.derby.impl.db.BasicDatabase$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.db.BasicDatabase.bootServiceModule(Unknown Source)
at org.apache.derby.impl.db.BasicDatabase.bootStore(Unknown Source)
at org.apache.derby.impl.db.BasicDatabase.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown Source)
at org.apache.derby.impl.services.monitor.TopService.bootModule(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.bootService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startProviderService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.findProviderAndStartService(Unknown Source)
at org.apache.derby.impl.services.monitor.BaseMonitor.startPersistentService(Unknown Source)
at org.apache.derby.iapi.services.monitor.Monitor.startPersistentService(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection$4.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.impl.jdbc.EmbedConnection.startPersistentService(Unknown Source)
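【Answer 1】:Your problem is probably related to Derby, the embedded database that Spark uses by default to store its metastore (the metadata about databases and tables). An embedded Derby database can be booted by only one JVM process at a time, so when the second notebook starts its own Spark driver in the same directory, it cannot open the metastore_db that the first notebook has already locked. I ran into this problem myself and solved it by connecting Spark to Postgres instead.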
Take a look at this answer and see whether it works for you: How to run multiple instances of spark 2.0
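If setting up Postgres is more than you need, a lighter workaround that is sometimes used is to give each notebook its own Derby directory, so the two drivers never compete for the same metastore_db. The sketch below is only an illustration under assumptions, not the method from the linked answer: the helper name `submit_args_with_private_derby_home` and its parameters are hypothetical, and it relies on PySpark reading the `PYSPARK_SUBMIT_ARGS` environment variable when it launches the driver JVM and on Derby resolving relative database names against the `derby.system.home` system property.

```python
import os
import shlex
import tempfile


def submit_args_with_private_derby_home(notebook_name, base_dir=None):
    """Return (PYSPARK_SUBMIT_ARGS value, Derby home path) for one notebook.

    Hypothetical helper for illustration; it is not part of Spark or Jupyter.
    """
    base_dir = base_dir or tempfile.gettempdir()
    derby_home = os.path.join(base_dir, "derby_" + notebook_name)
    os.makedirs(derby_home, exist_ok=True)
    # derby.system.home is the directory against which Derby resolves
    # relative database names such as metastore_db.
    java_option = "-Dderby.system.home=" + derby_home
    args = "--driver-java-options {} pyspark-shell".format(shlex.quote(java_option))
    return args, derby_home


# Assumption: this runs at the top of the notebook, before any
# SparkSession or SparkContext (and therefore the JVM) has been created.
# Note that it overwrites any existing PYSPARK_SUBMIT_ARGS value.
submit_args, derby_home = submit_args_with_private_derby_home("Notebook2")
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args
```

With a different `notebook_name` in each notebook, a SparkSession created afterwards should place its metastore_db under that notebook's own `derby_home`. The trade-off is that the notebooks no longer share table metadata, which is what a shared external metastore such as Postgres would give you.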