Databricks Connect java.lang.ClassNotFoundException
Asked: 2021-11-16 09:54:35

【Problem description】: I updated our Databricks cluster on Azure Databricks to DBR 9.1 LTS, but a package I use regularly now gives me an error when I run it in VS Code through Databricks Connect, while it did not on the previous cluster, which ran DBR 8.3. I also updated the package to be compatible with the new DBR cluster; the Maven coordinates are com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0. The following script works when I run it directly in a Databricks notebook, but when I run it through Databricks Connect I get the error shown further below.
# com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0
from pyspark.sql.types import StringType, StructField, StructType
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.dbutils import DBUtils

spark = SparkSession.builder.appName("local").getOrCreate()
dbutils = DBUtils(spark)

# Connection details (values redacted in the original post)
cosmosEndpoint = "######################"
cosmosMasterKey = "######################"
cosmosDatabaseName = "######################"
cosmosContainerName = "test"

cfg = {
    "spark.cosmos.accountEndpoint": cosmosEndpoint,
    "spark.cosmos.accountKey": cosmosMasterKey,
    "spark.cosmos.database": cosmosDatabaseName,
    "spark.cosmos.container": cosmosContainerName,
}

# Configure Catalog Api to be used
spark.conf.set("spark.sql.catalog.cosmosCatalog", "com.azure.cosmos.spark.CosmosCatalog")
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountEndpoint", cosmosEndpoint)
spark.conf.set("spark.sql.catalog.cosmosCatalog.spark.cosmos.accountKey", cosmosMasterKey)

df = (
    spark
    .read
    .format("cosmos.oltp")
    .options(**cfg)
    .load()
)

df.show()
The error I get in VS Code when using Databricks Connect is the following:
Exception has occurred: Py4JJavaError
An error occurred while calling o35.load.
: java.lang.ClassNotFoundException: Failed to find data source: cosmos.oltp. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:765)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:819)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:367)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:101)
at com.databricks.service.SparkServiceRPCHandler$$anon$1.call(SparkServiceRPCHandler.scala:80)
at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4724)
at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3522)
at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2315)
at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2278)
at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2193)
at com.google.common.cache.LocalCache.get(LocalCache.java:3932)
at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4721)
at com.databricks.service.SparkServiceRPCHandler$.getOrLoadAnonymousRelation(SparkServiceRPCHandler.scala:80)
at com.databricks.service.SparkServiceRPCHandler.execute0(SparkServiceRPCHandler.scala:715)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC0$1(SparkServiceRPCHandler.scala:478)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC0(SparkServiceRPCHandler.scala:370)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:321)
at com.databricks.service.SparkServiceRPCHandler$$anon$2.call(SparkServiceRPCHandler.scala:307)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at com.databricks.service.SparkServiceRPCHandler.$anonfun$executeRPC$1(SparkServiceRPCHandler.scala:357)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.service.SparkServiceRPCHandler.executeRPC(SparkServiceRPCHandler.scala:334)
at com.databricks.service.SparkServiceRPCServlet.doPost(SparkServiceRPCServer.scala:153)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:550)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.eclipse.jetty.server.Server.handle(Server.java:516)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:137)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: cosmos.oltp.DefaultSource
at java.lang.ClassLoader.findClass(ClassLoader.java:524)
at org.apache.spark.util.ParentClassLoader.findClass(ParentClassLoader.java:35)
at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
at org.apache.spark.util.ParentClassLoader.loadClass(ParentClassLoader.java:40)
at org.apache.spark.util.ChildFirstURLClassLoader.loadClass(ChildFirstURLClassLoader.java:48)
at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$5(DataSource.scala:739)
at scala.util.Try$.apply(Try.scala:213)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$lookupDataSource$4(DataSource.scala:739)
at scala.util.Failure.orElse(Try.scala:224)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:739)
... 46 more
I have already added the package's jar file to the following directory: .venv\lib\site-packages\pyspark\jars
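Note that in the stack trace above the lookup fails inside SparkServiceRPCHandler, i.e. on the cluster side, so a jar in the local pyspark folder alone may not be enough; the connector also has to be present on the cluster that the Databricks Connect client targets. As a minimal sketch (not a verified fix), the Maven package could be installed on the cluster through the Databricks Libraries REST API 2.0; the workspace URL, token, and cluster ID below are hypothetical placeholders:

# A sketch, not the poster's verified fix: install the connector on the
# cluster via the Databricks Libraries REST API 2.0.
import requests

HOST = "https://<workspace-url>"       # hypothetical Azure Databricks workspace URL
TOKEN = "<personal-access-token>"      # hypothetical
CLUSTER_ID = "<cluster-id>"            # hypothetical

resp = requests.post(
    f"{HOST}/api/2.0/libraries/install",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_id": CLUSTER_ID,
        "libraries": [
            {
                "maven": {
                    "coordinates": "com.azure.cosmos.spark:azure-cosmos-spark_3-1_2-12:4.3.0"
                }
            }
        ],
    },
)
# Installation is asynchronous; progress can be polled via
# the /api/2.0/libraries/cluster-status endpoint.
resp.raise_for_status()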
【Comments】:

Hi Louis! Did you find a solution? We are running into the same problem!

Did you update the databricks-connect package to match your cluster's DBR runtime version? And were the Cosmos jar files updated on both the laptop and the cluster?

Hi, I did not find a solution. My databricks-connect runtime, 9.1.4, matches my cluster version, 9.1 LTS, and I added azure-cosmos-spark_3-1_2-12-4.5.3.jar to my \.venv\Lib\site-packages\pyspark\jars folder.
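As a quick sanity check on the version match discussed in the comments above, the installed client version can be inspected locally. A small sketch, assuming Python 3.8+ (for importlib.metadata) and a 9.1 LTS cluster:

# A sketch: the databricks-connect client version should share the cluster
# runtime's major.minor, e.g. a 9.1.x client for a 9.1 LTS cluster.
from importlib.metadata import version

dbc_version = version("databricks-connect")  # e.g. "9.1.4"
print("databricks-connect client:", dbc_version)
if not dbc_version.startswith("9.1"):
    raise RuntimeError(f"client {dbc_version} does not match the 9.1 LTS cluster runtime")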
【Answer 1】: The problem seems to have resolved itself, as it appears to be working again.

【Discussion】: Edits or additional information should be provided by editing the question. If the question is no longer relevant, you can delete it.