Azure databricks:通过 API 将 maven 库安装到集群会导致错误(库解析失败。原因:java.lang.RuntimeException)

Posted

技术标签:

【中文标题】Azure databricks:通过 API 将 maven 库安装到集群会导致错误(库解析失败。原因:java.lang.RuntimeException)【英文标题】:Azure databricks: Installing maven libraries to cluster through API causes error (Library resolution failed. Cause: java.lang.RuntimeException) 【发布时间】:2020-03-21 15:39:49 【问题描述】:

我正在尝试通过 python 的 API 将一些 maven 库安装到现有的 azure 数据块的集群/新创建的集群。

集群详情:

Python 3 5.5 LTS(包括 Apache Spark 2.4.3、Scala 2.11) 节点类型:Standard_D3_v2
spark_submit_packages = "org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3," \
                        "com.databricks:spark-redshift_2.11:3.0.0-preview1," \
                        "org.postgresql:postgresql:9.3-1103-jdbc3," \
                        "com.amazonaws:aws-java-sdk:1.11.98," \
                        "com.amazonaws:aws-java-sdk-core:1.11.98," \
                        "com.amazonaws:aws-java-sdk-sns:1.11.98," \
                        "org.apache.hadoop:hadoop-aws:2.7.3," \
                        "com.amazonaws:aws-java-sdk-s3:1.11.98," \
                        "com.databricks:spark-avro_2.11:4.0.0," \
                        "com.microsoft.azure:azure-data-lake-store-sdk:2.0.11," \
                        "org.apache.hadoop:hadoop-azure-datalake:3.0.0-alpha2," \
                        "com.microsoft.azure:azure-storage:3.1.0," \
                        "org.apache.hadoop:hadoop-azure:2.7.2"

    install_lib_url = "https://<region>.azuredatabricks.net/api/2.0/libraries/install"
    packages = spark_submit_packages.split(",")
    maven_packages = []
    for pack in packages:
        maven_packages.append("maven": "coordinates": pack)

    headers = "Authorization": "Bearer ".format(TOKEN)
    headers['Content-type'] = 'application/json'

    data = 
        "cluster_id": cluster_id,
        "libraries": maven_packages
    
    
    res = requests.post(install_lib_url, headers=headers, data=json.dumps(data))
    _response = res.json()
    print(json.dumps(_response))

响应为空 json,符合预期。 但是有时候这个api调用会导致UI出现如下错误,库安装失败,

Library resolution failed. Cause: java.lang.RuntimeException: commons-httpclient:commons-httpclient download failed.
    at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$5(MavenLibraryResolver.scala:253)
    at scala.collection.MapLike.getOrElse(MapLike.scala:131)
    at scala.collection.MapLike.getOrElse$(MapLike.scala:129)
    at scala.collection.AbstractMap.getOrElse(Map.scala:63)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$resolveDependencyPaths$4(MavenLibraryResolver.scala:253)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.mutable.ArraySeq.foreach(ArraySeq.scala:75)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.AbstractTraversable.map(Traversable.scala:108)
    at com.databricks.libraries.server.MavenInstaller.resolveDependencyPaths(MavenLibraryResolver.scala:249)
    at com.databricks.libraries.server.MavenInstaller.doDownloadMavenPackages(MavenLibraryResolver.scala:455)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$downloadMavenPackages$2(MavenLibraryResolver.scala:381)
    at com.databricks.backend.common.util.FileUtils$.withTemporaryDirectory(FileUtils.scala:431)
    at com.databricks.libraries.server.MavenInstaller.$anonfun$downloadMavenPackages$1(MavenLibraryResolver.scala:380)
    at com.databricks.logging.UsageLogging.$anonfun$recordOperation$4(UsageLogging.scala:417)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
    at com.databricks.libraries.server.MavenInstaller.withAttributionContext(MavenLibraryResolver.scala:57)
    at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:276)
    at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:269)
    at com.databricks.libraries.server.MavenInstaller.withAttributionTags(MavenLibraryResolver.scala:57)
    at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:398)
    at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:337)
    at com.databricks.libraries.server.MavenInstaller.recordOperation(MavenLibraryResolver.scala:57)
    at com.databricks.libraries.server.MavenInstaller.downloadMavenPackages(MavenLibraryResolver.scala:379)
    at com.databricks.libraries.server.MavenInstaller.downloadMavenPackagesWithRetry(MavenLibraryResolver.scala:137)
    at com.databricks.libraries.server.MavenInstaller.resolveMavenPackages(MavenLibraryResolver.scala:113)
    at com.databricks.libraries.server.MavenLibraryResolver.resolve(MavenLibraryResolver.scala:44)
    at com.databricks.libraries.server.ManagedLibraryManager$GenericManagedLibraryResolver.resolve(ManagedLibraryManager.scala:263)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.$anonfun$resolvePrimitives$1(ManagedLibraryManagerImpl.scala:193)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.$anonfun$resolvePrimitives$1$adapted(ManagedLibraryManagerImpl.scala:188)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl.resolvePrimitives(ManagedLibraryManagerImpl.scala:188)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl$ClusterStatus.installLibs(ManagedLibraryManagerImpl.scala:772)
    at com.databricks.libraries.server.ManagedLibraryManagerImpl$InstallLibTask$1.run(ManagedLibraryManagerImpl.scala:473)
    at com.databricks.threading.NamedExecutor$$anon$1.$anonfun$run$1(NamedExecutor.scala:317)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:239)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
    at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:234)
    at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:231)
    at com.databricks.threading.NamedExecutor.withAttributionContext(NamedExecutor.scala:256)
    at com.databricks.threading.NamedExecutor$$anon$1.run(NamedExecutor.scala:317)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

是因为在单个 API 中安装了多个 maven 库吗? (但是我们需要给 API 一个列表:|)

编辑:在重新启动集群时也会出现此问题。假设我已经手动将大约 10 个 maven 库安装到集群中。所有安装都成功。但是当我重新启动集群时,即使这些成功的安装也会失败。

【问题讨论】:

【参考方案1】:

从 Azure 支持团队得到以下回复:

似乎某个特定的maven有问题 jar(org.apache.hadoop:hadoop-azure-datalake:3.0.0-alpha2)

解决方法: 1.从maven仓库下载jar。 2. 上传到 dbfs。 3. 使用 dbfs 的 jar 来创建库。

【讨论】:

以上是关于Azure databricks:通过 API 将 maven 库安装到集群会导致错误(库解析失败。原因:java.lang.RuntimeException)的主要内容,如果未能解决你的问题,请参考以下文章

如何使用 Azure databricks 通过 ADLS gen 2 中的多个工作表读取和写入 excel 数据

如何在 Python 中从 Azure Databricks 插入 Azure SQL 数据库

从 Azure Databricks 将数据写入 Azure Blob 存储

Azure Databricks 通过服务主体访问 Azure Data Lake Storage Gen2

将 DataBricks 连接到 Azure Blob 存储

将 Azure Databricks 增量表迁移到 Azure Synapse SQL 池