Unable to create a mount point for ADLS Gen2 in Databricks
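We are trying to create a mount point from Azure Databricks to ADLS Gen2 using a service principal. The service principal has the appropriate resource-level and data-level access permissions. We have confirmed that ADLS Gen2 is reachable when we authenticate with an access key, but the mount point still cannot be created. The workspace uses Azure Databricks VNet injection.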
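The mount fails with a non-descriptive error. A firewall audits all traffic leaving Databricks, so our assumption is that something the mount needs (the OAuth service or the Azure AD API) is being blocked. We have confirmed that Databricks can connect to the file system, but creating the mount point with the service principal fails. We do not know which HTTP endpoints or services Azure Databricks must be able to reach in order to create a mount point; we believe that unblocking those endpoints would make the mount possible. Currently, only login.microsoftonline.com is allowed through the firewall.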
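One way to test the firewall assumption, sketched here as a generic network check rather than anything Databricks-specific, is to run a plain TCP reachability probe from a notebook on the cluster against the two hosts the mount configuration refers to: the token endpoint host from `fs.azure.account.oauth2.client.endpoint` and the DFS host from the `abfss://` source. `can_connect`, `probe_all` and `ENDPOINTS` are hypothetical names, and `storageaccount` is the placeholder from the question. A successful TCP connect only shows that the port is open; it does not prove that a TLS-inspecting firewall will let the HTTPS request through.

```python
import socket

# Hosts the mount refers to. "storageaccount" is the placeholder from the
# question's abfss:// URI; replace it with the real storage account name.
ENDPOINTS = [
    ("login.microsoftonline.com", 443),            # Azure AD OAuth token endpoint
    ("storageaccount.dfs.core.windows.net", 443),  # ADLS Gen2 DFS endpoint
]


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts
        # (socket.timeout is a subclass of OSError).
        return False


def probe_all(endpoints, timeout: float = 5.0) -> dict:
    """Probe each (host, port) pair and return {"host:port": reachable}."""
    return {f"{host}:{port}": can_connect(host, port, timeout) for host, port in endpoints}
```

Running `probe_all(ENDPOINTS)` in a notebook cell returns one `True`/`False` per host; a `False` for `login.microsoftonline.com` would be consistent with the `connect timed out` in the stack trace below.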
# Mount ADLS Gen2 via a service principal (OAuth client credentials)
configs = {"fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "XXXXXX",
    # Read the client secret from a secret scope instead of hard-coding it
    "fs.azure.account.oauth2.client.secret": dbutils.secrets.get(scope = "XXXX-scope", key = "XXXX-key"),
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/XXXXX/oauth2/token"}
dbutils.fs.mount(
source = "abfss://filesystem@storageaccount.dfs.core.windows.net/",
mount_point = "/mnt/XXXX",
extra_configs = configs)
Expected result: the mount point is created successfully. Actual error:
ExecutionError: An error occurred while calling o220.mount.
: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:666)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.protocol.https.HttpsClient.<init>(HttpsClient.java:264)
at sun.net.www.protocol.https.HttpsClient.New(HttpsClient.java:367)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.getNewHttpClient(AbstractDelegateHttpsURLConnection.java:191)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:177)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:259)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenSingleCall(AzureADAuthenticator.java:256)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenCall(AzureADAuthenticator.java:211)
at shaded.databricks.v20180920_b33d810.org.apache.hadoop.fs.azurebfs.oauth2.AzureADAuthenticator.getTokenUsingClientCreds(AzureADAuthenticator.java:94)
at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureOAuth(DBUtilsCore.scala:477)
at com.databricks.backend.daemon.dbutils.DBUtilsCore.verifyAzureFileSystem(DBUtilsCore.scala:488)
at com.databricks.backend.daemon.dbutils.DBUtilsCore.mount(DBUtilsCore.scala:446)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
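The trace shows that the failure happens before any storage access: `DBUtilsCore.mount` calls `verifyAzureOAuth`, and the timeout is raised inside `AzureADAuthenticator.getTokenUsingClientCreds` while opening the HTTPS connection to the token endpoint. A sketch that isolates that step is to request a client-credentials token directly from Python on the driver. It assumes the Azure AD v1 endpoint format used in the config above and the storage resource URI `https://storage.azure.com/`; `build_token_request` and `fetch_token` are hypothetical helpers. A timeout here points at the network path (firewall, proxy, routing) to login.microsoftonline.com, whereas an HTTP 400/401 response means the endpoint is reachable and the service principal details are the problem.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the Azure AD v1 token endpoint (same URL shape as the mount config)
# and the Azure Storage resource URI. Adjust both if your setup differs.
STORAGE_RESOURCE = "https://storage.azure.com/"


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build a client-credentials token request (POST, form-encoded)."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "resource": STORAGE_RESOURCE,
    }).encode("ascii")
    # Passing data= makes urllib send a POST.
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def fetch_token(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and return the decoded JSON response.

    Network problems surface as urllib.error.URLError or a timeout error;
    a rejected credential surfaces as urllib.error.HTTPError (e.g. 400/401).
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_token(build_token_request("<tenant>", "<appId>", dbutils.secrets.get(scope="XXXX-scope", key="XXXX-key")))`; avoid printing the returned `access_token` in a shared notebook.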
Make sure you have provided valid service principal details, i.e. the appId, password and tenant.
Azure Data Lake Storage Gen2 mount configuration:
configs = {"fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<appId>",
    "fs.azure.account.oauth2.client.secret": "<password>",
    "fs.azure.account.oauth2.client.endpoint": "https://login.microsoftonline.com/<tenant>/oauth2/token",
    "fs.azure.createRemoteFileSystemDuringInitialization": "true"}
dbutils.fs.mount(
source = "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/folder1",
mount_point = "/mnt/flightdata",
extra_configs = configs)
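A hedged sketch that folds this advice into a reusable wrapper: it builds the same `configs` dictionary, refuses to mount while any value still contains an angle-bracket placeholder such as `<appId>`, and skips `dbutils.fs.mount` when `dbutils.fs.mounts()` already lists the mount point, so re-running the cell does not try to mount twice. `build_configs`, `find_placeholders` and `mount_if_needed` are hypothetical names; `dbutils` is passed in explicitly so the logic can be exercised outside a notebook. The placeholder check only catches unfilled template values, not wrong credentials.

```python
import re

# Matches unfilled template values such as "<appId>" or "<tenant>".
PLACEHOLDER = re.compile(r"<[^<>]+>")


def build_configs(client_id: str, client_secret: str, tenant_id: str) -> dict:
    """Build the OAuth extra_configs passed to dbutils.fs.mount for ADLS Gen2."""
    return {
        "fs.azure.account.auth.type": "OAuth",
        "fs.azure.account.oauth.provider.type":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        "fs.azure.account.oauth2.client.id": client_id,
        "fs.azure.account.oauth2.client.secret": client_secret,
        "fs.azure.account.oauth2.client.endpoint":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
        "fs.azure.createRemoteFileSystemDuringInitialization": "true",
    }


def find_placeholders(configs: dict) -> list:
    """Return the sorted config keys whose values still contain a <...> placeholder."""
    return sorted(key for key, value in configs.items() if PLACEHOLDER.search(value))


def mount_if_needed(dbutils, source: str, mount_point: str, configs: dict) -> bool:
    """Mount `source` at `mount_point` unless it is already mounted.

    Returns True if a mount was performed, False if the mount point already existed.
    Raises ValueError if any config value is still a template placeholder.
    """
    unfilled = find_placeholders(configs)
    if unfilled:
        raise ValueError(f"Fill in the service principal details first: {unfilled}")
    # dbutils.fs.mounts() lists current mounts; each entry has a mountPoint field.
    if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
        return False
    dbutils.fs.mount(source=source, mount_point=mount_point, extra_configs=configs)
    return True
```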
Access files in the file system as if they were local files:
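A hedged sketch of local-style access: on clusters where the DBFS FUSE mount is available, a DBFS path such as `/mnt/flightdata/...` is visible to ordinary Python file APIs under `/dbfs`, alongside `dbutils.fs.ls("/mnt/flightdata")` and Spark reads such as `spark.read.csv("/mnt/flightdata/...")`. The hypothetical `to_local_path` helper below only performs the path mapping; whether `/dbfs` is available depends on the cluster type and access mode, and `<file>.csv` is a placeholder.

```python
def to_local_path(dbfs_path: str) -> str:
    """Map a DBFS path to the driver-local FUSE path under /dbfs.

    Accepts "dbfs:/mnt/...", "/mnt/..." or an already-local "/dbfs/..." path.
    """
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    if dbfs_path == "/dbfs" or dbfs_path.startswith("/dbfs/"):
        return dbfs_path
    return "/dbfs/" + dbfs_path.lstrip("/")


# Usage in a notebook cell, after the mount above succeeds:
# display(dbutils.fs.ls("/mnt/flightdata"))
# with open(to_local_path("/mnt/flightdata/<file>.csv")) as f:
#     print(f.readline())
```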
Reference: Tutorial: Access Data Lake Storage Gen2 data with Azure Databricks using Spark
Hope this helps.