Azure Databricks connection using databricks-connect


Posted: 2020-06-18 03:44:47

Question:

I am following https://docs.databricks.com/dev-tools/databricks-connect.html to connect to Azure Databricks:

 # create and activate a conda environment for databricks-connect
 (base) C:\>conda create --name dbconnect python=3.7
 (base) C:\>conda activate dbconnect

 # install the client version matching the cluster's Databricks Runtime (6.5 here)
 (dbconnect) C:\>pip install -U databricks-connect==6.5
 (dbconnect) C:\>databricks-connect configure
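
For reference, databricks-connect configure stores its answers in a .databricks-connect JSON file in the home directory. A sketch of its shape, with made-up placeholder values (the host, token, cluster ID, and org ID below are illustrative, not real):

 {
   "host": "https://adb-1234567890123456.7.azuredatabricks.net",
   "token": "dapi0123456789abcdef",
   "cluster_id": "0618-123456-abcdefgh",
   "org_id": "1234567890123456",
   "port": "8787"
 }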

After providing the configuration I ran databricks-connect test, and it fails with the following exception: "Java gateway process exited before sending its port number".

How can I fix this?

 (dbconnect) C:\>databricks-connect test
 * PySpark is installed at c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark
 * Checking SPARK_HOME
 * Checking java version
 Picked up _JAVA_OPTIONS: -Djavax.net.ssl.trustStore=C:\Windows\Sun\Java\Deployment\trusted.certs
 openjdk version "11" 2018-09-25  
 OpenJDK Runtime Environment 18.9 (build 11+28)
 OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
 WARNING: Java versions >8 are not supported by this SDK
 * Skipping scala command test on Windows
 * Testing python command
 Picked up _JAVA_OPTIONS: -Djavax.net.ssl.trustStore=C:\Windows\Sun\Java\Deployment\trusted.certs
 Picked up _JAVA_OPTIONS: -Djavax.net.ssl.trustStore=C:\Windows\Sun\Java\Deployment\trusted.certs
 WARNING: An illegal reflective access operation has occurred
 WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/C:/Anaconda3/envs/dbconnect/Lib/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.6-SNAPSHOT.jar) to method java.nio.Bits.unaligned()
 WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
 WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
 WARNING: All illegal access operations will be denied in a future release
  Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
    at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2666)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2666)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2666)
    at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
    at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
    at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
    at scala.Option.map(Option.scala:146)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
 Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3319)
    at java.base/java.lang.String.substring(String.java:1874)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:52)
    ... 25 more
 Traceback (most recent call last):
   File "c:\anaconda3\envs\dbconnect\lib\runpy.py", line 193, in _run_module_as_main
     "__main__", mod_spec)
   File "c:\anaconda3\envs\dbconnect\lib\runpy.py", line 85, in _run_code
     exec(code, run_globals)
   File "C:\Anaconda3\envs\dbconnect\Scripts\databricks-connect.exe\__main__.py", line 7, in <module>
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\databricks_connect.py", line 262, in main
     test()
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\databricks_connect.py", line 231, in test
     spark = SparkSession.builder.getOrCreate()
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\sql\session.py", line 185, in getOrCreate
     sc = SparkContext.getOrCreate(sparkConf)
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\context.py", line 372, in getOrCreate
     SparkContext(conf=conf or SparkConf())
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\context.py", line 133, in __init__
     SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\context.py", line 321, in _ensure_initialized
     SparkContext._gateway = gateway or launch_gateway(conf)
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\java_gateway.py", line 46, in launch_gateway
     return _launch_gateway(conf)
   File "c:\anaconda3\envs\dbconnect\lib\site-packages\pyspark\java_gateway.py", line 108, in _launch_gateway
     raise Exception("Java gateway process exited before sending its port number")
 Exception: Java gateway process exited before sending its port number


Answer 1:

 openjdk version "11" 2018-09-25
 OpenJDK Runtime Environment 18.9 (build 11+28)
 OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)

Install Java 8; Java 11 is not supported. The StringIndexOutOfBoundsException ("begin 0, end 3, length 2") in your stack trace is the same problem: Hadoop's Shell class parses the java.version system property assuming the old "1.x.y" format, and the two-character string "11" breaks its substring(0, 3) call.
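
On Windows that means installing a JDK 8 build and pointing the current shell at it before re-running the test. A sketch, assuming AdoptOpenJDK 8 is installed at the path below (adjust to wherever your JDK 8 actually lives):

 REM point JAVA_HOME and PATH at a Java 8 JDK for this shell session
 (dbconnect) C:\>set JAVA_HOME=C:\Program Files\AdoptOpenJDK\jdk8u252-b09
 (dbconnect) C:\>set PATH=%JAVA_HOME%\bin;%PATH%
 (dbconnect) C:\>java -version
 (dbconnect) C:\>databricks-connect test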

Also check the port number. On Azure it should probably be 8787.

There may be other problems too, but I would fix this one first.
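
Once databricks-connect test passes, a minimal smoke test in the style of the one from the Databricks docs confirms the session really reaches the remote cluster:

 # run from the dbconnect environment; uses the settings in .databricks-connect
 from pyspark.sql import SparkSession

 spark = SparkSession.builder.getOrCreate()   # connects to the remote Databricks cluster
 print(spark.range(100).count())              # trivial remote job; should print 100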

Discussion:

I also changed the port to 8787.

As I said: Java 11 is not supported.
