Connect to AWS Postgres using Databricks
【Title】: Connect to AWS Postgres using Databricks 【Posted】: 2019-05-02 15:35:00
【Question】: I am having trouble connecting to AWS Postgres from Azure Databricks. I am new to Azure. Below is the code I use to connect to Postgres, but it throws an error.
Error: org.postgresql.util.PSQLException: The connection attempt failed.
Code:
# username and password are assumed to be defined earlier in the notebook
jdbc_url = "jdbc:postgresql://postgreshost:5432/db?user={}&password={}&ssl=true".format(username, password)
pushdown_query = "(select * from test limit 10) emp_alias"
df = spark.read.jdbc(url=jdbc_url, table="test")
display(df)
Second approach:
df = spark.read \
.format("jdbc") \
.option("url", "jdbc:postgresql://postgreshost:5432/db?user=user&password=password") \
.option("dbtable", "test") \
.load()
Am I missing something? Or are there any steps I should perform before running this?
Log from the Scala run:
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:275)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:194)
at org.postgresql.Driver.makeConnection(Driver.java:450)
at org.postgresql.Driver.connect(Driver.java:252)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:298)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:279)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:202)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:8)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:51)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:53)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw.<init>(command-3334328075204474:55)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw.<init>(command-3334328075204474:57)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw.<init>(command-3334328075204474:59)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read.<init>(command-3334328075204474:61)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$.<init>(command-3334328075204474:65)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$.<clinit>(command-3334328075204474)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval$.$print$lzycompute(<notebook>:7)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval$.$print(<notebook>:6)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:199)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:587)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:542)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$7.apply(DriverLocal.scala:324)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$7.apply(DriverLocal.scala:304)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:45)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:45)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:304)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
at scala.util.Try$.apply(Try.scala:192)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:584)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:475)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:542)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:381)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:328)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:215)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.postgresql.core.PGStream.<init>(PGStream.java:68)
at org.postgresql.core.v3.ConnectionFactoryImpl.openConnectionImpl(ConnectionFactoryImpl.java:144)
at org.postgresql.core.ConnectionFactory.openConnection(ConnectionFactory.java:49)
at org.postgresql.jdbc.PgConnection.<init>(PgConnection.java:194)
at org.postgresql.Driver.makeConnection(Driver.java:450)
at org.postgresql.Driver.connect(Driver.java:252)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:64)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anonfun$createConnectionFactory$1.apply(JdbcUtils.scala:55)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:56)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:210)
at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:346)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:298)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:279)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:202)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:8)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:51)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw$$iw.<init>(command-3334328075204474:53)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw$$iw.<init>(command-3334328075204474:55)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw$$iw.<init>(command-3334328075204474:57)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$$iw.<init>(command-3334328075204474:59)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read.<init>(command-3334328075204474:61)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$.<init>(command-3334328075204474:65)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$read$.<clinit>(command-3334328075204474)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval$.$print$lzycompute(<notebook>:7)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval$.$print(<notebook>:6)
at lined9bdaa60f31e4f44a370d2ec7ae9793627.$eval.$print(<notebook>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at com.databricks.backend.daemon.driver.DriverILoop.execute(DriverILoop.scala:199)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply$mcV$sp(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.ScalaDriverLocal$$anonfun$repl$1.apply(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExitInternal$.trapExit(DriverLocal.scala:587)
at com.databricks.backend.daemon.driver.DriverLocal$TrapExit$.apply(DriverLocal.scala:542)
at com.databricks.backend.daemon.driver.ScalaDriverLocal.repl(ScalaDriverLocal.scala:189)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$7.apply(DriverLocal.scala:324)
at com.databricks.backend.daemon.driver.DriverLocal$$anonfun$execute$7.apply(DriverLocal.scala:304)
at com.databricks.logging.UsageLogging$$anonfun$withAttributionContext$1.apply(UsageLogging.scala:235)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at com.databricks.logging.UsageLogging$class.withAttributionContext(UsageLogging.scala:230)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionContext(DriverLocal.scala:45)
at com.databricks.logging.UsageLogging$class.withAttributionTags(UsageLogging.scala:268)
at com.databricks.backend.daemon.driver.DriverLocal.withAttributionTags(DriverLocal.scala:45)
at com.databricks.backend.daemon.driver.DriverLocal.execute(DriverLocal.scala:304)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
at com.databricks.backend.daemon.driver.DriverWrapper$$anonfun$tryExecutingCommand$2.apply(DriverWrapper.scala:589)
at scala.util.Try$.apply(Try.scala:192)
at com.databricks.backend.daemon.driver.DriverWrapper.tryExecutingCommand(DriverWrapper.scala:584)
at com.databricks.backend.daemon.driver.DriverWrapper.getCommandOutputAndError(DriverWrapper.scala:475)
at com.databricks.backend.daemon.driver.DriverWrapper.executeCommand(DriverWrapper.scala:542)
at com.databricks.backend.daemon.driver.DriverWrapper.runInnerLoop(DriverWrapper.scala:381)
at com.databricks.backend.daemon.driver.DriverWrapper.runInner(DriverWrapper.scala:328)
at com.databricks.backend.daemon.driver.DriverWrapper.run(DriverWrapper.scala:215)
at java.lang.Thread.run(Thread.java:748)
【Comments】:
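【Answer 1】: I have never supplied the username and password in the connection URL, so I am not sure whether that works. Usually they are passed as separate options. According to the Spark docs, it is specified like this (in Scala):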
val jdbcDF = spark.read
.format("jdbc")
.option("url", "jdbc:postgresql:dbserver")
.option("dbtable", "schema.tablename")
.option("user", "username")
.option("password", "password")
.load()
Reference: https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html
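Since the question's code is Python, the same idea can be sketched in PySpark: keep the credentials out of the JDBC URL and pass them as separate reader options. The helper names `build_jdbc_options` and `read_table` are hypothetical, and the host, database, table, and credential values are the placeholders from the question, not real settings.

```python
def build_jdbc_options(host, port, database, table, user, password):
    """Build reader options with credentials kept out of the JDBC URL."""
    return {
        "url": "jdbc:postgresql://{}:{}/{}".format(host, port, database),
        "dbtable": table,
        "user": user,
        "password": password,
    }


def read_table(spark, options):
    """Load a table through Spark's JDBC data source using the given options."""
    return spark.read.format("jdbc").options(**options).load()


# In a notebook where `spark` is already defined (placeholder values):
# opts = build_jdbc_options("postgreshost", 5432, "db", "test", "user", "password")
# df = read_table(spark, opts)
```

This only changes how the credentials are passed. It would not help if the database host cannot be reached from the cluster at all.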
【Comments】:
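I tried this as well, and it throws the same error: org.postgresql.util.PSQLException: The connection attempt failed.
Can you provide the full stack trace?
Added at the bottom of the question.
【Answer 2】: This was an internal company issue and had nothing to do with the code.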
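The stack trace in the question ends in `java.net.SocketTimeoutException: connect timed out`, which usually means the TCP connection to the database never opened. That fits a problem outside the code. One way to check this from a notebook, before involving Spark or JDBC, is a plain socket test. The sketch below is a minimal example: `can_reach` is a hypothetical helper, and the host and port are the placeholders from the question. The thread does not say what the actual cause was, so a firewall or security group rule is only one possibility.

```python
import socket


def can_reach(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Example call from a notebook cell (placeholder host from the question):
# can_reach("postgreshost", 5432)
```

If this returns False from the cluster, changing the JDBC options is unlikely to help until the network path to the database is opened.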
【Comments】: