ARN role authorization error while running an AWS Glue Studio ETL script
Posted: 2021-08-26 02:03:10

Question:
py4j.protocol.Py4JJavaError: An error occurred while calling o85.getDynamicFrame.
: java.sql.SQLException: Exception thrown in awaitResult:
at com.databricks.spark.redshift.JDBCWrapper.com$databricks$spark$redshift$JDBCWrapper$$executeInterruptibly(RedshiftJDBCWrapper.scala:133)
at com.databricks.spark.redshift.JDBCWrapper.executeInterruptibly(RedshiftJDBCWrapper.scala:109)
at com.databricks.spark.redshift.RedshiftRelation.buildScan(RedshiftRelation.scala:138)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$10.apply(DataSourceStrategy.scala:293)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:326)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy$$anonfun$pruneFilterProject$1.apply(DataSourceStrategy.scala:325)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProjectRaw(DataSourceStrategy.scala:381)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.pruneFilterProject(DataSourceStrategy.scala:321)
at org.apache.spark.sql.execution.datasources.DataSourceStrategy.apply(DataSourceStrategy.scala:289)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:63)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:78)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2$$anonfun$apply$2.apply(QueryPlanner.scala:75)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.TraversableOnce$$anonfun$foldLeft$1.apply(TraversableOnce.scala:157)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.TraversableOnce$class.foldLeft(TraversableOnce.scala:157)
at scala.collection.AbstractIterator.foldLeft(Iterator.scala:1334)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:75)
at org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$2.apply(QueryPlanner.scala:67)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.sql.catalyst.planning.QueryPlanner.plan(QueryPlanner.scala:93)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan$lzycompute(QueryExecution.scala:72)
at org.apache.spark.sql.execution.QueryExecution.sparkPlan(QueryExecution.scala:68)
at org.apache.spark.sql.execution.QueryExecution.executedPlan$lzycompute(QueryExecution.scala:77)
at org.apache.spark.sql.execution.QueryExecution.executedPlan(QueryExecution.scala:77)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3359)
at org.apache.spark.sql.Dataset.head(Dataset.scala:2544)
at org.apache.spark.sql.Dataset.take(Dataset.scala:2758)
at com.amazonaws.services.glue.JDBCDataSource.getLastRow(DataSource.scala:944)
at com.amazonaws.services.glue.JDBCDataSource.getJdbcJobBookmark(DataSource.scala:805)
at com.amazonaws.services.glue.JDBCDataSource.getDynamicFrame(DataSource.scala:829)
at com.amazonaws.services.glue.DataSource$class.getDynamicFrame(DataSource.scala:94)
at com.amazonaws.services.glue.SparkSQLDataSource.getDynamicFrame(DataSource.scala:658)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.sql.SQLException: [Amazon](500310) Invalid operation: Not authorized to get credentials of role arn:aws:iam::**********:role/glue_etl_role
Details:
-----------------------------------------------
error: Not authorized to get credentials of role arn:aws:iam::*********:role/glue_etl_role
code: 30000
context:
query: 0
location: xen_aws_credentials_mgr.cpp:391
process: padbmaster
We run an AWS Glue Studio script that performs some join and rename operations. The source connector and the target are both Redshift, using the AWS Glue Data Catalog.
The initial error was that the IAM role had not been attached to our Redshift cluster. After attaching the role, we got this new error saying "Not authorized to get credentials".
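The "Not authorized to get credentials of role" error (code 30000) is raised by Redshift itself when the role named in the credentials request is not in the cluster's list of associated IAM roles. A minimal diagnostic sketch with the AWS CLI, assuming a cluster identifier of `my-cluster` and account ID `123456789012` (both placeholders, not from the original post):

```shell
# Inspect which IAM roles are currently associated with the cluster
# (the failing role ARN should appear here; if it does not, that is the bug).
aws redshift describe-clusters \
  --cluster-identifier my-cluster \
  --query 'Clusters[0].IamRoles'

# Associate the Glue ETL role with the cluster so Redshift is allowed
# to fetch its credentials on your behalf.
aws redshift modify-cluster-iam-roles \
  --cluster-identifier my-cluster \
  --add-iam-roles arn:aws:iam::123456789012:role/glue_etl_role
```

Note that `modify-cluster-iam-roles` is applied asynchronously; the role only becomes usable once the cluster status returns to `available`.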
Comments:
Try docs.aws.amazon.com/redshift/latest/mgmt/…. That should fix it.

Answer 1:
An AWS Glue job requires an IAM role with permission to access its data stores. Make sure this role can access your Amazon S3 sources, targets, temporary directory, scripts, and any libraries the job uses.
A second IAM role is associated with the Redshift cluster itself, so that the cluster can access other AWS services on your behalf (for example, if your tables are backed by an S3 bucket, you must grant that role AmazonS3ReadOnlyAccess).
Make sure the two roles are not mixed up.
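The distinction between the two roles shows up in their trust policies: the Glue job role is assumed by the Glue service, while the cluster role is assumed by Redshift. A short illustrative sketch (the policy documents are standard IAM JSON; the helper function name is our own):

```python
import json


def trust_policy(service: str) -> str:
    """Build an IAM trust policy allowing the given AWS service to assume the role."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": service},
            "Action": "sts:AssumeRole",
        }],
    }, indent=2)


# Role attached to the Glue job: assumed by the Glue service.
glue_trust = trust_policy("glue.amazonaws.com")

# Role associated with the Redshift cluster: assumed by Redshift itself.
redshift_trust = trust_policy("redshift.amazonaws.com")

print(glue_trust)
print(redshift_trust)
```

If the cluster role's trust policy names the wrong service principal, Redshift cannot assume it, which produces the same class of authorization failure even when the role is attached.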