Can I apply the AWS FindMatches transform on a DataFrame? If yes, then how?
Posted: 2020-05-23 13:20:51
Question: I would like to know whether the FindMatches ML transform in AWS Glue can be applied to a Spark DataFrame. At the moment I can only use it on a DynamicFrame. The syntax for using the FindMatches transform on a DynamicFrame is as follows.
<output DynamicFrame with the ML transform applied> = FindMatches.apply(frame = <input DynamicFrame>, transformId = <ID of the FindMatches ML transform, created separately>)
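I tried passing a DataFrame in place of the input DynamicFrame, but the Glue job failed when I ran it. The error was:
"AttributeError: 'DataFrame' object has no attribute 'glue_ctx'"
Below is the code in which I tried to use a DataFrame.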
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglueml.transforms import FindMatches

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the catalog table as a DynamicFrame, then convert it to a Spark DataFrame.
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "hospitality", table_name = "personinputdata", transformation_ctx = "datasource0")
df0 = datasource0.toDF()
resolvechoice1 = ResolveChoice.apply(frame = datasource0, choice = "MATCH_CATALOG", database = "hospitality", table_name = "personinputdata", transformation_ctx = "resolvechoice1")
# Passing the DataFrame df0 here is the call that fails with the AttributeError.
findmatchdf = FindMatches.apply(frame = df0, transformId = "tfm-01cc9b02c93640cfc7ce5ea91745e24258cb2e01")
findmatchdf.show()
Below is the code that uses a DynamicFrame instead of a DataFrame. This version works.
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
from awsglueml.transforms import FindMatches

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read the catalog table as a DynamicFrame and keep it as a DynamicFrame throughout.
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "hospitality", table_name = "patientinputdata", transformation_ctx = "datasource0")
resolvechoice1 = ResolveChoice.apply(frame = datasource0, choice = "MATCH_CATALOG", database = "hospitality", table_name = "patientinputdata", transformation_ctx = "resolvechoice1")
# FindMatches receives a DynamicFrame here, so the call succeeds.
findmatches2 = FindMatches.apply(frame = resolvechoice1, transformId = "tfm-0cadd1e6d2da40d7c18db7836e92be93833b6019", transformation_ctx = "findmatches2")
I searched online for the source code of the FindMatches ML transform but could not find it anywhere.
Answer 1: As you already know, FindMatches works only on a DynamicFrame. You can, however, always convert a Spark DataFrame to a DynamicFrame:
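from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
glueContext = GlueContext(SparkContext.getOrCreate())
# Wrap the Spark DataFrame df0 in a DynamicFrame; the last argument is an arbitrary name.
Dyf0 = DynamicFrame.fromDF(df0, glueContext, "anyname")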
Then run FindMatches on the resulting DynamicFrame as needed.
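Putting the conversion and the transform together, the round trip might look like the sketch below. The helper name `find_matches_on_dataframe` and its `name` parameter are made up for illustration and are not part of the Glue API. It uses only calls that already appear in this thread (`DynamicFrame.fromDF`, `FindMatches.apply`, `toDF`) and assumes it runs inside a Glue job, where the `awsglue` and `awsglueml` modules are available.

```python
def find_matches_on_dataframe(df, glue_context, transform_id, name="findmatches_input"):
    """Hypothetical helper: run a FindMatches ML transform on a Spark DataFrame.

    Returns a Spark DataFrame, so methods such as show() can be used on the result.
    """
    # Imported lazily because these modules exist only in the AWS Glue runtime.
    from awsglue.dynamicframe import DynamicFrame
    from awsglueml.transforms import FindMatches

    # FindMatches expects a DynamicFrame, so wrap the DataFrame first.
    dynamic_frame = DynamicFrame.fromDF(df, glue_context, name)

    # Apply the separately created ML transform, identified by its transform ID.
    matched = FindMatches.apply(frame=dynamic_frame, transformId=transform_id)

    # Convert the result back to a Spark DataFrame.
    return matched.toDF()
```

In the failing job from the question, the last two lines could then be replaced by something like `find_matches_on_dataframe(df0, glueContext, "<your transform ID>").show()`.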