pickle.loads gives 'module' object has no attribute '<ClassName>' inside a PySpark Pandas UDF
Posted: 2020-06-03 20:54:21
Question: I am trying to pickle and unpickle a class instance inside a PySpark Pandas UDF. Pickling works fine outside the UDF:
import base64
import pickle

class ExampleModel:
    pass

# The class defines no __init__, so it is instantiated without arguments.
clf = ExampleModel()
pickled_val = base64.b64encode(pickle.dumps(clf))
clf2 = pickle.loads(base64.b64decode(pickled_val))
print(clf2)
# <__main__.ExampleModel instance at 0x7f04d7444780>
Inside a Pandas UDF, however, I can access the ExampleModel class but cannot unpickle the string column.
from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark_session.createDataFrame(
    [
        (1, pickled_val, '')
    ],
    ['id', 'txt', 'error']
)

@pandas_udf(df.schema, PandasUDFType.GROUPED_MAP)
def example_unpickle(pdf):
    # Step 1: check that the class itself is visible on the worker.
    try:
        clf_obj = ExampleModel()
    except Exception as e:
        pdf.loc[:, 'error'] = "1:" + str(e)
        return pdf
    # Step 2: try to unpickle the base64-encoded instance from the 'txt' column.
    try:
        clf3 = pickle.loads(base64.b64decode(pdf.iloc[0, 1]))
    except Exception as e:
        pdf.loc[:, 'error'] = "2: " + str(e)
    return pdf

df_clf = df\
    .groupby('id')\
    .apply(example_unpickle)
df_clf.show(truncate=False)
This gives the error:
AttributeError: 'module' object has no attribute 'ExampleModel'
+---+------------------------------------------------+--------------------------------------------------+
|id |txt                                             |error                                             |
+---+------------------------------------------------+--------------------------------------------------+
|1  |KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg==|2: 'module' object has no attribute 'ExampleModel'|
+---+------------------------------------------------+--------------------------------------------------+
Answer 1: The solution is to move the class into a separate file and create an __init__.py in the same directory.
Then import the class as:
from ExampleFileName import ExampleModel
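A likely reason this helps: pickle stores a class instance by reference, recording only the name of the defining module and the class name, and looks the class up again at load time. On a Spark worker, `__main__` is not the driver script, so the lookup fails. The sketch below reproduces the effect without Spark. The module name `driver_main_demo` is hypothetical and stands in for the driver's `__main__`; swapping entries in `sys.modules` is only a simulation of what a separate worker process sees, not how PySpark is implemented.

```python
import pickle
import sys
import types

# Hypothetical module name standing in for the driver's __main__.
FAKE_MODULE = "driver_main_demo"

# Build a class and register it under the fake module, as if it were defined there.
ExampleModel = type("ExampleModel", (object,), {"__module__": FAKE_MODULE})
driver_module = types.ModuleType(FAKE_MODULE)
driver_module.ExampleModel = ExampleModel
sys.modules[FAKE_MODULE] = driver_module

payload = pickle.dumps(ExampleModel())
# The payload carries the module and class names, not the class body.
names_in_payload = FAKE_MODULE.encode() in payload and b"ExampleModel" in payload

# Simulated worker: the module name resolves, but the class is not defined in it.
sys.modules[FAKE_MODULE] = types.ModuleType(FAKE_MODULE)
try:
    pickle.loads(payload)
    worker_error = None
except AttributeError as exc:
    worker_error = str(exc)

# Simulated fix: the class is importable under the same module name again.
sys.modules[FAKE_MODULE] = driver_module
restored = pickle.loads(payload)

print(names_in_payload, worker_error, type(restored).__name__)
```

With the class in its own importable file, the pickled reference points at that module instead of `__main__`, so any process that can import the file can resolve it. On a cluster the file also has to be reachable from the executors, for example by shipping it with `SparkContext.addPyFile`.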