pickle.loads gives 'module' object has no attribute '<ClassName>' inside a PySpark Pandas UDF



Posted: 2020-06-03 20:54:21

Question:

I'm trying to pickle and unpickle a class instance inside a PySpark pandas UDF. Pickling works fine outside the UDF:

import base64
import pickle

class ExampleModel:
    pass

clf = ExampleModel()  # ExampleModel defines no __init__, so it takes no arguments
pickled_val = base64.b64encode(pickle.dumps(clf))
clf2 = pickle.loads(base64.b64decode(pickled_val))
print(clf2)
# <__main__.ExampleModel instance at 0x7f04d7444780>

However, inside the pandas UDF I can access the ExampleModel class, but I can't unpickle the string column:

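from pyspark.sql.functions import pandas_udf, PandasUDFType

df = spark_session.createDataFrame(
    [
        (1, pickled_val, '')
    ],
    ['id', 'txt', 'error']
)

@pandas_udf(df.schema, PandasUDFType.GROUPED_MAP)
def example_unpickle(pdf):
    # Step 1: the class itself is reachable inside the UDF
    try:
        clf_obj = ExampleModel()
    except Exception as e:
        pdf.loc[:, 'error'] = "1: " + str(e)
        return pdf

    # Step 2: unpickling the serialized instance stored in the 'txt' column
    try:
        clf3 = pickle.loads(base64.b64decode(pdf.iloc[0, 1]))
    except Exception as e:
        pdf.loc[:, 'error'] = "2: " + str(e)
        return pdf

    # A GROUPED_MAP UDF must return a pandas DataFrame on every path
    return pdf


df_clf = df\
            .groupby('id')\
            .apply(example_unpickle)

df_clf.show(truncate = False)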

This gives the error:

AttributeError: 'module' object has no attribute 'ExampleModel'

+---+------------------------------------------------+--------------------------------------------------+
|id |txt                                             |error                                             |
+---+------------------------------------------------+--------------------------------------------------+
|1  |KGlfX21haW5fXwpFeGFtcGxlTW9kZWwKcDAKKGRwMQpiLg==|2: 'module' object has no attribute 'ExampleModel'|
+---+------------------------------------------------+--------------------------------------------------+
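A likely explanation, not stated in the original thread: pickle stores an instance's class by reference, as a module name plus a class name (here __main__.ExampleModel), not by value. PySpark ships the UDF function itself with cloudpickle, which serializes classes defined in __main__ by value, so calling ExampleModel() inside the UDF works. The plain pickle.loads call, however, looks ExampleModel up by name in the worker process's own __main__ module, which never defined it. The sketch below reproduces that failure locally without Spark by temporarily deleting the class from the running script's module. It assumes the file is run as a script (so the class lives in __main__) and is only a simulation of a worker, not how Spark actually starts its Python processes.

```python
import pickle
import sys


class ExampleModel:
    pass


# pickle records the class by reference (module name + class name), not by value.
payload = pickle.dumps(ExampleModel())
module_name = ExampleModel.__module__       # "__main__" when run as a script
print(module_name.encode() in payload, b"ExampleModel" in payload)  # True True

# Simulate a worker process whose __main__ never defined ExampleModel.
module = sys.modules[module_name]
saved = module.ExampleModel
del module.ExampleModel
try:
    pickle.loads(payload)
    failed = False
except (AttributeError, pickle.UnpicklingError) as exc:  # lookup by name fails
    print("unpickle failed:", exc)
    failed = True
finally:
    module.ExampleModel = saved             # put the class back

print(failed)                               # True
```

On Python 3 the message reads "Can't get attribute 'ExampleModel' on <module '__main__' ...>"; the wording in the question is the older Python 2 form of the same lookup failure.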


Answer 1:

The solution is to move the class into a separate file and create an __init__.py in the same directory.

Then import the class like this:

from ExampleFileName import ExampleModel
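Moving the class into its own module changes what pickle records: the reference becomes ExampleFileName.ExampleModel instead of __main__.ExampleModel, so any process that can import that module can resolve the class. On a real cluster the module file also has to be importable on the executors, for example by shipping it with spark.sparkContext.addPyFile or spark-submit --py-files. The sketch below shows the effect locally; the module name example_model_module is a hypothetical stand-in for ExampleFileName, and removing it from sys.modules only approximates a fresh worker process.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Write a hypothetical stand-in for ExampleFileName.py into a temp directory.
src_dir = Path(tempfile.mkdtemp())
(src_dir / "example_model_module.py").write_text("class ExampleModel:\n    pass\n")
sys.path.insert(0, str(src_dir))

# Equivalent to: from example_model_module import ExampleModel
ExampleModel = importlib.import_module("example_model_module").ExampleModel

payload = pickle.dumps(ExampleModel())
print(ExampleModel.__module__)             # example_model_module
print(b"example_model_module" in payload)  # True: the reference names the module

# Forget the loaded module, roughly like a fresh worker process would start.
del sys.modules["example_model_module"]

# pickle re-imports example_model_module by name to find the class.
restored = pickle.loads(payload)
print(type(restored).__name__)             # ExampleModel
```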

