[Solution] Invalid PythonUDF <lambda>(), requires attributes from more than one child.

Posted by Sinsa_SI


If you hit the error in the title, the workaround is to call df.cache() before the filter step (here df is the DataFrame being filtered).

The sequence of steps that causes this is:

join two dataframes A and B > make a udf that uses one column from A and another from B > filter on column produced by udf > java.lang.RuntimeException: Invalid PythonUDF <lambda>(b#1L, c#6L), requires attributes from more than one child.

Here are minimal steps to reproduce this issue in PySpark:

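from pyspark.sql import types
from pyspark.sql import functions as F

# sqlCtx is assumed to be the SQLContext that the pyspark shell provides
df1 = sqlCtx.createDataFrame([types.Row(a=1, b=2), types.Row(a=1, b=4)])
df2 = sqlCtx.createDataFrame([types.Row(a=1, c=12)])
joined = df1.join(df2, df1['a'] == df2['a'])
# UDF that reads one column from each side of the join (b from df1, c from df2)
add = F.udf(lambda b, c: b + c, types.IntegerType())
extra = joined.withColumn('sum', add(joined['b'], joined['c']))
# Filtering on the UDF-produced column is the step that raises the RuntimeException
filtered = extra.where(extra['sum'] < F.lit(10)).collect()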

Calling extra.cache() before the filter avoids the error, but it is a workaround rather than a real fix.
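
As a rough sketch of that workaround, the reproduction can be wrapped in a function that caches the joined DataFrame before filtering on the UDF column. The names add_columns, filter_with_cache, session and threshold are made up for this illustration and do not come from the JIRA report; session is assumed to be a SparkSession or SQLContext. Whether the cache call is still needed depends on the Spark version in use.

```python
def add_columns(b, c):
    """Plain-Python body of the UDF: add one value from each side of the join."""
    return b + c


def filter_with_cache(session, threshold=10):
    """Rebuild the join from the reproduction, cache it, then filter on the UDF column.

    `session` is assumed to be a SparkSession or SQLContext (hypothetical parameter).
    """
    # Imported here so add_columns can be tried without Spark installed.
    from pyspark.sql import Row, functions as F, types

    df1 = session.createDataFrame([Row(a=1, b=2), Row(a=1, b=4)])
    df2 = session.createDataFrame([Row(a=1, c=12)])
    joined = df1.join(df2, df1['a'] == df2['a'])

    # The UDF takes column b from df1 and column c from df2.
    sum_udf = F.udf(add_columns, types.IntegerType())
    extra = joined.withColumn('sum', sum_udf(joined['b'], joined['c']))

    # Workaround described in this post: cache before filtering on 'sum'.
    extra.cache()
    return extra.where(extra['sum'] < F.lit(threshold)).collect()
```

In a pyspark shell this could be called as filter_with_cache(sqlCtx). Defining the UDF body as a named function instead of a lambda is only a convenience here, so the arithmetic can be checked without a Spark cluster.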

Reference: https://issues.apache.org/jira/browse/SPARK-18589
