[Solution] Invalid PythonUDF <lambda>(), requires attributes from more than one child.
Posted Sinsa_SI
If you hit the error in the title, the fix is to call df.cache() before the filter step
(here df is the DataFrame being filtered).
The sequence of steps that triggers this is:
join two DataFrames A and B > create a UDF that uses one column from A and another from B > filter on the column produced by the UDF > java.lang.RuntimeException: Invalid PythonUDF <lambda>(b#1L, c#6L), requires attributes from more than one child.
Here are minimal steps to reproduce the issue in PySpark:
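from pyspark.sql import types
from pyspark.sql import functions as F
# sqlCtx: an existing SQLContext (assumed here, e.g. the one predefined in the pyspark shell)
df1 = sqlCtx.createDataFrame([types.Row(a=1, b=2), types.Row(a=1, b=4)])
df2 = sqlCtx.createDataFrame([types.Row(a=1, c=12)])
joined = df1.join(df2, df1['a'] == df2['a'])
# the UDF reads b (from df1) and c (from df2), one column from each side of the join
extra = joined.withColumn('sum', F.udf(lambda b, c: b + c, types.IntegerType())(joined['b'], joined['c']))
# filtering on the UDF column is the step that raises the RuntimeException
filtered = extra.where(extra['sum'] < F.lit(10)).collect()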
Calling extra.cache() before the filtering avoids the error, but it is a workaround rather than a real fix.
Reference: https://issues.apache.org/jira/browse/SPARK-18589