Py(Spark) udf gives PythonException: 'TypeError: 'float' object is not subscriptable'

Posted: 2021-08-19 08:50:06

Question:

I am applying a user-defined function to a Spark DataFrame, as shown below:

@udf("double")
def discount_udf (row):
  if ((row['total_order'] == 2) or (row['total_order'] == 3)):
    return 2.50
  elif ((row['total_order'] == 4) or (row['total_order'] == 5)):
    return 1.20
  elif ((row['total_order'] == 6) or (row['total_order'] == 7)):
    return 0.60
  elif ((row['total_order'] == 8) or (row['total_order'] == 9) or (row['total_order'] == 10) or (row['total_order'] == 11)):
    return 0.00
  elif ((row['total_order'] == 12) or (row['total_order'] == 13) or (row['total_order'] == 14) or (row['total_order'] == 15)):
    return -0.20
  elif ((row['total_order'] == 16) or (row['total_order'] == 17) or (row['total_order'] == 18) or (row['total_order'] == 19) or (row['total_order'] == 20) or (row['total_order'] == 21) or (row['total_order'] == 22) or (row['total_order'] == 23)):
    return -0.20
  elif ((row['total_order'] == 24) or (row['total_order'] == 25) or (row['total_order'] == 26) or (row['total_order'] == 27) or (row['total_order'] == 28) or (row['total_order'] == 29) or (row['total_order'] == 30) or (row['total_order'] == 31)):
    return -0.40
  else :
    return -0.50

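# Imports used by this snippet; in a notebook they must run before the UDF definition above.
from pyspark.sql.functions import udf
from pyspark.sql import functions as F

df.withColumn("discount_rate", discount_udf(F.col('total_order')))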

However, this gives me the following error:

PythonException: 'TypeError: 'float' object is not subscriptable', from <command-1374686736879751>, line 3. Full traceback below:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 100.0 failed 4 times, most recent failure: Lost task 0.3 in stage 100.0 (TID 164) (10.139.64.4 executor 0): org.apache.spark.api.python.PythonException: 'TypeError: 'float' object is not subscriptable', from <command-1374686736879751>, line 3. Full traceback below:
Traceback (most recent call last):
  File "<command-1374686736879751>", line 3, in discount_udf
TypeError: 'float' object is not subscriptable

I tried the same function on a `pandas` DataFrame, and it works fine for me.

df['discount_rate_1'] = df.apply(discount_udf, axis=1)

Can anyone help or suggest what is wrong here?

Thanks in advance.


Answer 1:

I'll change the beginning and leave the rest to you:

@udf("double")
def discount_udf (total_order):
  if ((total_order == 2) or (total_order == 3)):

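You just need to replace every row['total_order'] with total_order, because the UDF is called with the value of the column you pass in (here a float), not with a whole row.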
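The same failure can be reproduced outside Spark. Below is a minimal plain-Python sketch; the value 2.0, the dict standing in for a pandas row, and the helper name read_total_order are made up for illustration and are not part of the original code.

```python
# Hypothetical stand-ins: a scalar like the one a Spark UDF receives,
# and a dict playing the role of a pandas row.
scalar_value = 2.0
row_like = {"total_order": 2.0}


def read_total_order(row):
    # Same access pattern as the original discount_udf.
    return row["total_order"]


# Works when the argument is row-like (what pandas apply(axis=1) passes).
row_result = read_total_order(row_like)

# Fails when the argument is the bare column value.
try:
    read_total_order(scalar_value)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(row_result)
print(error_message)
```

This is why the pandas version with axis=1 works while the Spark UDF does not: the two call the function with different kinds of argument.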


I also suggest making this change:

if ((total_order == 2) or (total_order == 3)):
# BECOMES
if total_order in (2, 3):
# OR, ALTERNATIVELY
if 2 <= total_order <= 3:  # not exactly equivalent, but works if the values are all integers
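Putting both suggestions together, the complete function might look like the sketch below. The name discount_rate is an assumption, and the two branches that both return -0.20 in the question (12 to 15 and 16 to 23) are merged into one. Membership tests with range() are used instead of chained comparisons so that non-integer or missing values still fall through to the final branch, as they do with the original == checks.

```python
def discount_rate(total_order):
    """Return the discount rate for a single total_order value."""
    if total_order in (2, 3):
        return 2.50
    elif total_order in (4, 5):
        return 1.20
    elif total_order in (6, 7):
        return 0.60
    elif total_order in range(8, 12):    # 8 to 11
        return 0.00
    elif total_order in range(12, 24):   # 12 to 23 (two original branches merged)
        return -0.20
    elif total_order in range(24, 32):   # 24 to 31
        return -0.40
    else:
        return -0.50


# With PySpark available, the plain function can be wrapped and applied
# the same way as in the question (not executed here):
#   from pyspark.sql.functions import udf, col
#   discount_udf = udf(discount_rate, "double")
#   df = df.withColumn("discount_rate", discount_udf(col("total_order")))
```

Keeping the logic in a plain function means it can be checked with ordinary Python calls, such as discount_rate(2.0), before it is wrapped as a UDF.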
