TypeError: Invalid argument, not a string or column: [79, -1, -1] of type <class 'list'> column literals use 'lit' 'array' 'struct' or 'create_map'
Posted: 2021-10-11 02:58:00

I am running into a problem with a PySpark UDF, which throws the error:
PythonException: An exception was thrown from a UDF: 'TypeError: Invalid argument, not a string or column: [79, -1, -1] of type <class 'list'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
I am trying to derive a single number per row based on a precedence array, where priority decreases from left to right.
precedence = [1,2,11,12,13,20,20131,200,202,203,210,220,223,226,235,236,237,242,244,245,247,253,254,257,259,260,262,278,283,701,20107,20108,20109,20112,20115,20123,20135,20141,20144,20152,20162,20163,20167,20168,20169,20170,20171,20172,20173,20174,20175,14,211,213,258,270,273,274,275,277,280,281,287,288,20120,20122,20124,20125,20126,20130,20133,20136,20137,20138,20140,20142,20143,20154,20155,20156,20157]
reverse_order = precedence[::-1]

def get_p(row):
    if (row != None) and (row != "null"):
        temp = row.split(",")
        test = []
        for i in temp:
            if (i.find('=') != -1):
                i = i.split('=')[0]
            if int(i) in reverse_order:
                test.append(reverse_order.index(int(i)))
            else:
                test.append(-1)
        if max(test) != -1:
            return reverse_order[max(test)]
        return -999
    else:
        return None

get_array = udf(get_p, IntegerType())
bronze_table = bronze_table.withColumn("precedence", get_array("event_list"))
bronze_table.select("event_list", "precedence").show(100, False)
Here are some sample records:
+---------------------------------------------------------------------------------------+
|event_list |
+---------------------------------------------------------------------------------------+
|276,100,101,202,176 |
|276,100,2,124,176 |
|246,100,101,257,115,116,121,123,124,125,135,138,145,146,153,167,168,170,171,173,189,191|
|246,100,101,278,123,124,135,170,189,191 |
|20131=16,20151,100,101,102,115,116,121,123,124,125,135,138,145,146,153,168,170,171 |
|null |
|20107=9,20151,100,101,102,123,124,135,170,189,191 |
|20108=3,20151,100,101,102,123,124,125,135,170,171,189,191 |
|null |
+---------------------------------------------------------------------------------------+
What I expect:
+---------------------------------------------------------------------------------------+----------+
|event_list |precedence|
+---------------------------------------------------------------------------------------+----------+
|276,100,101,202,176 |202 |
|276,100,2,124,176 |2 |
|246,100,101,257,115,116,121,123,124,125,135,138,145,146,153,167,168,170,171,173,189,191|257 |
|246,100,101,278,123,124,135,170,189,191 |278 |
|20131=16,20151,100,101,102,115,116,121,123,124,125,135,138,145,146,153,168,170,171 |20131 |
|null |null |
|20107=9,20151,100,101,102,123,124,135,170,189,191 |20107 |
|20108=3,20151,100,101,102,123,124,125,135,170,171,189,191 |20108 |
|null |null |
+---------------------------------------------------------------------------------------+----------+
My UDF works as expected in plain Python, but not in PySpark. I would appreciate help resolving this.
Answer 1:

A null in a PySpark dataframe is None in Python, so the condition if row != "null": is incorrect. Use if row != None: instead.
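To make that point concrete, here is a minimal pure-Python sketch (no Spark session needed; the helper names are mine, not from the original post). A SQL NULL is handed to a Python UDF as None, never as the string "null", so a string comparison lets real NULLs through:

```python
# Hypothetical helpers illustrating the two checks. A SQL NULL reaches
# a Python UDF as None, never as the string "null".
def misses_null(row):
    return row != "null"    # the buggy check: True even for a real NULL

def catches_null(row):
    return row is not None  # the correct check

spark_null = None           # what a NULL looks like inside the UDF

print(misses_null(spark_null))   # True  -- buggy check lets None through
print(catches_null(spark_null))  # False -- correct check stops it
```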
However, your get_p function does not work correctly for me, for example:

get_p('276,100,101,202,176')
# output: 2
# expected: 202

get_p('20131=16,20151,100,101,102,115,116,121,123,124,125,135,138,145,146,153,168,170,171')
# output: Exception `invalid literal for int() with base 10: ''`
# expected: 20131
Comments:

Thanks for looking into this; I have updated the get_p function. @pltc

Answer 2:

Thank you to everyone who tried to solve this. I was able to fix it using this link: https://***.com/a/63654269/6187792

The problem was with the max function, as explained in the link. Here is the updated code:
import builtins as p

def get_p(row):
    if (row != None) and (row != "null"):
        temp = row.split(",")
        test = []
        for i in temp:
            if (i.find('=') != -1):
                i = i.split('=')[0]
            if int(i) in reverse_order:
                test.append(reverse_order.index(int(i)))
            else:
                test.append(-1)
        if p.max(test) != -1:
            return reverse_order[p.max(test)]
        return None
    else:
        return None
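As a quick sanity check of this logic outside Spark, here is a self-contained sketch using a trimmed-down precedence list (the real one from the question has about 80 entries, but the lookup works the same way):

```python
import builtins as p

# Trimmed precedence list for illustration only; priority decreases
# from left to right, exactly as in the question.
precedence = [2, 202, 257, 278, 20131, 20107, 20108]
reverse_order = precedence[::-1]

def get_p(row):
    if (row != None) and (row != "null"):
        test = []
        for i in row.split(","):
            if i.find('=') != -1:
                i = i.split('=')[0]       # strip a "key=value" suffix
            if int(i) in reverse_order:
                test.append(reverse_order.index(int(i)))
            else:
                test.append(-1)
        if p.max(test) != -1:             # p.max is the real builtin
            return reverse_order[p.max(test)]
    return None

print(get_p("276,100,101,202,176"))  # 202
print(get_p("276,100,2,124,176"))    # 2
print(get_p("null"))                 # None
```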
Comments:

I don't think there is anything wrong with the max function itself. The problem is the way you imported the PySpark functions: a wildcard import shadows Python's builtins. So instead of using import builtins as p and then calling builtins via p.max, you should use from pyspark.sql import functions as F, and then use F.udf, F.col, and so on.
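The shadowing described in that comment can be reproduced without Spark at all. This is only a sketch: the stand-in max below mimics the error message of pyspark.sql.functions.max when it receives a plain list, which is how a wildcard import like from pyspark.sql.functions import * produces the TypeError from the question:

```python
# Stand-in that mimics what pyspark.sql.functions.max does when handed
# a plain Python list instead of a column name or Column. Defining it
# here shadows the builtin max, just as a wildcard import would.
def max(value):
    raise TypeError(
        f"Invalid argument, not a string or column: {value!r} of type "
        f"{type(value)}. For column literals, use 'lit', 'array', "
        "'struct' or 'create_map' function."
    )

try:
    max([79, -1, -1])              # the UDF's call now hits the Spark-style max
except TypeError as exc:
    print(exc)

import builtins
print(builtins.max([79, -1, -1]))  # 79 -- the real builtin is still reachable
```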