Add new column to Python Pandas DataFrame based on multiple conditions [duplicate]

Posted 2023-03-11


【Asked】: 2018-09-10 04:56:50 【Question】:

I have a dataset with columns like the following:

discount  tax   total  subtotal  productid
3.98      1.06  21.06  20        3232
3.98      1.06  21.06  20        3232
3.98      6     106    100       3498
3.98      6     106    100       3743
3.98      6     106    100       3350
3.98      6     106    100       3370
46.49     3.36  66.84  63        695

Now I need to add a new column, Class, and assign it a value of 0 or 1 based on the following conditions:

if:
    discount > 20%
    no tax
    total > 100
then Class should be 1
otherwise it should be 0

I have done this with a single condition, but I don't know how to do it with multiple conditions.

Here is what I have tried:

df_full['Class'] = df_full['amount'].map(lambda x: 1 if x > 100 else 0)

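I have looked at all the other similar questions but could not find a solution to my problem. I tried everything in the posts mentioned above, but ran into this error: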

TypeError: '>' not supported between instances of 'str' and 'int'

Following the first posted answer, I tried:

df_full['class'] = np.where( ( (df_full['discount'] > 20) & (df_full['tax'] == 0 ) & (df_full['total'] > 100) & df_full['productdiscount'] ) , 1, 0)

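【Comments】:

Please do not post images of your data or code.

Not posting an image of the data does not mean providing no sample data at all. Provide sample data in text format.

【Answer 1】: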

You can use DataFrame.apply to apply an arbitrary function to each row of a DataFrame.

In your case, you could define a function like this:

def conditions(s):
    if (s['discount'] > 20) or (s['tax'] == 0) or (s['total'] > 100):
        return 1
    else:
        return 0

and use it to add a new column to your data:

df_full['Class'] = df_full.apply(conditions, axis=1)
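As a rough, self-contained sketch of the row-wise approach, the snippet below rebuilds a few rows from the question's table and applies two variants of the function. The names `conditions_any` and `conditions_all` are hypothetical, the last row is invented so that at least one row satisfies every condition, and reading the question as "all conditions must hold" is an assumption rather than something the asker confirmed.

```python
import pandas as pd

# Three rows copied from the question, plus one hypothetical row (the last)
# that satisfies all three conditions.
df_full = pd.DataFrame({
    "discount": [3.98, 3.98, 46.49, 30.0],
    "tax": [1.06, 6, 3.36, 0],
    "total": [21.06, 106, 66.84, 150],
})

def conditions_any(s):
    # Same logic as the answer above: 1 if at least one condition holds.
    if (s["discount"] > 20) or (s["tax"] == 0) or (s["total"] > 100):
        return 1
    return 0

def conditions_all(s):
    # Assumed alternative reading: 1 only if every condition holds.
    if (s["discount"] > 20) and (s["tax"] == 0) and (s["total"] > 100):
        return 1
    return 0

# axis=1 passes each row to the function as a Series.
df_full["Class_any"] = df_full.apply(conditions_any, axis=1)
df_full["Class_all"] = df_full.apply(conditions_all, axis=1)
print(df_full)
```

Returning an explicit 1 or 0 keeps the new column an integer column, and further classes can be added by extending the if/else chain.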

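【Comments】:

You could simply return (s['discount'] > 20) or (s['tax'] == 0) or (s['total'] > 100)

Hi @Gustavo, the conditions approach above returns TypeError: ("'>' not supported between instances of 'str' and 'int'", 'occurred at index 18')

@AbdulRehman Just convert your columns to a numeric dtype using pd.to_numeric or something similar.

@VMAtm Sure, that is shorter, but Python is not C, so that would make Class a bool dtype column. Since it is not entirely clear what the OP's requirements are, I tried to give a more general answer in which the number/type of "classes" can easily be added or changed, while sticking to the OP's notation as much as possible.

【Answer 2】: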

From the image of your data, it is not clear what you mean by a discount of 20%.

However, you could probably do something like this.

df['class'] = 0 # add a class column with 0 as default value

# find all rows that fulfills your conditions and set class to 1
df.loc[(df['discount'] / df['total'] > .2) & # if discount is more than .2 of total 
       (df['tax'] == 0) & # if tax is 0
       (df['total'] > 100), # if total is > 100 
       'class'] = 1 # then set class to 1

Note that & here means and; if you want or instead, use |.
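To make the difference between & and | concrete, here is a small sketch that builds both boolean masks on a few rows and turns them into 0/1 columns. Treating "discount > 20%" as discount divided by total follows the answer above and remains an assumption; the last row is hypothetical, and the column names `class_and` and `class_or` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Three rows from the question plus one hypothetical row (the last).
df = pd.DataFrame({
    "discount": [3.98, 3.98, 46.49, 30.0],
    "tax": [1.06, 6, 3.36, 0],
    "total": [21.06, 106, 66.84, 120],
})

# Each comparison yields a boolean Series. The parentheses are required
# because & and | bind more tightly than comparison operators.
big_discount = (df["discount"] / df["total"]) > 0.2
no_tax = df["tax"] == 0
big_total = df["total"] > 100

# & requires every condition to hold; | requires at least one.
df["class_and"] = np.where(big_discount & no_tax & big_total, 1, 0)
df["class_or"] = (big_discount | no_tax | big_total).astype(int)
print(df)
```

Both np.where(mask, 1, 0) and mask.astype(int) produce an integer column; which one to use is largely a matter of taste.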

【Comments】:

Hi @Karl, it returns this error: TypeError: '>' not supported between instances of 'str' and 'int'
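The TypeError reported here suggests that at least one of the compared columns holds strings rather than numbers. A minimal sketch of the pd.to_numeric suggestion from the comments on the first answer follows; the sample frame is hypothetical, and coercing unparseable values to NaN with errors="coerce" is one possible choice, not necessarily the right one for the asker's data.

```python
import pandas as pd

# Hypothetical data in which the numeric columns were loaded as strings.
df = pd.DataFrame({
    "discount": ["3.98", "46.49", "n/a"],
    "tax": ["1.06", "0", "6"],
    "total": ["21.06", "150", "106"],
})

# Convert each column to a numeric dtype; values that cannot be parsed
# (such as "n/a") become NaN instead of raising an error.
numeric_cols = ["discount", "tax", "total"]
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric, errors="coerce")

# Comparisons now work. A comparison against NaN is False, so rows with
# missing values end up in class 0.
mask = (df["discount"] > 20) & (df["tax"] == 0) & (df["total"] > 100)
df["Class"] = mask.astype(int)
print(df.dtypes)
print(df)
```

Inspecting df.dtypes before and after the conversion is a quick way to confirm which columns were stored as object (string) columns.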
