Pandas: Fill a dictionary with dataframes depending on a switch

Posted: 2022-01-02 06:35:44

Question:

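Background: I have several dataframes that can each be turned on or off by a switch. I want to fill a dictionary with every dataframe whose switch is on, and then iterate over those dataframes.

Problem: I don't know how to build the dictionary dynamically so that it only includes a dataframe when its switch is on.

My attempt: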

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs':[1234,5346,1234,8793,8793],
                  'Cost':[1.1,1.2,1.3,1.4,1.5],
                  'Names':['APPLE','Orange','STRAWBERRY','Grape','Blue']}) if sw_a == True else []
b = pd.DataFrame({'IDs':[1,2],
                  'Cost':[1.1,1.2],
                  'Names':['APPLE1','Blue1']}) if sw_b == True else []
c = pd.DataFrame({'IDs':[12],
                  'Cost':[1.5],
                  'Names':['APPLE2']}) if sw_c == True else []
total = {"first":a,"second":b,"third":c}

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')

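The approach above doesn't work because every key always ends up in the dictionary; when a switch is off, the value is an empty list rather than the entry being left out entirely.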
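To make the failure concrete, here is a minimal sketch of what happens to a switched-off entry. It is not the asker's full code: a plain dict of columns stands in for the dataframe (an assumption made only to keep the snippet free of dependencies), and the name `error` is introduced for illustration.

```python
sw_b = False

# Placeholder: a plain dict of columns stands in for the dataframe.
b = {"Cost": [1.1, 1.2]} if sw_b else []

total = {"second": b}

# The key is present even though the switch is off.
print("second" in total)

error = None
try:
    # Indexing the empty list with a string key fails.
    sum(total["second"]["Cost"])
except TypeError as exc:
    error = exc
    print(f"TypeError: {exc}")
```

So the loop breaks as soon as it reaches an entry whose switch is off, which is why the entry has to be kept out of the dictionary altogether.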

Answer 1:

My setup is similar to yours, but I don't bother with the switch on each dataframe assignment:

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs':[1234,5346,1234,8793,8793],
                  'Cost':[1.1,1.2,1.3,1.4,1.5],
                  'Names':['APPLE','Orange','STRAWBERRY','Grape','Blue']})
b = pd.DataFrame({'IDs':[1,2],
                  'Cost':[1.1,1.2],
                  'Names':['APPLE1','Blue1']})
c = pd.DataFrame({'IDs':[12],
                  'Cost':[1.5],
                  'Names':['APPLE2']})

total = {"first":a,"second":b,"third":c} # don't worry about the switches yet.

Only now do we filter:

list_switches = [sw_a, sw_b, sw_c] # the switches! finally!
total_filtered = {tup[1]:total[tup[1]] for tup in zip(list_switches, total) if tup[0]}

Then continue as you did.

for df in total_filtered:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')

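Output:

The number of fruits for first is 5 and the cost is 6.5
The number of fruits for third is 1 and the cost is 1.5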
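The filtering step relies on the fact that iterating over a dict yields its keys in insertion order, so `zip(list_switches, total)` pairs each switch with a key by position. The sketch below spells that out with the tuple unpacked into names. Placeholder strings stand in for the dataframes, and `pairs`, `switch` and `name` are names chosen here for illustration only.

```python
# Placeholder strings stand in for the dataframes a, b and c.
total = {"first": "df_a", "second": "df_b", "third": "df_c"}
list_switches = [True, False, True]

# Iterating a dict yields its keys in insertion order,
# so each switch is paired with the key at the same position.
pairs = list(zip(list_switches, total))
print(pairs)  # [(True, 'first'), (False, 'second'), (True, 'third')]

# Keep only the entries whose switch is on.
total_filtered = {name: total[name] for switch, name in pairs if switch}
print(total_filtered)  # {'first': 'df_a', 'third': 'df_c'}
```

Because the pairing is positional, the switches must be listed in the same order as the keys were added to the dictionary.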

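Edit: You can get a little fancier with the zip function. For example, if you are building the lists of dataframes, dataframe names, and switches dynamically, and can make sure they always have the same length, you could do something like this: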

# pretend these three lists are coming from somewhere else and can have variable length, rather than being hard-coded.
list_dfs = [a,b,c]
list_switches = [sw_a, sw_b, sw_c]
list_names = ["first", "second", "third"]

# use a zip object over the three lists.
zipped = zip(list_dfs, list_switches, list_names)
total = {tup[2] : tup[0] for tup in zipped if tup[1]}

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')
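For readers less familiar with dict comprehensions, the three-list version can also be written as an explicit loop. This is only a sketch under stated assumptions: placeholder strings stand in for the dataframes, and the length check is an addition of this example, not part of the answer. It is there because `zip()` silently stops at the shortest input, so a missing switch or name would otherwise drop a dataframe without warning.

```python
# Placeholder strings stand in for the dataframes a, b and c.
list_dfs = ["df_a", "df_b", "df_c"]
list_switches = [True, False, True]
list_names = ["first", "second", "third"]

# zip() stops at the shortest input, so verify the lengths up front.
if not (len(list_dfs) == len(list_switches) == len(list_names)):
    raise ValueError("dataframes, switches and names must have the same length")

# Explicit-loop equivalent of the dict comprehension.
total = {}
for df, switch, name in zip(list_dfs, list_switches, list_names):
    if switch:
        total[name] = df

print(total)  # {'first': 'df_a', 'third': 'df_c'}
```

On Python 3.10 or later, passing `strict=True` to `zip()` performs the same length check and raises `ValueError` on a mismatch.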

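Comments:

This works great, but I'm not sure how this line works... total = {tup[2] : tup[0] for tup in zipped if tup[1]}

@JonathanHay - that is a dict comprehension over a zip object. Are you familiar with those concepts? Thanks for the upvote and accept, by the way.

Answer 2: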

Consider something like this.

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs':[1234,5346,1234,8793,8793],
                  'Cost':[1.1,1.2,1.3,1.4,1.5],
                  'Names':['APPLE','Orange','STRAWBERRY','Grape','Blue']})
b = pd.DataFrame({'IDs':[1,2],
                  'Cost':[1.1,1.2],
                  'Names':['APPLE1','Blue1']})
c = pd.DataFrame({'IDs':[12],
                  'Cost':[1.5],
                  'Names':['APPLE2']})

# start with an empty dict and add a dataframe only when its switch is on
total = {}
if sw_a:
    total['sw_a'] = a
if sw_b:
    total['sw_b'] = b
if sw_c:
    total['sw_c'] = c
print(total)

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')

The number of fruits for sw_a is 5 and the cost is 6.5
The number of fruits for sw_c is 1 and the cost is 1.5
