Pandas - Fill a dictionary with dataframes depending on a switch

Posted 2023-03-11


【Title】: Pandas - Fill a dictionary with dataframes depending on a switch
【Posted】: 2022-01-02 06:35:44
【Question】:

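Background: I have several dataframes, each of which can be turned on or off with a switch. I want to fill a dictionary with every dataframe whose switch is on, and then iterate over those dataframes.

Problem: I don't know how to build the dictionary dynamically so that it contains a dataframe only when its switch is on.

My attempt: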

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                  'Cost': [1.1, 1.2, 1.3, 1.4, 1.5],
                  'Names': ['APPLE', 'Orange', 'STRAWBERRY', 'Grape', 'Blue']}) if sw_a == True else []
b = pd.DataFrame({'IDs': [1, 2],
                  'Cost': [1.1, 1.2],
                  'Names': ['APPLE1', 'Blue1']}) if sw_b == True else []
c = pd.DataFrame({'IDs': [12],
                  'Cost': [1.5],
                  'Names': ['APPLE2']}) if sw_c == True else []
total = {"first": a, "second": b, "third": c}

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')

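The approach above doesn't work because the dictionary always contains an entry for every dataframe: when a switch is off, the entry is an empty list rather than being left out altogether.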
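To make the failure concrete, here is a minimal sketch with a single switched-off dataframe (a cut-down version of b from the attempt). The placeholder still lands in the dictionary, and indexing it with 'Cost' raises a TypeError. The `error` variable is only there for illustration.

```python
import pandas as pd

sw_b = False

# Same pattern as the attempt: the placeholder [] is used when the switch is off.
b = pd.DataFrame({'IDs': [1, 2], 'Cost': [1.1, 1.2]}) if sw_b else []

total = {"second": b}  # the key is present even though the switch is off

try:
    sum(total["second"]['Cost'])  # a list cannot be indexed with a string
    error = None
except TypeError as exc:
    error = exc  # this branch is taken when the switch is off
```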

【Question comments】:

【Answer 1】:

My setup is similar to yours, but I don't bother with the switches when assigning each dataframe:

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                  'Cost': [1.1, 1.2, 1.3, 1.4, 1.5],
                  'Names': ['APPLE', 'Orange', 'STRAWBERRY', 'Grape', 'Blue']})
b = pd.DataFrame({'IDs': [1, 2],
                  'Cost': [1.1, 1.2],
                  'Names': ['APPLE1', 'Blue1']})
c = pd.DataFrame({'IDs': [12],
                  'Cost': [1.5],
                  'Names': ['APPLE2']})

total = {"first": a, "second": b, "third": c}  # don't worry about the switches yet.

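Only now do we filter:

list_switches = [sw_a, sw_b, sw_c]  # the switches! finally!
# zip pairs each switch with a key of total; keep the key only if its switch is on.
total_filtered = {tup[1]: total[tup[1]] for tup in zip(list_switches, total) if tup[0]}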

Then continue as you did.

for df in total_filtered:
    temp_cost = sum(total_filtered[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total_filtered[df])} and the cost is {temp_cost}')

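Output:

The number of fruits for first is 5 and the cost is 6.5
The number of fruits for third is 1 and the cost is 1.5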
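One caveat, offered as a side note rather than as part of the original answer: zip(list_switches, total) pairs switches with keys purely by position, so it relies on the dictionary's insertion order matching the order of the switch list. A minimal sketch of an alternative, assuming the switches are kept in a dictionary keyed by the same names (the `switches` dict and the tiny stand-in dataframes are hypothetical):

```python
import pandas as pd

# Small stand-in dataframes, used only for illustration.
a = pd.DataFrame({'IDs': [1, 2], 'Cost': [1.0, 2.0], 'Names': ['x', 'y']})
b = pd.DataFrame({'IDs': [3], 'Cost': [4.0], 'Names': ['z']})

total = {"first": a, "second": b}

# Hypothetical: switches keyed by the same names as the dataframes.
switches = {"first": True, "second": False}

# Look each switch up by name, so the ordering of the two containers no longer matters.
# A name with no switch entry is treated as switched off.
total_filtered = {name: df for name, df in total.items() if switches.get(name, False)}
```

Whether this is preferable depends on where the switches come from; if they already arrive as an ordered list, the positional zip is just as reasonable.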

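Edit: You can get a little fancier with the zip function. For example, if you are building the lists of dataframes, dataframe names and switches dynamically, and can guarantee that they always have the same length, you could do something like this: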

# pretend these three lists are coming from somewhere else and can have variable length, rather than being hard-coded.
list_dfs = [a, b, c]
list_switches = [sw_a, sw_b, sw_c]
list_names = ["first", "second", "third"]

# use a zip object over the three lists.
zipped = zip(list_dfs, list_switches, list_names)
total = {tup[2]: tup[0] for tup in zipped if tup[1]}

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')
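For readers unsure how the dict comprehension over the zip object works, here is a sketch of the same filtering written as an explicit loop. The stand-in dataframes and the name `total_loop` are hypothetical and kept tiny; the other names follow the code above.

```python
import pandas as pd

# Tiny stand-in dataframes, for illustration only.
a = pd.DataFrame({'Cost': [1.0, 2.0]})
b = pd.DataFrame({'Cost': [3.0]})
c = pd.DataFrame({'Cost': [4.0]})

list_dfs = [a, b, c]
list_switches = [True, False, True]
list_names = ["first", "second", "third"]

# Comprehension form: zip yields one (df, switch, name) tuple per position.
total = {tup[2]: tup[0] for tup in zip(list_dfs, list_switches, list_names) if tup[1]}

# Equivalent explicit loop, with each tuple unpacked into named variables.
total_loop = {}
for df, switch, name in zip(list_dfs, list_switches, list_names):
    if switch:  # keep the entry only when its switch is on
        total_loop[name] = df
```

Note that a zip object can be consumed only once, which is why zip is called afresh for each pass here.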

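【Comments】:

This works great, but I'm not sure how this line works... total = {tup[2]: tup[0] for tup in zipped if tup[1]}

@JonathanHay - It's a dict comprehension over a zip object. Are you familiar with those concepts? Thanks for the upvote and the accept, by the way.

【Answer 2】: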

Consider something like this.

import pandas as pd

sw_a = True
sw_b = False
sw_c = True

a = pd.DataFrame({'IDs': [1234, 5346, 1234, 8793, 8793],
                  'Cost': [1.1, 1.2, 1.3, 1.4, 1.5],
                  'Names': ['APPLE', 'Orange', 'STRAWBERRY', 'Grape', 'Blue']})
b = pd.DataFrame({'IDs': [1, 2],
                  'Cost': [1.1, 1.2],
                  'Names': ['APPLE1', 'Blue1']})
c = pd.DataFrame({'IDs': [12],
                  'Cost': [1.5],
                  'Names': ['APPLE2']})

# start with an empty dictionary and add a dataframe only when its switch is on.
total = {}
if sw_a == True:
    total['sw_a'] = a
if sw_b == True:
    total['sw_b'] = b
if sw_c == True:
    total['sw_c'] = c
print(total)

for df in total:
    temp_cost = sum(total[df]['Cost'])
    print(f'The number of fruits for {df} is {len(total[df])} and the cost is {temp_cost}')

The number of fruits for sw_a is 5 and the cost is 6.5
The number of fruits for sw_c is 1 and the cost is 1.5

【Comments】:
