Getting a value from a pandas df with specific indexing and filtering

Posted 2023-03-12

[Title] Getting a value from a pandas df with specific indexing and filtering [Posted] 2019-06-05 12:52:08 [Question]:

I have the following DataFrame:

df
                                         eff   inv-cost  fix-cost  var-cost  inst-cap  cap-lo        cap-up  wacc  depreciation  annuity-factor
Site In Site Out Transmission Commodity                                                                                                        
Mid     North    hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009
        South    hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009
North   Mid      hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009
        South    hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009
South   Mid      hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009
        North    hvac         Elec       0.9  1650000.0       0.0       0.0       0.0     0.0  1.500000e+15  0.07          40.0        0.075009

I want to get the values of Site In and Site Out as a list of tuples. Here is an example of the list I want:

list = [('Mid','North'),
        ('South', 'Mid'),
        ('South', 'North')]

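The key here is to get the values from Site In and Site Out with pandas functions as simply as possible. Because the transmission from 'Mid' to 'South' is the same as the transmission from 'South' to 'Mid', some of the generated elements should be filtered out of the list.

This is my idea so far, but maybe you can find a better approach?

1) Get the values of Site In and Site Out and create a list, which might look like this: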

list = [('Mid','North'), ('Mid','South'),
        ('North', 'Mid'), ('North', 'South'),
        ('South', 'Mid'), ('South', 'North')]

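2) Half of the elements are equivalent and therefore unnecessary, e.g. ('Mid','North') and ('North', 'Mid'), so one of each such pair can be removed.

3) In the end I want any one of the following (the order does not matter):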

list = [('Mid','North'), ('Mid','South'), ('North', 'South')]
list = [('North','Mid'), ('Mid','South'), ('North', 'South')]
list = [('South','Mid'), ('Mid','North'), ('North', 'South')]
etc...

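Source of the df: the Transmission sheet of https://github.com/rl-institut/urbs-oemof/blob/dev/mimo.xlsx

PS: I don't know which pandas function to use for step 1, nor how to pop the elements mentioned in step 2. If you have a better algorithm, I would be happy to use it. TY

[Comments]:

Your question is more likely to get answered if you show us how you generated the multi-index df, because it is not the easiest thing for us to reproduce ourselves.

Question edited, @d_kennetz

[Answer 1]: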

I think I understand what you are looking for.

For step 1, just get the index values:

# reset the index so that 'Site In' and 'Site Out' are left
lis = list(df.reset_index(level=[2,3]).index.values)

[('Mid', 'North'),
 ('Mid', 'South'),
 ('North', 'Mid'),
 ('North', 'South'),
 ('South', 'Mid'),
 ('South', 'North')]
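
As a possible alternative for step 1, the two unwanted levels can be dropped from the index directly, without touching the DataFrame. This is only a sketch: the `index` below is a hand-built stand-in for `df.index`, not the asker's actual data.

```python
import pandas as pd

# Stand-in for df.index (assumption: same level names as in the question).
index = pd.MultiIndex.from_tuples(
    [
        ("Mid", "North", "hvac", "Elec"),
        ("Mid", "South", "hvac", "Elec"),
        ("North", "Mid", "hvac", "Elec"),
        ("North", "South", "hvac", "Elec"),
        ("South", "Mid", "hvac", "Elec"),
        ("South", "North", "hvac", "Elec"),
    ],
    names=["Site In", "Site Out", "Transmission", "Commodity"],
)

# Drop the levels that are not needed, then convert to a list of tuples.
pairs = index.droplevel(["Transmission", "Commodity"]).tolist()
print(pairs)
```

Dropping levels by name rather than by position keeps the call readable if the order of the index levels ever changes.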

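Then use set with a generator expression. Sorting each tuple makes ('North', 'Mid') and ('Mid', 'North') identical, so the set keeps only one of them: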

list(set(tuple(sorted(x)) for x in lis))

[('Mid', 'North'), ('Mid', 'South'), ('North', 'South')]
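
One caveat: a set has no guaranteed iteration order, so the result may not always come back in the order shown. If a reproducible order matters, a small variation is to sort the deduplicated pairs. The sketch below uses a hard-coded `pairs` list as a stand-in for `lis`.

```python
# Stand-in for the list of (Site In, Site Out) tuples from step 1.
pairs = [
    ("Mid", "North"), ("Mid", "South"),
    ("North", "Mid"), ("North", "South"),
    ("South", "Mid"), ("South", "North"),
]

# Sort each pair so both directions of a link compare equal,
# deduplicate with a set, then sort the result for a stable order.
unique_links = sorted({tuple(sorted(p)) for p in pairs})
print(unique_links)
# [('Mid', 'North'), ('Mid', 'South'), ('North', 'South')]
```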

I assume your MultiIndex looks like this:

MultiIndex(levels=[['Mid', 'North', 'South'], ['Mid', 'North', 'South'], ['hvac'], ['Elec']],
           labels=[[0, 0, 1, 1, 2, 2], [1, 2, 0, 2, 0, 1], [0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0]],
           names=['Site In', 'Site Out', 'Transmission', 'Commodity'])
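
For anyone who wants to reproduce this without the spreadsheet, a small frame with the same index can be built by hand. This is a hypothetical reconstruction, not the asker's loading code, and only two of the ten columns are included to keep it short. Note that the repr above comes from an older pandas; newer versions print `codes` instead of `labels`.

```python
import pandas as pd

# Hand-built index matching the assumed MultiIndex above.
index = pd.MultiIndex.from_tuples(
    [
        ("Mid", "North", "hvac", "Elec"),
        ("Mid", "South", "hvac", "Elec"),
        ("North", "Mid", "hvac", "Elec"),
        ("North", "South", "hvac", "Elec"),
        ("South", "Mid", "hvac", "Elec"),
        ("South", "North", "hvac", "Elec"),
    ],
    names=["Site In", "Site Out", "Transmission", "Commodity"],
)

# Scalar values are broadcast to every row of the index.
df = pd.DataFrame({"eff": 0.9, "inv-cost": 1650000.0}, index=index)

# Same two steps as in the answer, applied to the rebuilt frame.
lis = list(df.reset_index(level=[2, 3]).index.values)
result = sorted(set(tuple(sorted(x)) for x in lis))
print(result)
```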

[Discussion]:

I'll check it tomorrow; it should work. Then I'll accept your answer :) ty @Chris