Python Pandas groupby forloop & Idxmax

Posted 2023-03-11

【Asked】: 2013-09-23 15:48:16 【Question】:

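I have a DataFrame that must be grouped on three levels, and I then want the highest value in each group returned. There is a return for each unique combination every day, and I would like to find the highest return together with its details.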

data.groupby(['Company','Product','Industry'])['ROI'].idxmax()

The result would show that:

Target   - Dish Soap - House       had a 5% ROI on 9/17
Best Buy - CDs       - Electronics had a 3% ROI on 9/3

were the highest.

Here is some sample data:

+----------+-----------+-------------+---------+-----+
| Company  | Product   | Industry    | Date    | ROI |
+----------+-----------+-------------+---------+-----+
| Target   | Dish Soap | House       | 9/17/13 | 5%  |
| Target   | Dish Soap | House       | 9/16/13 | 2%  |
| BestBuy  | CDs       | Electronics | 9/1/13  | 1%  |
| BestBuy  | CDs       | Electronics | 9/3/13  | 3%  |
| ...

I am not sure whether this calls for a for loop or for .ix.

【Answer 1】:

If I understand correctly, you can use groupby and idxmax() to collect the index labels in a Series, then use loc to select those rows from data:

idx = data.groupby(['Company','Product','Industry'])['ROI'].idxmax()
data.loc[idx]
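
As a rough end-to-end sketch of the approach above, the sample rows from the question can be rebuilt by hand. The column name "Company" and the percent-string storage of ROI are assumptions, since the question does not show how the data is actually stored; idxmax needs numeric values, so the strings are converted first.

```python
import pandas as pd

# Hypothetical reconstruction of the sample table from the question;
# "Company" is assumed to be the name of the first column.
data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "Date":     ["9/17/13", "9/16/13", "9/1/13", "9/3/13"],
    "ROI":      ["5%", "2%", "1%", "3%"],
})

# Assumption: ROI arrives as text such as "5%". Strip the percent sign
# and convert to float so that the maximum is computed numerically.
data["ROI"] = data["ROI"].str.rstrip("%").astype(float)

# For each (Company, Product, Industry) group, get the row label
# at which ROI is largest.
idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

# Pull the complete rows (including Date) for those labels.
best = data.loc[idx]
print(best)
```

No explicit for loop is needed: idxmax yields one label per group, and loc retrieves the matching rows in a single step.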

Another option is to use reindex:

data.reindex(idx)
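
A small check, on made-up data with numeric ROI values, suggests that the two selections return the same rows when every label in idx exists in the frame. This is only a sketch under that assumption, not a general equivalence proof.

```python
import pandas as pd

# Hypothetical data with ROI already stored as floats.
data = pd.DataFrame({
    "Company":  ["Target", "Target", "BestBuy", "BestBuy"],
    "Product":  ["Dish Soap", "Dish Soap", "CDs", "CDs"],
    "Industry": ["House", "House", "Electronics", "Electronics"],
    "ROI":      [5.0, 2.0, 1.0, 3.0],
})

idx = data.groupby(["Company", "Product", "Industry"])["ROI"].idxmax()

by_loc = data.loc[idx]          # label-based selection
by_reindex = data.reindex(idx)  # conform the frame to the labels in idx

# Compare the row contents of both results.
same_rows = by_loc.to_numpy().tolist() == by_reindex.to_numpy().tolist()
print(same_rows)
```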

On a (different) DataFrame that I happened to have on hand, reindex appeared to be the slightly faster option:

In [39]: %timeit df.reindex(idx)
10000 loops, best of 3: 121 us per loop

In [40]: %timeit df.loc[idx]
10000 loops, best of 3: 147 us per loop

【Comments】:

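It would be cool if max (and friends) accepted a key, both in groupby and on the DataFrame. That said, this approach is probably still faster...

Yes, I wish NumPy also had a key parameter for max and sort! (Though, as you say, it is probably not included because calling a Python function for every element of a NumPy array or DataFrame would badly hurt speed.)

I believe this should be data.loc rather than data.iloc. At least that is what worked for me.

@Sachin_ruk: Thank you very much for the correction. Indeed, it should be data.loc, because idxmax returns labels, not positions.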
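
The loc versus iloc point from the comments shows up as soon as the index labels differ from the row positions. The frame below is made up for illustration, and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index labels (10, 20, 30, 40) differ from
# the row positions (0, 1, 2, 3).
df = pd.DataFrame(
    {"group": ["a", "a", "b", "b"], "ROI": [5.0, 2.0, 1.0, 3.0]},
    index=[10, 20, 30, 40],
)

# idxmax returns index labels, here 10 and 40.
idx = df.groupby("group")["ROI"].idxmax()

# loc looks rows up by label, so it finds the intended rows.
rows = df.loc[idx]
print(rows)

# iloc would read 10 and 40 as positions, which are out of bounds
# for a four-row frame.
try:
    df.iloc[idx.to_numpy()]
    iloc_failed = False
except IndexError:
    iloc_failed = True
print(iloc_failed)
```

With the default RangeIndex, labels and positions coincide, which is why the mistake can go unnoticed on simple test data.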
