Pandas group by custom frequency and get groups of indexes

【Posted】2022-01-02 17:37:07 【Question】:

I have a pandas time series, auctions, that looks like this:

problemStart                
2018-12-19 13:00:00        1
2018-12-19 14:00:00        0
2018-12-19 15:00:00        0
2018-12-19 16:00:00        0
2018-12-19 17:00:00        0
...                      ...
2021-10-29 12:00:00        0
2021-10-29 13:00:00        0
2021-10-29 14:00:00        0
2021-10-29 15:00:00        0
2021-10-29 16:00:00        1

[25084 rows x 1 columns]

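Grouping by year gives the expected output: a dictionary whose keys are the group keys and whose values are lists of all the index labels of the DataFrame that fall in each group.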

auctions.groupby(auctions.index.year).groups
2018: [2018-12-19 13:00:00, 2018-12-19 14:00:00, 2018-12-19 15:00:00, 2018-12-19 16:00:00, 2018-12-19 17:00:00, 2018-12-19 18:00:00, 2018-12-19 19:00:00, 2018-12-19 20:00:00, 2018-12-19 21:00:00, 2018-12-19 22:00:00, 2018-12-19 23:00:00, 2018-12-20 00:00:00, 2018-12-20 01:00:00, 2018-12-20 02:00:00, 2018-12-20 03:00:00, 2018-12-20 04:00:00, 2018-12-20 05:00:00, 2018-12-20 06:00:00, 2018-12-20 07:00:00, 2018-12-20 08:00:00, 2018-12-20 09:00:00, 2018-12-20 10:00:00, 2018-12-20 11:00:00, 2018-12-20 12:00:00, 2018-12-20 13:00:00, 2018-12-20 14:00:00, 2018-12-20 15:00:00, 2018-12-20 16:00:00, 2018-12-20 17:00:00, 2018-12-20 18:00:00, 2018-12-20 19:00:00, 2018-12-20 20:00:00, 2018-12-20 21:00:00, 2018-12-20 22:00:00, 2018-12-20 23:00:00, 2018-12-21 00:00:00, 2018-12-21 01:00:00, 2018-12-21 02:00:00, 2018-12-21 03:00:00, 2018-12-21 04:00:00, 2018-12-21 05:00:00, 2018-12-21 06:00:00, 2018-12-21 07:00:00, 2018-12-21 08:00:00, 2018-12-21 09:00:00, 2018-12-21 10:00:00, 2018-12-21 11:00:00, 2018-12-21 12:00:00, 2018-12-21 13:00:00, 2018-12-21 14:00:00, 2018-12-21 15:00:00, 2018-12-21 16:00:00, 2018-12-21 17:00:00, 2018-12-21 18:00:00, 2018-12-21 19:00:00, 2018-12-21 20:00:00, 2018-12-21 21:00:00, 2018-12-21 22:00:00, 2018-12-21 23:00:00, 2018-12-22 00:00:00, 2018-12-22 01:00:00, 2018-12-22 02:00:00, 2018-12-22 03:00:00, ...

However, I don't understand the output I get when I do the same thing with pd.Grouper:

auctions.groupby(pd.Grouper(freq="Y")).groups
Timestamp('2018-12-31 00:00:00', freq='A-DEC'): 299,
 Timestamp('2019-12-31 00:00:00', freq='A-DEC'): 9059,
 Timestamp('2020-12-31 00:00:00', freq='A-DEC'): 17843,
 Timestamp('2021-12-31 00:00:00', freq='A-DEC'): 25084

What are the dictionary values here? What is 299?

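What I actually want is to group by half-year intervals, for which I need pd.Grouper. But I don't understand the output it produces, and I would like it to produce the same kind of output as simply grouping by index.year.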


Here is the auctions file: https://gist.github.com/charelF/96b5e6fb765be28377794ed27fd20ad6

【Answer 1】:

Apparently there is a better way than groups to get at the indexes, namely indices:

auctions.groupby(pd.Grouper(freq='Y')).indices
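
As for the 299 in the question: the 2018 data runs hourly from 2018-12-19 13:00 to the end of the year, which is 11 + 12 * 24 = 299 rows, and the last value, 25084, equals the total row count. That suggests the values groups reports for a frequency Grouper are the running end positions of each bin rather than index labels. A minimal sketch on synthetic data, where the frame demo, the column flag, and the dates are made up for illustration and the behaviour may differ between pandas versions:

```python
import pandas as pd

# Synthetic hourly frame standing in for auctions; "flag" and the dates are made up.
idx = pd.date_range("2018-12-30 00:00", "2019-01-02 23:00", freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"flag": 0}, index=idx)
demo.index.name = "problemStart"

# An offset object is used instead of the "Y" alias, because alias spellings
# differ between pandas versions.
by_year = demo.groupby(pd.Grouper(freq=pd.offsets.YearEnd()))

# indices maps each yearly bin label to integer row positions (not timestamps).
positions = {key: list(pos) for key, pos in by_year.indices.items()}

# Running total of rows per bin, i.e. the end position of each bin.
end_positions = by_year.size().cumsum()

for key, pos in positions.items():
    print(key.date(), "rows", pos[0], "to", pos[-1], "| end position", end_positions.loc[key])
```

Because indices holds row positions, it still has to be mapped through the index to recover the timestamps.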

To get it exactly the way I want:

{k: auctions.index[v] for k, v in auctions.groupby(pd.Grouper(freq='Y')).indices.items()}

which returns

Timestamp('2018-12-31 00:00:00', freq='A-DEC'): DatetimeIndex(['2018-12-19 13:00:00', '2018-12-19 14:00:00', '2018-12-19 15:00:00', '2018-12-19 16:00:00', '2018-12-19 17:00:00', '2018-12-19 18:00:00', ...

This is very close to the desired output:

2018: [2018-12-19 13:00:00, 2018-12-19 14:00:00, 2018-12-19 15:00:00, 2018-12-19 16:00:00, 2018-12-19 17:00:00, 2018-12-19 18:00:00, 2018-12-19 19:00:00, 2018-12-19 20:00:00, 2018-12-19 21:00:00, 2018-12-19 22:00:00, 2018-12-19 23:00:00, 2018-12-20 00:00:00, 2018-12-20 01:00:00, 2018-12-20 02:00:00, 2018-12-20 03:00:00, 2018-12-20 04:00:00, 2018-12-20 05:00:00, 2018-12-20 06:00:00, 2018-12-20 07:00:00, 2018-12-20 08:00:00, 2018-12-20 09:00:00, 2018-12-20 10:00:00, 2018-12-20 11:00:00, 2018-12-20 12:00:00, 2018-12-20 13:00:00, 2018-12-20 14:00:00, 2018-12-20 15:00:00, 2018-12-20 16:00:00, 2018-12-20 17:00:00, 2018-12-20 18:00:00, 2018-12-20 19:00:00, 2018-12-20 20:00:00, 2018-12-20 21:00:00, 2018-12-20 22:00:00, 2018-12-20 23:00:00, 2018-12-21 00:00:00, 2018-12-21 01:00:00, 2018-12-21 02:00:00, 2018-12-21 03:00:00, 2018-12-21 04:00:00, 2018-12-21 05:00:00, 2018-12-21 06:00:00, 2018-12-21 07:00:00, 2018-12-21 08:00:00, 2018-12-21 09:00:00, 2018-12-21 10:00:00, 2018-12-21 11:00:00, 2018-12-21 12:00:00, 2018-12-21 13:00:00, 2018-12-21 14:00:00, 2018-12-21 15:00:00, 2018-12-21 16:00:00, 2018-12-21 17:00:00, 2018-12-21 18:00:00, 2018-12-21 19:00:00, 2018-12-21 20:00:00, 2018-12-21 21:00:00, 2018-12-21 22:00:00, 2018-12-21 23:00:00, 2018-12-22 00:00:00, 2018-12-22 01:00:00, 2018-12-22 02:00:00, 2018-12-22 03:00:00, ...
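
For the half-year grouping the question is really after, one possible sketch that keeps the index.year style of output is to add a second key for the half of the year and read groups from that. This is an alternative to pd.Grouper, not something from the answer above; the frame demo, the column flag, and the name half are illustrative assumptions.

```python
import pandas as pd

# Synthetic daily frame standing in for auctions; "flag" and the dates are made up.
idx = pd.date_range("2018-12-30", "2019-07-02", freq=pd.Timedelta(days=1))
demo = pd.DataFrame({"flag": 0}, index=idx)

# half is 1 for January to June and 2 for July to December.
half = (demo.index.month - 1) // 6 + 1

# Grouping by plain key arrays makes groups return the actual index labels,
# keyed by (year, half) tuples.
groups = demo.groupby([demo.index.year, half]).groups

for (year, h), labels in groups.items():
    print(year, h, labels[0].date(), "to", labels[-1].date(), len(labels))
```

The keys here are calendar half-years regardless of where the data starts. A 6-month pd.Grouper combined with indices should also work, but where its bins begin depends on how that frequency is anchored, so it is worth checking the bin labels on the real data.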
