Using Python, how do I group a column in a DataFrame by the hour?

Posted 2023-03-11

【Title】: Using Python, how to group a column in a DataFrame by the hour? 【Posted】: 2017-01-01 06:01:15 【Question】:

I have a pandas DataFrame (df1) with a column named time. I converted that column to a datetime series with pd.to_datetime(df1['time']). Now I get a column like this:

2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00     
2016-08-24 00:00:00  2016-08-13  00:00:00   
2016-08-24 00:00:00  2016-08-13  00:00:00  
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:01  2016-08-13  00:00:01   
2016-08-24 00:00:02  2016-08-13  00:00:02  
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02     
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02    
2016-08-24 00:00:02  2016-08-13  00:00:02     
....

2016-08-24 23:59:59  2016-08-13  00:00:02

Basically, I want to group the first column by hour so that I can see how many entries fall within each hour. Any help would be great.

【Question discussion】:

【Answer 1】:

Using @jezrael's setup:

# The how= argument has been removed from resample; chain .count() instead.
df.resample('H').count().rename(columns={'time': 'count'})

                      count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1
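The snippet above relies on the timestamps being the index. In the question they sit in an ordinary column, so here is a minimal sketch for that case, assuming a column named time as in the question. The sample values are made up for illustration, and lowercase 'h' is used because pandas 2.2 and later deprecate the 'H' alias.

```python
import pandas as pd

# Hypothetical sample data: timestamps stored in a regular column, not the index.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2016-08-24 00:00:00',
        '2016-08-24 01:00:00',
        '2016-08-24 01:00:01',
        '2016-08-24 01:00:02',
        '2016-08-24 02:00:02',
    ])
})

# Round every timestamp down to the start of its hour, then count rows per hour.
hourly_counts = df1.groupby(df1['time'].dt.floor('h')).size().rename('count')
print(hourly_counts)
```

Unlike resample, this groupby only lists hours that actually contain rows; resample also emits the empty hours in between with a count of 0.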

【Discussion】:

Yes, this works if I use groupby on a single column. Do you know what happens when we group by multiple columns?

【Answer 2】:

Use resample:

#pandas version 0.18.0 and higher
#(pandas 2.2 and later deprecate the alias 'H' in favor of lowercase 'h')
df = df.resample('H').size()

#pandas version below 0.18.0
#df = df.resample('H', 'size')

print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
Freq: H, dtype: int64

If you need the output as a DataFrame:

df = df.resample('H').size().rename('count').to_frame()
print (df)
                     count
2016-08-24 00:00:00      1
2016-08-24 01:00:00      3
2016-08-24 02:00:00      1

Alternatively, you can strip the minutes and seconds from the DatetimeIndex by casting it to <M8[h] and then aggregating with size:

import pandas as pd

# Nested dict: the outer key is the column name, the inner keys become the index.
df = pd.DataFrame({'time': {
    pd.Timestamp('2016-08-24 01:00:00'): pd.Timestamp('2016-08-13 00:00:00'),
    pd.Timestamp('2016-08-24 01:00:01'): pd.Timestamp('2016-08-13 00:00:01'),
    pd.Timestamp('2016-08-24 01:00:02'): pd.Timestamp('2016-08-13 00:00:02'),
    pd.Timestamp('2016-08-24 02:00:02'): pd.Timestamp('2016-08-13 00:00:02'),
    pd.Timestamp('2016-08-24 00:00:00'): pd.Timestamp('2016-08-13 00:00:00'),
}}).sort_index()  # sort so the rows print in chronological order
print (df)
                                   time
2016-08-24 00:00:00 2016-08-13 00:00:00
2016-08-24 01:00:00 2016-08-13 00:00:00
2016-08-24 01:00:01 2016-08-13 00:00:01
2016-08-24 01:00:02 2016-08-13 00:00:02
2016-08-24 02:00:02 2016-08-13 00:00:02

df= df.groupby([df.index.values.astype('<M8[h]')]).size()
print (df)
2016-08-24 00:00:00    1
2016-08-24 01:00:00    3
2016-08-24 02:00:00    1
dtype: int64
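When the hourly count has to be combined with other grouping columns, as asked in the discussion, one option is pd.Grouper. The following is only a sketch: the column names sc-status, cs-method and time are taken from the comment thread, while the sample values are invented for illustration.

```python
import pandas as pd

# Hypothetical log data; the column names follow the comment thread,
# the values are made up for illustration.
df1 = pd.DataFrame({
    'time': pd.to_datetime([
        '2016-08-24 00:10:00',
        '2016-08-24 00:20:00',
        '2016-08-24 00:30:00',
        '2016-08-24 01:05:00',
        '2016-08-24 01:15:00',
    ]),
    'sc-status': [200, 200, 404, 200, 200],
    'cs-method': ['GET', 'GET', 'GET', 'POST', 'POST'],
})

# pd.Grouper buckets the 'time' column into hourly bins; the other keys
# are grouped as ordinary columns.
df2 = (
    df1.groupby(['sc-status', 'cs-method', pd.Grouper(key='time', freq='h')])
       .size()
       .rename('count')
)
print(df2)
```

The result is a Series with a three-level index (status, method, hour), so a single combination can be looked up with df2.loc[(200, 'GET', pd.Timestamp('2016-08-24 00:00:00'))].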

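【Discussion】:

My problem is that I am grouping by multiple columns. My code is currently df2 = df1['count'].groupby([df1['sc-status'], df1['cs-method'], df1['time']]).count(). With the code above and my current data, I get the times exactly as they appear in the input file (random requests within each hour). I am struggling with the next step, which is to group this grouped object (df2) by hour. I hope that makes sense.

【Answer 3】: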

You can use pandas.DatetimeIndex, as shown below.

import numpy as np
import pandas as pd

# An example of time period
drange = pd.date_range('2016-08-01 00:00:00', '2016-09-01 00:00:00',
                       freq='10min')

N = len(drange)

# The number of columns without 'time' is three.
df = pd.DataFrame(np.random.rand(N, 3))
df['time'] = drange

time_col = pd.DatetimeIndex(df['time'])

gb = df.groupby([time_col.year,
                 time_col.month,
                 time_col.day,
                 time_col.hour])

for col_name, gr in gb:
    print(gr)  # If you want to see only the length, use print(len(gr))

[Reference] Python Pandas: Group datetime column into hour and minute aggregations

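【Discussion】:

Hi @Daewon lee.... thank you for your answer. When I use this code, it throws an error saying the Series object has no hour value. Any ideas?

@Vijay Which version of Python are you using? The code above was tested with Anaconda Python 3.5 (64-bit) on Windows 10 64-bit. (Which version of pandas are you using? Mine is 0.18.1.)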
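An error of the kind described in the first comment is what pandas raises when .hour is read directly from a Series instead of a DatetimeIndex. Whether that was the commenter's exact cause is not known from the thread; the sketch below only shows the two access paths side by side, using made-up sample timestamps.

```python
import pandas as pd

# Hypothetical sample: a plain Series of timestamps.
times = pd.Series(pd.to_datetime([
    '2016-08-24 00:00:00',
    '2016-08-24 01:00:01',
    '2016-08-24 01:30:00',
]))

# times.hour would raise AttributeError; on a Series the datetime
# components are reached through the .dt accessor.
hours_via_dt = times.dt.hour

# Wrapping the Series in a DatetimeIndex, as the answer does,
# exposes .hour directly.
hours_via_index = pd.DatetimeIndex(times).hour

print(hours_via_dt.tolist())
print(list(hours_via_index))
```

Either form can be passed to groupby as a key; the .dt accessor avoids building a separate index object.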
