A per-hour histogram of datetime using Pandas [duplicate]

Posted 2023-03-11

【Original title】: A per-hour histogram of datetime using Pandas [duplicate] 【Asked】: 2016-04-21 06:08:18 【Question】:

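Suppose I have a timestamp column, datetime, in a pandas.DataFrame. For the sake of example, the timestamps have a resolution of seconds. I would like to bucket/bin the events into 10-minute [1] buckets/bins. I know I can represent the datetime as an integer timestamp and then use a histogram. Is there a simpler way? Something built into pandas?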

[1] 10 minutes is only an example. Ultimately, I would like to use different resolutions.

【Question comments】:

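This might get you close: df.groupby(pd.TimeGrouper(freq='10Min')).mean().plot(kind="bar") You could replace "bar" with "hist", but I'm not sure that would make sense. I guess the y-axis should be the frequency, but what should the x-axis be? Do you have an example of the raw data and of what the plot should look like (even if it is only a verbal description)?

【Answer 1】: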

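To use a custom frequency such as "10Min", you have to use a TimeGrouper, as @johnchase suggested, which operates on the index. (pd.TimeGrouper was later deprecated; in current pandas, pd.Grouper(freq=...) serves the same purpose.)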

import numpy as np
import pandas as pd

# Generate 10000 one-second timestamps and randomly draw 500 of them
df = pd.DataFrame(np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'), periods=10000, freq='s'), 500), columns=['date'])
# Set the date as the index because the grouper works on the index; the column is kept (drop=False) so there is something to count
df.set_index('date', drop=False, inplace=True)
# Plot the histogram; pd.Grouper stands in for pd.TimeGrouper, which newer pandas versions no longer provide
df.groupby(pd.Grouper(freq='10min')).count().plot(kind='bar')
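
As a possible variation on the snippet above, the sketch below groups on the column directly through the `key` argument of `pd.Grouper`, so the timestamps do not have to be moved into the index. The seeded generator and the names `pool` and `counts` are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

# Reproducible sample: 500 timestamps drawn from a 10000-second window
rng = np.random.default_rng(0)
pool = pd.date_range(start="2015-01-14", periods=10000, freq="s")
df = pd.DataFrame({"date": pool[rng.integers(0, len(pool), size=500)]})

# Group on the 'date' column itself; the index is left untouched
counts = df.groupby(pd.Grouper(key="date", freq="10min")).size()

# Every event lands in exactly one bin, so the bins add up to the sample size
print(counts.sum())  # 500

# counts.plot(kind="bar")  # uncomment to draw the bars (needs matplotlib)
```

Changing the resolution is then only a matter of passing a different `freq` string, for example `"5min"` or `"1h"`.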

Using `to_period`

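It is also possible to use the to_period method, but as far as I know it does not work with custom periods such as "10Min". This example adds an extra column to simulate the category of each item.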

import numpy as np
import pandas as pd

# Number of samples
nb_sample = 500
# Generate a pool of timestamps and randomly draw a subset, plus a random category per row
df = pd.DataFrame({'date': np.random.choice(pd.date_range(start=pd.to_datetime('2015-01-14'), periods=nb_sample*30, freq='s'), nb_sample),
                   'type': np.random.choice(['foo', 'bar', 'xxx'], nb_sample)})

# Group per hour and type
df = df.groupby([df['date'].dt.to_period('h'), 'type']).count().unstack()
# Drop the unnecessary column level
df.columns = df.columns.droplevel()
df.plot(kind='bar')
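
For custom resolutions on a plain column, one option that is not in the original answer is to round each timestamp down with `Series.dt.floor` and group on the result. The following is a minimal sketch under that assumption; the seeded generator and the names `pool`, `bins` and `counts` are illustrative only.

```python
import numpy as np
import pandas as pd

nb_sample = 500
rng = np.random.default_rng(0)
pool = pd.date_range(start="2015-01-14", periods=nb_sample * 30, freq="s")
df = pd.DataFrame({
    "date": pool[rng.integers(0, len(pool), size=nb_sample)],
    "type": rng.choice(["foo", "bar", "xxx"], nb_sample),
})

# Round every timestamp down to the start of its 10-minute bin
bins = df["date"].dt.floor("10min").rename("bin")

# Count events per (bin, type); combinations that never occur become 0
counts = df.groupby([bins, "type"]).size().unstack(fill_value=0)

print(counts.to_numpy().sum())  # 500

# counts.plot(kind="bar")  # uncomment to draw the bars (needs matplotlib)
```

Unlike a frequency-based grouper, this approach only produces rows for bins that contain at least one event, so empty intervals do not show up unless the result is reindexed.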

【Comments】:

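This gets me closer. Thanks. I still have two issues: 1) the x-axis ticks do not reflect the datetime nature of the data, and 2) shouldn't the "sum of the bars" be 500?

Shouldn't it be .plot(kind='bar'), as @johnchase suggested, rather than .hist()?

Sorry, I made a big mistake in my first answer (too fast is not a solution). I have just edited it and think it now solves your problem. The sum is now 500 :-)

I actually prefer the dt.to_period solution. Forcing the index to be the timestamp is a big limitation. Here is a notebook with some examples: nbviewer.jupyter.org/gist/drorata/e58b673fd87edfc92960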
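
Regarding the x-axis ticks raised in the comments: pandas bar plots use the index values as tick labels, so one possible workaround is to format the bin labels before plotting. This is a sketch under that assumption, not something proposed in the thread; the `"%H:%M"` format and the names `pool` and `counts` are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
pool = pd.date_range(start="2015-01-14", periods=10000, freq="s")
df = pd.DataFrame({"date": pool[rng.integers(0, len(pool), size=500)]})

# Count events per 10-minute bin, grouping on the column
counts = df.groupby(pd.Grouper(key="date", freq="10min")).size()

# Replace the full timestamps with short hour:minute labels
counts.index = counts.index.strftime("%H:%M")
print(counts.head())

# counts.plot(kind="bar", rot=45)  # uncomment to draw the bars (needs matplotlib)
```

Short labels like these are only unambiguous while the data covers less than a day; for longer ranges a format that includes the date would be needed.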
