Grouping all days in Python
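[Title]: grouping all days python
[Posted]: 2017-06-24 10:21:58
[Question]:
I have a dataframe that I converted to a time series. The dates range from 2013 to 2017. I want to group all the data by hour within each day of the week: for example, all Mondays together, hour by hour, then all Tuesdays, and so on. In the end I would have 168 (24*7) rows. What is the best way to do this?
After resampling, I have this sample: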
2017-01-17 00:00:00 NaN
2017-01-17 01:00:00 NaN
2017-01-17 02:00:00 NaN
2017-01-17 03:00:00 NaN
2017-01-17 04:00:00 1.0
2017-01-17 05:00:00 NaN
2017-01-17 06:00:00 NaN
2017-01-17 07:00:00 NaN
2017-01-17 08:00:00 NaN
2017-01-17 09:00:00 1.0
2017-01-17 10:00:00 3.0
2017-01-17 11:00:00 3.0
2017-01-17 12:00:00 3.0
2017-01-17 13:00:00 5.0
2017-01-17 14:00:00 2.0
2017-01-17 15:00:00 1.0
2017-01-17 16:00:00 2.0
2017-01-17 17:00:00 1.0
2017-01-17 18:00:00 1.0
2017-01-17 19:00:00 1.0
2017-01-17 20:00:00 NaN
2017-01-17 21:00:00 NaN
2017-01-17 22:00:00 NaN
2017-01-17 23:00:00 NaN
2017-01-24 10:00:00 14.0
2017-01-24 11:00:00 14.0
2017-01-24 12:00:00 5.0
2017-01-24 13:00:00 21.0
2017-01-24 14:00:00 14.0
2017-01-24 15:00:00 7.0
2017-01-24 16:00:00 9.0
2017-01-24 17:00:00 2.0
2017-01-24 18:00:00 1.0
2017-01-24 19:00:00 NaN
2017-01-24 20:00:00 NaN
2017-01-24 21:00:00 2.0
I would like something like this:
(count sum)
Monday: 00:00 xx
01:00 xx
...
23:00 xx
Tuesday: 00:00 xx
01:00 xx
...
23:00 xx
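[Comments]:
Your description is a bit too high-level to give specific advice. Can you show any code?
Take a look at ***.com/questions/16266019/…
[Answer 1]:
I think you can groupby by the day of the week and the hour, then aggregate with some function, e.g. sum: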
import numpy as np
import pandas as pd

np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3H')
# DataFrame has a DatetimeIndex
df = pd.DataFrame({'a': np.random.randint(10, size=100)}, index=rng)
print(df)
a
2013-02-24 04:00:00 8
2013-02-24 07:00:00 8
2013-02-24 10:00:00 3
2013-02-24 13:00:00 7
2013-02-24 16:00:00 7
2013-02-24 19:00:00 0
2013-02-24 22:00:00 4
2013-02-25 01:00:00 2
2013-02-25 04:00:00 5
...
...
# weekday_name was removed in pandas 1.0; day_name() is its replacement
print(df.index.day_name())
['Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday'
'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday'
'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday'
'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday'
'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday'
'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday'
'Friday' 'Friday' 'Friday' 'Friday' 'Friday' 'Saturday' 'Saturday'
'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Sunday'
'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday'
'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday'
'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday'
'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday'
'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday'
'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday'
'Friday' 'Friday']
print (df.index.hour)
[ 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4
7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7
10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10
13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13 16 19 22 1 4 7 10 13]
print(df.groupby([df.index.day_name(), df.index.hour])['a'].sum())
Friday 1 13
4 10
7 6
10 13
13 11
16 2
19 0
22 8
Monday 1 6
4 12
7 8
10 5
13 11
...
...
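The output above sorts the day names alphabetically (Friday first), whereas the question asks for Monday first. One possible way around this, sketched below on hypothetical toy data (the start date and the constant values are assumptions, not the asker's data), is to group by the numeric dayofweek (Monday=0) and only afterwards swap in the names:

```python
import pandas as pd

# Hypothetical toy data: one value every 3 hours for two full weeks,
# starting on Monday 2017-01-02 (start date chosen for illustration only).
rng = pd.date_range("2017-01-02", periods=112, freq=pd.Timedelta(hours=3))
s = pd.Series(1, index=rng)

# dayofweek is numeric (Monday=0 ... Sunday=6), so groups sort chronologically.
by_day_hour = s.groupby([s.index.dayofweek, s.index.hour]).sum()
by_day_hour.index.names = ["day", "hour"]

# Replace the numeric day level with readable names; the row order is unchanged.
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
by_day_hour = by_day_hour.rename(index=dict(enumerate(day_names)), level="day")
print(by_day_hour.head(10))
```

Grouping on the number rather than the name keeps the sort order independent of locale and spelling; the names are purely cosmetic at the end.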
If the DataFrame has a date column:
np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3H')
# The dates live in an ordinary column instead of the index
df = pd.DataFrame({'date': rng, 'a': np.random.randint(10, size=100)})
print(df)
a date
0 8 2013-02-24 04:00:00
1 8 2013-02-24 07:00:00
2 3 2013-02-24 10:00:00
3 7 2013-02-24 13:00:00
4 7 2013-02-24 16:00:00
5 0 2013-02-24 19:00:00
6 4 2013-02-24 22:00:00
7 2 2013-02-25 01:00:00
8 5 2013-02-25 04:00:00
print(df.groupby([df.date.dt.day_name(), df.date.dt.hour])['a'].sum())
date date
Friday 1 13
4 10
7 6
10 13
13 11
16 2
19 0
22 8
Monday 1 6
4 12
7 8
10 5
13 11
If you have a Series with a DatetimeIndex:
s = pd.Series(np.random.randint(10, size=100), index=rng)
print (s)
2013-02-24 04:00:00 8
2013-02-24 07:00:00 8
2013-02-24 10:00:00 3
2013-02-24 13:00:00 7
2013-02-24 16:00:00 7
2013-02-24 19:00:00 0
2013-02-24 22:00:00 4
2013-02-25 01:00:00 2
2013-02-25 04:00:00 5
2013-02-25 07:00:00 2
2013-02-25 10:00:00 2
2013-02-25 13:00:00 2
print(s.groupby([s.index.day_name(), s.index.hour]).sum())
Friday 1 13
4 10
7 6
10 13
13 11
16 2
19 0
22 8
Monday 1 6
4 12
7 8
10 5
13 11
Finally, add reset_index() to get a DataFrame:
df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ['days','hours','val']
print(df)
days hours val
0 Friday 1 13
1 Friday 4 10
2 Friday 7 6
3 Friday 10 13
4 Friday 13 11
5 Friday 16 2
6 Friday 19 0
7 Friday 22 8
8 Monday 1 6
9 Monday 4 12
10 Monday 7 8
11 Monday 10 5
12 Monday 13 11
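The desired output in the question is headed "(count sum)", which suggests two aggregates per slot. A minimal sketch of that, assuming hypothetical hourly data for two Tuesdays with one missing reading, is to pass a list of function names to agg:

```python
import numpy as np
import pandas as pd

# Hypothetical hourly data: two Tuesdays (2017-01-17 and 2017-01-24), 24 hours each.
hour = pd.Timedelta(hours=1)
idx = pd.date_range("2017-01-17", periods=24, freq=hour).append(
    pd.date_range("2017-01-24", periods=24, freq=hour))
vals = np.r_[np.arange(24.0), np.arange(24.0)]
vals[4] = np.nan  # pretend the 04:00 reading on the first Tuesday is missing
s = pd.Series(vals, index=idx)

# One row per (day, hour) slot with two columns: count of non-NaN values and their sum.
stats = s.groupby([s.index.day_name(), s.index.hour]).agg(["count", "sum"])
stats.index.names = ["days", "hours"]
print(stats.head())
```

Note that count ignores NaN, so the 04:00 slot reports one observation rather than two.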
Edit in response to the comments:
print (s)
2017-01-24 10:00:00 14.0
2017-01-24 11:00:00 14.0
2017-01-24 12:00:00 5.0
2017-01-24 13:00:00 21.0
2017-01-24 14:00:00 14.0
2017-01-24 15:00:00 7.0
2017-01-24 16:00:00 9.0
2017-01-24 17:00:00 2.0
2017-01-24 18:00:00 1.0
2017-01-24 19:00:00 NaN
2017-01-24 20:00:00 NaN
2017-01-24 21:00:00 2.0
Name: a, dtype: float64
df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ['days','hours','val']
print(df)
days hours val
0 Tuesday 10 14.0
1 Tuesday 11 14.0
2 Tuesday 12 5.0
3 Tuesday 13 21.0
4 Tuesday 14 14.0
5 Tuesday 15 7.0
6 Tuesday 16 9.0
7 Tuesday 17 2.0
8 Tuesday 18 1.0
9 Tuesday 19 NaN
10 Tuesday 20 NaN
11 Tuesday 21 2.0
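Two caveats may be worth checking against your pandas version. Since pandas 0.22 the sum of an all-NaN group is 0 by default, so the NaN rows shown above need min_count=1 to survive. Also, slots that never occur in the data are simply absent from the result. The sketch below is one possible way to get all 168 rows; it reuses the 2017-01-24 values from the sample, and the names day_names, full and out are illustrative only:

```python
import numpy as np
import pandas as pd

# Values copied from the 2017-01-24 sample (a Tuesday), including its NaNs.
idx = pd.date_range("2017-01-24 10:00", periods=12, freq=pd.Timedelta(hours=1))
s = pd.Series([14, 14, 5, 21, 14, 7, 9, 2, 1, np.nan, np.nan, 2],
              index=idx, name="a", dtype="float64")

# min_count=1 makes the sum NaN when a group holds no valid values.
grouped = s.groupby([s.index.day_name(), s.index.hour]).sum(min_count=1)

# Build the complete 7 x 24 grid and reindex onto it; absent slots become NaN.
day_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
full = pd.MultiIndex.from_product([day_names, range(24)], names=["days", "hours"])
out = grouped.reindex(full).reset_index(name="val")
print(out.shape)
```

Because the grid lists the days in calendar order, the result also comes out Monday first rather than alphabetically.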
[Comments]:
Your solution is very close. How can I add the hours that have NaNs?
Can you change your sample data to include these NaNs?
@datascana - I tested it and it works, see the edit to my answer. Or do you need something else?