Grouping all days in Python

【Title】: grouping all days python 【Posted】: 2017-06-24 10:21:58 【Question】:

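I have a data frame that I converted to a time series. The dates range from 2013 to 2017. I want to group all the data by day of the week and by hour of the day: for example, all Mondays together for each hour, then all Tuesdays, and so on. In the end I would have 168 (24*7) rows. What is the best way to do this?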

After resampling I have this sample:

2017-01-17 00:00:00    NaN
2017-01-17 01:00:00    NaN
2017-01-17 02:00:00    NaN
2017-01-17 03:00:00    NaN
2017-01-17 04:00:00    1.0
2017-01-17 05:00:00    NaN
2017-01-17 06:00:00    NaN
2017-01-17 07:00:00    NaN
2017-01-17 08:00:00    NaN
2017-01-17 09:00:00    1.0
2017-01-17 10:00:00    3.0
2017-01-17 11:00:00    3.0
2017-01-17 12:00:00    3.0
2017-01-17 13:00:00    5.0
2017-01-17 14:00:00    2.0
2017-01-17 15:00:00    1.0
2017-01-17 16:00:00    2.0
2017-01-17 17:00:00    1.0
2017-01-17 18:00:00    1.0
2017-01-17 19:00:00    1.0
2017-01-17 20:00:00    NaN
2017-01-17 21:00:00    NaN
2017-01-17 22:00:00    NaN
2017-01-17 23:00:00    NaN        
2017-01-24 10:00:00    14.0
2017-01-24 11:00:00    14.0
2017-01-24 12:00:00    5.0
2017-01-24 13:00:00    21.0
2017-01-24 14:00:00    14.0
2017-01-24 15:00:00    7.0
2017-01-24 16:00:00    9.0
2017-01-24 17:00:00    2.0
2017-01-24 18:00:00    1.0
2017-01-24 19:00:00    NaN
2017-01-24 20:00:00    NaN
2017-01-24 21:00:00    2.0

I want something like:

                   (count sum)
Monday:    00:00     xx 
           01:00     xx
           ...
           23:00     xx
Tuesday:   00:00     xx
           01:00     xx
           ...
           23:00     xx           

【Comments】:

Your description is a bit too high-level to give specific advice. Can you show any code?

Take a look at ***.com/questions/16266019/…

【Answer 1】:

I think you can group by day of the week and hour, then aggregate with some function, e.g. sum:

import numpy as np
import pandas as pd

np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3H')

# DataFrame has a DatetimeIndex
df = pd.DataFrame({'a': np.random.randint(10, size=100)}, index=rng)
print (df)
                     a
2013-02-24 04:00:00  8
2013-02-24 07:00:00  8
2013-02-24 10:00:00  3
2013-02-24 13:00:00  7
2013-02-24 16:00:00  7
2013-02-24 19:00:00  0
2013-02-24 22:00:00  4
2013-02-25 01:00:00  2
2013-02-25 04:00:00  5
...
...
# weekday_name was removed in pandas 1.0; day_name() is its replacement
print (df.index.day_name().values)
['Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday'
 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday'
 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday'
 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday'
 'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday'
 'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday'
 'Friday' 'Friday' 'Friday' 'Friday' 'Friday' 'Saturday' 'Saturday'
 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Saturday' 'Sunday'
 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Sunday' 'Monday'
 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Monday' 'Tuesday'
 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday' 'Tuesday'
 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday' 'Wednesday'
 'Wednesday' 'Wednesday' 'Thursday' 'Thursday' 'Thursday' 'Thursday'
 'Thursday' 'Thursday' 'Thursday' 'Thursday' 'Friday' 'Friday' 'Friday'
 'Friday' 'Friday']

print (df.index.hour)
[ 4  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4
  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7
 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10
 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10 13 16 19 22  1  4  7 10 13]
print (df.groupby([df.index.day_name(), df.index.hour])['a'].sum())
Friday     1     13
           4     10
           7      6
           10    13
           13    11
           16     2
           19     0
           22     8
Monday     1      6
           4     12
           7      8
           10     5
           13    11
...
...
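
The question asks for both a count and a sum, listed from Monday onward, whereas grouping by day name sorts the days alphabetically. One possible way to get both, sketched here on hypothetical hourly data (the series `s`, the level names `day` and `hour`, and the `day_names` list are illustrative and not part of the original answer), is to group by the numeric `dayofweek` and map the numbers to names afterwards:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one value per hour for four full weeks,
# starting on Monday 2017-01-02.
idx = pd.date_range('2017-01-02', periods=24 * 7 * 4,
                    freq=pd.Timedelta(hours=1))
s = pd.Series(np.ones(len(idx)), index=idx, name='a')

# Group by numeric day of week (0 = Monday) and hour, so the result
# sorts chronologically rather than alphabetically by day name.
keys = [s.index.dayofweek.rename('day'), s.index.hour.rename('hour')]
summary = s.groupby(keys).agg(['count', 'sum'])

# Replace the numeric day with its name once the order is settled.
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
summary = summary.rename(index=dict(enumerate(day_names)), level='day')

print(summary.head())
```

With four full weeks of hourly data, every one of the 168 (day, hour) groups contains four observations.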

If the DataFrame has a date column instead:

np.random.seed(100)
start = pd.to_datetime('2013-02-24 04:00:00')
rng = pd.date_range(start, periods=100, freq='3H')

df = pd.DataFrame({'date': rng, 'a': np.random.randint(10, size=100)})
print (df)
    a                date
0   8 2013-02-24 04:00:00
1   8 2013-02-24 07:00:00
2   3 2013-02-24 10:00:00
3   7 2013-02-24 13:00:00
4   7 2013-02-24 16:00:00
5   0 2013-02-24 19:00:00
6   4 2013-02-24 22:00:00
7   2 2013-02-25 01:00:00
8   5 2013-02-25 04:00:00

print (df.groupby([df.date.dt.day_name(), df.date.dt.hour])['a'].sum())
date       date
Friday     1       13
           4       10
           7        6
           10      13
           13      11
           16       2
           19       0
           22       8
Monday     1        6
           4       12
           7        8
           10       5
           13      11

If you have a Series with a DatetimeIndex:

s = pd.Series(np.random.randint(10, size=100), index=rng)  
print (s)
2013-02-24 04:00:00    8
2013-02-24 07:00:00    8
2013-02-24 10:00:00    3
2013-02-24 13:00:00    7
2013-02-24 16:00:00    7
2013-02-24 19:00:00    0
2013-02-24 22:00:00    4
2013-02-25 01:00:00    2
2013-02-25 04:00:00    5
2013-02-25 07:00:00    2
2013-02-25 10:00:00    2
2013-02-25 13:00:00    2

print (s.groupby([s.index.day_name(), s.index.hour]).sum())
Friday     1     13
           4     10
           7      6
           10    13
           13    11
           16     2
           19     0
           22     8
Monday     1      6
           4     12
           7      8
           10     5
           13    11

Finally, add reset_index() to get a DataFrame:

df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ['days','hours','val']
print (df)
         days  hours  val
0      Friday      1   13
1      Friday      4   10
2      Friday      7    6
3      Friday     10   13
4      Friday     13   11
5      Friday     16    2
6      Friday     19    0
7      Friday     22    8
8      Monday      1    6
9      Monday      4   12
10     Monday      7    8
11     Monday     10    5
12     Monday     13   11
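
A groupby only returns the (day, hour) combinations that actually occur in the data, so the result can have fewer than the 168 rows the question asks for. A minimal sketch of one way to get the full grid, assuming it is acceptable for missing combinations to show as NaN (the sample series and the names `grid`, `full`, `days` and `hours` are hypothetical), is to reindex onto every day/hour pair:

```python
import numpy as np
import pandas as pd

# Hypothetical sparse data: three hourly observations, all on Tuesdays.
idx = pd.to_datetime(['2017-01-17 04:00', '2017-01-17 09:00',
                      '2017-01-24 09:00'])
s = pd.Series([1.0, 1.0, 14.0], index=idx)

totals = s.groupby([s.index.day_name(), s.index.hour]).sum()

# Build every (day, hour) pair and reindex onto it; combinations that
# never occurred in the data appear as NaN.
day_names = ['Monday', 'Tuesday', 'Wednesday', 'Thursday',
             'Friday', 'Saturday', 'Sunday']
grid = pd.MultiIndex.from_product([day_names, range(24)],
                                  names=['days', 'hours'])
full = totals.reindex(grid)

print(len(full))
print(full.loc[('Tuesday', 9)])
```

Because `grid` lists the days from Monday to Sunday, the reindexed result also follows that order. Passing `fill_value=0` to `reindex` would give zeros instead of NaN for the missing slots.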

Edit in response to the comments:

print (s)
2017-01-24 10:00:00    14.0
2017-01-24 11:00:00    14.0
2017-01-24 12:00:00     5.0
2017-01-24 13:00:00    21.0
2017-01-24 14:00:00    14.0
2017-01-24 15:00:00     7.0
2017-01-24 16:00:00     9.0
2017-01-24 17:00:00     2.0
2017-01-24 18:00:00     1.0
2017-01-24 19:00:00     NaN
2017-01-24 20:00:00     NaN
2017-01-24 21:00:00     2.0
Name: a, dtype: float64

df = s.groupby([s.index.day_name(), s.index.hour]).sum().reset_index()
df.columns = ['days','hours','val']
print (df)
       days  hours   val
0   Tuesday     10  14.0
1   Tuesday     11  14.0
2   Tuesday     12   5.0
3   Tuesday     13  21.0
4   Tuesday     14  14.0
5   Tuesday     15   7.0
6   Tuesday     16   9.0
7   Tuesday     17   2.0
8   Tuesday     18   1.0
9   Tuesday     19   NaN
10  Tuesday     20   NaN
11  Tuesday     21   2.0
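
Note that on pandas 0.22 and later, `sum()` returns 0 for a group that contains only NaN values, so the NaN rows in an output like this one may come back as 0.0 on a current installation. If the NaN should be preserved, passing `min_count=1` is one option. A small sketch on hypothetical data (the three-row series is illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the 19:00 slot holds only NaN.
idx = pd.to_datetime(['2017-01-24 18:00', '2017-01-24 19:00',
                      '2017-01-24 21:00'])
s = pd.Series([1.0, np.nan, 2.0], index=idx)

keys = [s.index.day_name(), s.index.hour]

# Default behaviour: an all-NaN group sums to 0.0.
default_sum = s.groupby(keys).sum()

# min_count=1 requires at least one non-NaN value; otherwise the
# result for that group is NaN.
nan_aware_sum = s.groupby(keys).sum(min_count=1)

print(default_sum)
print(nan_aware_sum)
```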

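【Discussion】:

Your solution is very close. How can I add the hours that have NaNs?

Can you change your sample data to include these NaNs?

@datascana - I tested it and it works, see the edit to my answer. Or do you need something else?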
